ukwa / w3act

w3act is an annotation and curation tool for building web archive collections
Apache License 2.0

Improve metadata extraction, especially for PDFs #392

Open: anjackson opened this issue 9 years ago

anjackson commented 9 years ago

We need to check that we're doing a good enough job with the tools we already have, and we should look at using additional tools to improve metadata extraction from PDFs.

anjackson commented 9 years ago

More on GROBID.