mysociety / caps

A simple, open database of local government climate action plan documents and emissions data.
https://cape.mysociety.org

Improved search and deep linking #129

Open ajparsons opened 3 years ago

ajparsons commented 3 years ago

Copying from a notes document:

One of the user interviewees was disappointed that search results couldn’t link to individual pages, and this feels like a fixable issue with a little R&D time. Similarly, there are existing tools for table-of-contents extraction that, paired with deep linking, would make PDFs much more useful without needing to fully understand the content. A structured understanding of tables of contents could improve search and provide a light form of structured content for comparing documents.

Chrome supports both linking to a specific page of a PDF URL and linking to specific text fragments on an HTML page. We could use this to validate whether deep linking is useful.
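
As a rough illustration of the two link styles mentioned above, here is a minimal Python sketch. The function names `pdf_page_link` and `text_fragment_link` and the example URLs are hypothetical and not part of this repository. It assumes the `#page=N` fragment that PDF viewers such as Chrome's accept, and the `#:~:text=` text fragment syntax, in which `-`, `,` and `&` have to be percent-encoded because they are part of the fragment syntax itself.

```python
from urllib.parse import quote, urldefrag


def pdf_page_link(pdf_url: str, page: int) -> str:
    """Build a link that opens a PDF at a given 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any fragment already on the URL before adding our own.
    base = urldefrag(pdf_url).url
    return f"{base}#page={page}"


def text_fragment_link(page_url: str, text: str) -> str:
    """Build a link that scrolls to and highlights `text` on an HTML page."""
    base = urldefrag(page_url).url
    # quote() never escapes "-", but the text fragment syntax reserves it,
    # so it is encoded by hand here.
    encoded = quote(text, safe="").replace("-", "%2D")
    return f"{base}#:~:text={encoded}"


if __name__ == "__main__":
    print(pdf_page_link("https://example.org/plan.pdf", 12))
    print(text_fragment_link("https://example.org/plan.html", "net-zero by 2030"))
```

Browsers that do not support text fragments should ignore the directive and simply load the page, so links like these can be tried out without breaking anything for other users.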

struan commented 2 years ago

For context, at the moment the search works by extracting the text from the PDF, putting that into the CSV as plain text, and then indexing that text, so we're not directly indexing the PDF.
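
One possible way to get from the plain-text index described above to page-level links is to keep the extracted text split by page rather than as a single blob. The sketch below is not how the project currently works. It assumes the per-page text is already available as a list of strings (how it is extracted is left open), and `PageHit` and `search_pages` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class PageHit:
    page: int      # 1-based page number within the PDF
    snippet: str   # text surrounding the first match on that page
    link: str      # deep link to the page, using the #page=N fragment


def search_pages(pdf_url: str, page_texts: list[str], query: str,
                 context: int = 40) -> list[PageHit]:
    """Return one hit per page whose text contains `query` (case-insensitive)."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for page_number, text in enumerate(page_texts, start=1):
        match = pattern.search(text)
        if match is None:
            continue
        # Keep a little text on either side of the match as a snippet.
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append(PageHit(
            page=page_number,
            snippet=text[start:end].strip(),
            link=f"{pdf_url}#page={page_number}",
        ))
    return hits


if __name__ == "__main__":
    pages = [
        "Introduction to the plan.",
        "We aim for net zero by 2030.",
        "Appendix",
    ]
    for hit in search_pages("https://example.org/plan.pdf", pages, "Net Zero"):
        print(hit.page, hit.link, hit.snippet)
```

A real index would presumably store the page number alongside each chunk of text in whatever search backend is used. The point of the sketch is only that the page number has to survive the extraction step for deep links to be possible.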