One of the user interviewees was disappointed that search couldn’t link to individual pages, and this feels like an issue we could fix with a little R&D time. Similarly, there are existing tools for table of contents extraction that, paired with deep linking, would make PDFs much more useful without needing to fully understand their content. A structured understanding of tables of contents could also improve search, and provide a light form of structured content for comparing documents.
Chrome supports both linking to a specific page of a PDF URL and linking to specific text fragments on an HTML page. We could use this to validate whether deep linking is useful.
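A minimal sketch of building both kinds of link, assuming we already know the page number (for PDFs) or an exact snippet of text (for HTML pages). Page links use the `#page=N` fragment, which Chrome's PDF viewer honours; text links use the `#:~:text=` text fragment syntax, where the snippet must be percent-encoded and `-`, `,` and `&` must be escaped because they are delimiters in that syntax. The function names and example URLs are hypothetical.

```python
from urllib.parse import quote, urldefrag


def pdf_page_link(pdf_url: str, page: int) -> str:
    """Link to a 1-based page of a PDF using the #page=N fragment."""
    if page < 1:
        raise ValueError("page numbers are 1-based")
    base, _ = urldefrag(pdf_url)  # drop any existing fragment
    return f"{base}#page={page}"


def _encode_fragment_text(text: str) -> str:
    # quote(safe="") escapes spaces, ',' and '&' but leaves '-' alone.
    # Text fragments use '-' to mark prefix/suffix, so escape it explicitly.
    return quote(text, safe="").replace("-", "%2D")


def text_fragment_link(page_url: str, text: str) -> str:
    """Link to the first occurrence of `text` on an HTML page."""
    base, _ = urldefrag(page_url)
    return f"{base}#:~:text={_encode_fragment_text(text)}"


if __name__ == "__main__":
    print(pdf_page_link("https://example.org/report.pdf", 12))
    print(text_fragment_link("https://example.org/report.html", "net zero"))
```

Running this prints `https://example.org/report.pdf#page=12` and `https://example.org/report.html#:~:text=net%20zero`. Pasting links like these into Chrome by hand would be a cheap way to test the idea with users before changing the indexing pipeline.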
For context, search currently works by extracting the text from each PDF, storing it in the CSV as plain text, and then indexing that text, so we're not indexing the PDF directly.
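One way page-level linking could fit this pipeline is to write one CSV row per page rather than per document, so every search hit carries its own page number and deep link. This is a minimal sketch, assuming the existing extractor can return a list of per-page strings; the column names (`doc_id`, `page`, `url`, `text`) and function names are hypothetical, not the current CSV schema.

```python
import csv
import io
from typing import Sequence

# Hypothetical column layout for a per-page CSV.
FIELDS = ["doc_id", "page", "url", "text"]


def per_page_rows(doc_id: str, pdf_url: str, page_texts: Sequence[str]) -> list[dict]:
    """Turn per-page extracted text into one indexable row per page."""
    rows = []
    for page_number, text in enumerate(page_texts, start=1):
        cleaned = " ".join(text.split())  # collapse whitespace and newlines
        if not cleaned:
            continue  # skip pages with no text layer (e.g. scanned images)
        rows.append({
            "doc_id": doc_id,
            "page": page_number,
            "url": f"{pdf_url}#page={page_number}",  # deep link to the page
            "text": cleaned,
        })
    return rows


def write_rows(rows: list[dict], out: io.TextIOBase) -> None:
    """Write rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    pages = ["Introduction", "", "Results\n  here"]
    write_rows(per_page_rows("doc-1", "https://example.org/r.pdf", pages), buf)
    print(buf.getvalue())
```

The trade-off is that the index grows from one row per document to one row per page, and a phrase that spans a page break may no longer match as a single hit.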