Closed (brimwats closed 2 years ago)
Hi @brimwats, could you please (1) send me the PDFs either here or by email (listed on my GitHub profile), and (2) also include what I suspect is a missing line from the error's trace, namely the error itself?
Re 1): sure, sending in a moment.
Re 2): I don't get any error beyond the trace. I used the online version you linked to: https://huggingface.co/spaces/paulbricman/decontextualizer
Got the email, thanks for the prompt reply! I'll try processing the PDFs a few days into January. Happy holidays till then!
Hi, @brimwats! So I pushed a version which handles the extraction part a bit better than before. However, from what I've seen, the model has a hard time working with highlights of 3+ sentences and works best with 1-2 sentence highlights. I'm afraid the current version of the tool won't be of much use in your situation :disappointed:
Hello!
I tried two different PDFs (which I can share in a DM or an email): one was an older, pre-computer PDF that had been professionally OCRed and highlighted; the other was a modern PDF of a book published last year, also highlighted. Zotero was able to extract from both using pdfjs. When I use https://huggingface.co/spaces/paulbricman/decontextualizer I get: