itst / pdf-highlights

Export your PDF highlights to markdown files.

KeyError: 'Title' #3

Closed ncarboni closed 4 years ago

ncarboni commented 5 years ago

Just tested the python script but got this error:

pdf-highlights.py /Users/Nicola/Downloads/Improving\ OWL\ RL\ reasoning\ in\ N3\ by\ using\ specialized\ rules.pdf > output.txt
Document doesn't include outlines ("bookmarks")

Author is not set.Traceback (most recent call last):
  File "/Users/Nicola/miniconda3/bin/pdf-highlights.py", line 4, in <module>
    __import__('pkg_resources').run_script('pdf-highlights==0.1.0', 'pdf-highlights.py')
  File "/Users/Nicola/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 661, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/Nicola/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1441, in run_script
    exec(code, namespace, namespace)
  File "/Users/Nicola/miniconda3/lib/python3.7/site-packages/pdf_highlights-0.1.0-py3.7.egg/EGG-INFO/scripts/pdf-highlights.py", line 436, in <module>
    main()
  File "/Users/Nicola/miniconda3/lib/python3.7/site-packages/pdf_highlights-0.1.0-py3.7.egg/EGG-INFO/scripts/pdf-highlights.py", line 433, in main
    print_annots(fh)
  File "/Users/Nicola/miniconda3/lib/python3.7/site-packages/pdf_highlights-0.1.0-py3.7.egg/EGG-INFO/scripts/pdf-highlights.py", line 417, in print_annots
    pretty_print(allannots, outlines, mediaboxes, doc.info[0])
  File "/Users/Nicola/miniconda3/lib/python3.7/site-packages/pdf_highlights-0.1.0-py3.7.egg/EGG-INFO/scripts/pdf-highlights.py", line 278, in pretty_print
    _title = resolve1(info["Title"])
KeyError: 'Title'

any idea what could be wrong?

itst commented 5 years ago

Oh boy. The try/except used to set the document’s title is off. I will correct it asap. If you know your way around Python, removing or commenting out line 278 should do the trick.

pdf-highlights uses the author and title info as given in the PDF’s attributes and includes them in output.txt.
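For anyone who would rather keep the lookup than delete it, here is a minimal sketch of a lookup that tolerates a missing key. The helper name `get_info_field` and the `TITLE`/`AUTHOR` defaults are assumptions for illustration, not the script's actual code; in the script, the `resolve1` function seen in the traceback could be passed as `resolve`.

```python
def get_info_field(info, key, default, resolve=lambda value: value):
    """Return info[key] as text, or `default` when the key is missing or empty.

    `info` stands in for the PDF info dictionary (doc.info[0] in the traceback).
    `resolve` is a hook for dereferencing indirect objects; the identity
    function is used here so the sketch runs without any PDF library.
    """
    if key not in info:
        return default
    value = resolve(info[key])
    # Metadata values may arrive as bytes; decode them for printing.
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    return value or default


# Made-up metadata for illustration: an Author entry but no Title.
info = {"Author": b"Jane Doe"}
title = get_info_field(info, "Title", default="TITLE")
author = get_info_field(info, "Author", default="AUTHOR")
print(title, author)
```

Checking for the key first, rather than wrapping the whole lookup in a broad try/except, keeps unrelated errors visible.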


ncarboni commented 5 years ago

Thanks for the quick answer! I am currently without a computer, but I will try as soon as I get it back :)

ncarboni commented 5 years ago

Just tried it, and now it runs without errors. It works with PDFs highlighted in DEVONthink, where I get:

# [](x-devonthink-item://) [🕵️‍♀️](x-devonthink://search?query=)
[](x-devonthink://search?query=)

# Notes

@: Typically crowdsourcing-based approaches to gather annotated data use inter-annotator agreement as a measure of quality. However, in many domains, there is ambiguity in the data, as well as a multitude of perspectives of the information examples. In this paper, we present ongoing work into the CrowdTruth m  

1: The CrowdTruth metrics model the inter-dependency between the three main components of a crowdsourcing system – worker, input data, and annotation. The goal of the metrics is to capture the degree of ambiguity in each of these three components. The metrics are available online at https://github.com/ CrowdTruth/CrowdTruth-core

@: Aroyo and Welty 2015] Aroyo, L., and Welty, C. 2015. Truth Is a Lie: CrowdTruth and the Seven Myths of Human Annotation. AI Magazine 36(1):15–24.  

@: [Dumitrache, Aroyo, and Welty 2018] Dumitrache, A.; Aroyo, L.; and Welty, C. 2018. Capturing ambiguity in crowdsourcing frame disambiguation  

whereas if I highlight something in Preview, I get nothing out of it:

# [TITLE](x-devonthink-item://) [🕵️‍♀️](x-devonthink://search?query=TITLE)
[AUTHOR](x-devonthink://search?query=AUTHOR)

# Notes

1: None

1: None

Questions:

itst commented 5 years ago

Does the PDF have the Title and Author attributes set? That is the info used in printing the first part of the output.

You could either edit the PDF's attributes or add the author's name and title manually to the output. Editing PDF metadata such as these attributes requires Adobe Acrobat, Skim or any other sophisticated PDF tool.
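For the manual route, a small sketch of filling in the header after the export. It assumes the first two lines of the output contain the literal placeholders `TITLE` and `AUTHOR`, as in the sample output posted in this thread. The function name `fill_header` and the sample values are made up for illustration and are not part of pdf-highlights.

```python
from urllib.parse import quote


def fill_header(markdown, title, author):
    """Replace the TITLE/AUTHOR placeholders in the first two lines of an export."""
    lines = markdown.split("\n")
    if len(lines) >= 1:
        # Line 1: link label plus the URL-encoded search query for the title.
        lines[0] = (
            lines[0]
            .replace("query=TITLE", "query=" + quote(title))
            .replace("[TITLE]", "[" + title + "]")
        )
    if len(lines) >= 2:
        # Line 2: the same treatment for the author.
        lines[1] = (
            lines[1]
            .replace("query=AUTHOR", "query=" + quote(author))
            .replace("[AUTHOR]", "[" + author + "]")
        )
    return "\n".join(lines)


# Header copied from the sample output above, with made-up metadata values.
exported = (
    "# [TITLE](x-devonthink-item://) [🕵️‍♀️](x-devonthink://search?query=TITLE)\n"
    "[AUTHOR](x-devonthink://search?query=AUTHOR)\n"
    "\n"
    "# Notes\n"
)
print(fill_header(exported, "CrowdTruth 2.0", "Dumitrache"))
```

Only the first two lines are touched, so highlighted text that happens to contain the word TITLE or AUTHOR is left alone.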

As for the empty output after highlighting something in Preview, that might depend on the color or annotation type. Did you highlight the same text with the same tool in both DT and Preview?

In any case, if you can provide me with the PDF, I can take a look.

ncarboni commented 5 years ago

Ok, the PDF I tried is this one (with the highlighting already in): CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement (0004) 2.pdf. Now that I see the two different highlight colours side by side, I wonder whether, in Mojave or somewhere around High Sierra, they changed the default highlight colour in Preview. I remember it being brighter...

Does the PDF have the Title and Author attributes set? That is the info used in printing the first part of the output.

No, it didn't. Good point. I was secretly hoping it would automatically take the DT link (e.g. x-devonthink-item://8EB5A5EE-F929-4B10-B7A4-AA27F5A22C33) to the PDF there! But still pretty good! :)

You could either edit the PDF's attributes or add the author's name and title manually to the output. Editing PDF metadata such as these attributes requires Adobe Acrobat, Skim or any other sophisticated PDF tool.

I see I can also add them with DT, but I will try to see if there is a way to automatically embed them when adding the file with a reference manager. Do you do it automatically for each PDF?

itst commented 4 years ago

Unknown state :-(