mgmeyers / obsidian-zotero-integration

Insert and import citations, bibliographies, notes, and PDF annotations from Zotero into Obsidian.
GNU General Public License v3.0

Zotero not updating annotations? #394

Open Jmuccigr opened 1 month ago

Jmuccigr commented 1 month ago

I'm having an issue with the count of annotations in a PDF attachment which I've been editing a bit, though not since doing the import/getting data.

The PDF very definitely has 47 annotations (highlights). That's what Preview (mac user) shows. That's what Zotero reports in the pane for the item.

The problem is that the plugin's data previewer shows twice that number, 94, and all 94 get imported. I've had problems before with Zotero being slow to update annotations, but it's been a few hours and I've restarted Zotero a few times. I'm not quite sure where the problem is, though I suspect it's Zotero. Since I'm not sure exactly what the plugin does to query Zotero, I didn't want to go there first and ask.

So a few things, I guess:

  1. Is this a known issue?
  2. What exactly is the method for getting that data back from Zotero?

Thanks.

Jmuccigr commented 1 month ago

I'm now finding this more and more.

Here's the workflow that gets me there:

  1. Add a PDF attachment in Zotero.
  2. Open it in Preview (macOS) and put in highlights. At this point Zotero doesn't show any annotations.
  3. Open it in Zotero, then close it without doing anything. This gets Zotero to load the annotations.
  4. Do the import via the plugin and get twice the correct number of annotations, with the whole set imported twice. One set of annotations has an old date attached (from when they were added to the file, as also reported by Preview) and the other set has the import date.

I'm not sure what the plugin is doing to get the count, so I can't replicate it, but no place else seems to show the incorrect number.

Some scenarios as reported by the plugin:

  1. The replacement PDF is read by Zotero: double the annotations where half are old and the other (first) half have the import date on them. The attachment info pane in Zotero shows the annotations.
  2. The replacement PDF is attached but isn't read by Zotero, so the attachment info pane in Zotero does not show the annotations: the right number of annotations, all of which are old (as dated in the file).
  3. The replacement PDF is deleted: no annotations at all.
  4. The replacement PDF is stripped of highlights and added: no annotations.

I should add that I've also deleted and re-added the file, so it's in a different folder.

My guess is that the plugin is somehow reading both the annotations from the PDF itself and the annotations that Zotero is recording, and so getting to twice the number. I'm not sure which set it should read, but probably only those from Zotero.


My template has this for the duplicated part:

{% if annotations.length > 0 %}  

## Annotations from Zotero  

{% for a in annotations %}
{%- if a.type == "highlight" -%}
> <mark style="background-color: {{a.color}}">Quote</mark>
> {{ a.annotatedText }}
> ([p. {{a.pageLabel}}](zotero://open-pdf/library/items/{{a.attachment.itemKey}}?page={{a.page}}&annotation={{a.id}}))
{%- if a.comment %}
> > Attached Note: 
> > <mark style="background-color: {{a.color}}">{{a.comment}}</mark>
{% endif -%}<br>
{% elif a.type == "text" %}
> <mark style="background-color: {{a.color}}">{{a.comment}}</mark>
> ([p. {{a.pageLabel}}](zotero://open-pdf/library/items/{{a.attachment.itemKey}}?page={{a.page}}&annotation={{a.id}}))
{% endif %}
{% endfor %}
{% endif %}

Jmuccigr commented 1 month ago

Aha! Here's a difference that's actionable: the "source" key in each annotation is either "pdf" (on the annotations with the old date) or "zotero" (on the annotations with the import date). In my (limited) experience Zotero does a better job at reading the text with OCRed files, so I'll grab just those when both are present. Here's the template logic:

{%- set l = annotations.length -%}
{%- if l > 0 %}  
{% set src = "zotero" %}
{%- if annotations[0].source == annotations[l - 1].source %}
{% set src = annotations[0].source %}
{% endif -%}

## Annotations from Zotero  

{% for a in annotations %}
{%- if a.source == src -%}
{%- if a.type == "highlight" or a.type == "underline" -%}
> <mark style="background-color: {{a.color}}">Quote</mark>
> {{ a.annotatedText }}
> ([p. {{a.pageLabel}}](zotero://open-pdf/library/items/{{a.attachment.itemKey}}?page={{a.page}}&annotation={{a.id}}))
{%- if a.comment %}
> > Attached Note: 
> > <mark style="background-color: {{a.color}}">{{a.comment}}</mark>
{% endif -%}<br>
{% elif a.type == "text" or a.type == "note" -%}
> <mark style="background-color: {{a.color}}">{{a.comment}}</mark>
> ([p. {{a.pageLabel}}](zotero://open-pdf/library/items/{{a.attachment.itemKey}}?page={{a.page}}&annotation={{a.id}}))
{% else %}
> <mark style="background-color: red">Uh-oh, an unknown note type was here!<br>{{a.comment}}</mark>
> ([p. {{a.pageLabel}}](zotero://open-pdf/library/items/{{a.attachment.itemKey}}?page={{a.page}}&annotation={{a.id}}))
{% endif %}
{% endif -%}
{% endfor -%}
{% endif -%}

First I set the src variable to "zotero" and then if it turns out that there aren't two sources, I use the one that's there, grabbing it from the first annotation. I haven't seen a case where there are more than two sources of annotations (i.e., pdf and zotero), so hopefully this won't go off the rails! If Zotero hasn't opened the attachment since changes were made to it, of course it won't know about those changes.
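
For anyone who wants to try the selection rule outside the template, here is a minimal Python sketch of the same idea. It is not the plugin's code: the plain dictionaries are stand-ins for the annotation objects, the function names are made up for illustration, and only the `source`, `type`, and `annotatedText` field names come from the templates above. Unlike the template, which compares only the first and last annotation, this version looks at every annotation's source.

```python
from collections import Counter


def pick_source(annotations, preferred="zotero"):
    """Pick the one source whose annotations should be kept."""
    sources = {a["source"] for a in annotations}
    if not sources:
        return None  # nothing was imported
    if preferred in sources:
        return preferred  # both sets present, or only the preferred one
    # Only another source is present; sorted() keeps the choice deterministic.
    return sorted(sources)[0]


def filter_by_source(annotations, preferred="zotero"):
    """Drop the duplicate set by keeping a single source."""
    src = pick_source(annotations, preferred)
    return [a for a in annotations if a["source"] == src]


# Hypothetical sample mimicking the doubled import described in this thread.
sample = [
    {"source": "zotero", "type": "highlight", "annotatedText": "alpha"},
    {"source": "zotero", "type": "highlight", "annotatedText": "beta"},
    {"source": "pdf", "type": "highlight", "annotatedText": "alpha"},
    {"source": "pdf", "type": "highlight", "annotatedText": "beta"},
]

print(dict(Counter(a["source"] for a in sample)))  # {'zotero': 2, 'pdf': 2}
print(len(filter_by_source(sample)))               # 2
```

Calling `filter_by_source(sample, preferred="pdf")` keeps the set read from the PDF instead, which corresponds to changing the default value of `src` in the template.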

Side note: in the pdf edited with Preview, notes are given a type of "note" whereas in the zotero annotations, they have a type of "text". Zotero-generated notes (which I generally don't have because I don't use Zotero to mark up my pdfs) have a type of "note". The template treats them identically, as it does highlighting and underlining (which I don't use). Currently any other type of annotation gets flagged as present.
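
The type handling in the template amounts to a small lookup table. A rough sketch in Python, assuming only the four type names mentioned in this thread; the group labels `quote` and `note` and the function name `classify` are made up for illustration and are not part of the plugin:

```python
# Map each annotation type seen in this thread to how the template renders it.
TYPE_GROUPS = {
    "highlight": "quote",  # rendered as quoted text plus optional comment
    "underline": "quote",
    "text": "note",        # rendered as the comment only
    "note": "note",
}


def classify(annotation):
    """Return the rendering group, or 'unknown' for any other type."""
    return TYPE_GROUPS.get(annotation.get("type"), "unknown")


print(classify({"type": "underline"}))  # quote
print(classify({"type": "text"}))       # note
print(classify({"type": "ink"}))        # unknown ("ink" is just an example of an unhandled type)
```

Anything that falls into `unknown` corresponds to the red "Uh-oh" branch in the template.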

boothemjr commented 1 month ago

YES YES YES! This has been driving me nuts for MONTHS. Using the data explorer, I found the exact same issue where the annotations are being listed twice, once with source: Zotero and once with source: PDF.

boothemjr commented 1 month ago

It's curious how there are subtle changes between the two entries, including the coordinates and the hexcode for the color. I wonder which one is more "correct".
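
One way to see exactly which fields differ between the two copies is to diff them key by key. A small Python sketch, assuming the two entries have been copied out as plain dictionaries; the values below are invented for illustration and are not taken from a real import, and `diff_fields` is a hypothetical helper name:

```python
def diff_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Invented example values: same highlight, slightly different color code.
zotero_copy = {"source": "zotero", "color": "#ffd400", "page": 3, "annotatedText": "alpha"}
pdf_copy = {"source": "pdf", "color": "#ffd500", "page": 3, "annotatedText": "alpha"}

print(diff_fields(zotero_copy, pdf_copy))
# {'color': ('#ffd400', '#ffd500'), 'source': ('zotero', 'pdf')}
```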

Jmuccigr commented 1 month ago

Further to this, I find that occasionally Zotero does not do a good job reading the highlighted text. Not sure why, but it happens. For that eventuality I can always pull the annotations from the pdf, but this, I think, requires a different template. It's a minor change to the template (just replacing zotero with pdf in one line), but still different.

I can always just go into the settings and change the template, but that's a hassle. What's the best way to simplify this process? Can I macro this? Should this be some kind of feature request? Could it be easier to pick a template in the settings?

Thanks.