urschrei / pyzotero

Pyzotero: a Python client for the Zotero API
https://pyzotero.readthedocs.org
Other
908 stars 99 forks source link

Improve documentation for last_modified_version() #120

Closed michi-zuri closed 4 years ago

michi-zuri commented 4 years ago

Currently, the documentation for Zotero.last_modified_version() states:

    Returns the last modified version of the library

While this description of the function seems to be sometimes true (if called after a request like Zotero.items() or Zotero.count_items()), it does not seem true if called after requests to group metadata Zotero.groups(), at least in my testing. The latter returned the last modified version of the group's metadata and not of the library data...

After digging into the code I am confused about my empirical results, since the two-step method seems to first make a call to self.items() and only then returns the headers of the last response before calling this method:

    def last_modified_version(self, **kwargs):
        """ Get the last modified version
        """
        self.items(**kwargs)
        return int(self.request.headers.get("last-modified-version", 0))

Therefore I now think this is a bug, instead of faulty documentation. The documentation describes what the method should be doing if you look at the code, but could it be that the function returns a value before the request has received a response from the Zotero server for the self.items(**kwargs) call? Maybe this behaviour has changed since the introduction of this method? I can't tell.

For what it matters, I tested this with Python v3.8.5 and pip freeze tells me that I am using Pyzotero==1.4.16 and requests==2.24.0.

Now that I tried a different approach and just inspect the Zotero.request.headers directly without calling a separate method, I could shave off 4.5 seconds from the sync duration, hooray:

remote_count = Zotero.num_items()
last_modified_version = Zotero.last_modified_version()
# took 5.370414 seconds
remote_count = Zotero.num_items()
last_modified_version = Zotero.last_modified_version(limit=1, format='keys')
# took 1.204483 seconds
remote_count = Zotero.num_items()
last_modified_version = int(Zotero.request.headers.get('last-modified-version', 0))
# took 0.697992 seconds

If this is a bug that will eventually be fixed, there is a need for an alternative way to access response headers of the last made request. Without direct access to the Last-Modified-Version header it is not possible to implement Partial Sync efficiently. However, as it seems possible to directly access the latest response headers in Zotero.request.headers, I am hesitant wether a method would be the right way to expose them, like so:

    def response_headers(self):
        """ Get the response headers of the last made request
        """
        return self.request.headers

And I'm not sure how to fix the behaviour of last_modified_version(), assuming it is indeed a bug... Hope this helps.

michi-zuri commented 4 years ago

Regarding partial sync, some facts to consider:

For further reading on this topic I suggest: https://www.zotero.org/support/dev/web_api/v3/syncing#partial-library_syncing

urschrei commented 4 years ago

There are two separate issues here, as you say:

  1. I hadn't considered the possibility that when calling last_modified_version, there could already be a potentially stored version header due to a previous API call. I also didn't think about making a cheaper API call than items() in order to retrieve headers if no existing headers were available. This will be easy to fix, I think.

  2. I need to figure out whether Library l.m.v is what is always wanted in this case.

urschrei commented 4 years ago

OK, so thinking about this a little more, and after reading the documentation:

You can't rely on inspecting self.request.headers in order to get the library version, because only methods (i.e. items(), top()) that return multiple objects return the latest library version (see version numbers). So it's necessary to retain an API call in the latest_library_version() method. However, I've now changed the call to have a limit of 1, so the method should be much faster (in the 100s of milliseconds, depending on the network).

I can't recreate a situation in which multiple calls to latest_library_version() return different values when using the user library type, without making a write API call first. If using a groups library type, multiple calls can of course return different values, because of concurrent access by multiple users.

michi-zuri commented 4 years ago

I can't recreate a situation in which multiple calls to latest_library_version() return different values when using the user library type, without making a write API call first.

Hmm, I can't recreate that situation either now, I must have mixed up something in my mind.

only methods (i.e. items(), top()) that return multiple objects return the latest library version

Yeah, that seems to be correct!

I only just now realised, that the Zotero API only memorizes one newest library version which is called Last-Modified-Version. What I wrote before about partial sync not being implementable without knowing the oldest library version that is identical with the newest for a particular subset: I was wrong. I think this issue can be closed.