mwegnr commented 2 months ago

Platform: Arch Linux (also happens in Dockerized Debian)
Python version: 3.12.5
Pyzotero version: 1.5.20

Problem Description

What were you trying to do? For our project, we refresh a JSON file containing our Zotero library every night for different locales (en-US, de-DE). The resulting files are identical. However, when the items are fetched directly from the API using requests, the files differ.
What API call did it involve? Top items via `top()` (see the code example below).
What error was raised? No error is raised directly.

More Details

Our group library is public, so the error should be reproducible with the following code.

Minimal code example:

```python
import json

import requests
from pyzotero import zotero


def fetch_zotero_entries(locale: str) -> list:
    # initialize Text+ library object
    tplus_zotero_library = zotero.Zotero(library_id='4533881', library_type='group', locale=locale)
    tplus_zotero_library.add_parameters(format='json', include='bibtex,bib,csljson,data', linkwrap='1')
    tplus_entries = tplus_zotero_library.top()
    return tplus_entries


def fetch_zotero_entries_requests(offset: int = 0, locale: str = "de-DE"):
    URL = "https://api.zotero.org/groups/4533881/items/top"
    url_with_params = URL + f"?start={offset}&limit=100&format=json&include=bibtex,bib,csljson,data&linkwrap=1&locale={locale}"
    zotero_response = requests.get(url_with_params)
    items = zotero_response.json()
    return items


def write_zotero_json(data, suffix: str, locale: str):
    path = f"zotero-unprocessed-min.{suffix}.{locale}.json"
    with open(path, 'w') as output_file:
        json.dump(data, output_file, indent=2)


def refresh_json(use_lib: bool, locale: str):
    if use_lib:
        zotero_items = fetch_zotero_entries(locale=locale)
    else:
        zotero_items = fetch_zotero_entries_requests(locale=locale)
    if zotero_items is not None:
        suffix = "pyzotero" if use_lib else "requests"
        print(f"Successfully fetched entries for {locale} with {suffix}")
        write_zotero_json(zotero_items, suffix=suffix, locale=locale)


refresh_json(use_lib=True, locale="de-DE")
refresh_json(use_lib=True, locale="en-US")
refresh_json(use_lib=False, locale="de-DE")
refresh_json(use_lib=False, locale="en-US")
```
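As an aside, the query string in `fetch_zotero_entries_requests` can be assembled with the standard library's `urllib.parse.urlencode` instead of string concatenation, which percent-encodes values such as the commas in `include` and keeps each parameter explicit. This is a minimal sketch; `build_top_items_url` is a hypothetical helper name, not part of pyzotero or the report's code:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.zotero.org/groups/4533881/items/top"


def build_top_items_url(offset: int = 0, locale: str = "de-DE") -> str:
    # Hypothetical helper: same parameters as the f-string in the report
    params = {
        "start": offset,
        "limit": 100,
        "format": "json",
        "include": "bibtex,bib,csljson,data",
        "linkwrap": 1,
        "locale": locale,
    }
    # urlencode converts values to str and percent-encodes them (commas become %2C)
    return f"{BASE_URL}?{urlencode(params)}"


print(build_top_items_url(locale="en-US"))
```

With requests, passing the same dict as `requests.get(BASE_URL, params=params)` has an equivalent effect.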

Hashes of obtained files:

sha256sum zotero-unprocessed-min*

6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.pyzotero.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.pyzotero.en-US.json
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.requests.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.requests.en-US.json
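The hash comparison above can also be scripted. A minimal standard-library sketch that groups the downloaded files by SHA-256, so that files which should differ (for example a de-DE and an en-US export) but are byte-identical stand out; the glob pattern matches the file names used in this report, and `group_by_sha256` is a hypothetical name:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_sha256(pattern: str = "zotero-unprocessed-min*", directory: str = ".") -> dict[str, list[str]]:
    # Map each SHA-256 hex digest to the sorted names of the files that have it
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    for digest, names in group_by_sha256().items():
        # More than one name per digest means those files are byte-identical
        print(digest[:12], ", ".join(names))
```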

urschrei commented 2 months ago

Could you try again using v1.5.24?

mwegnr commented 2 months ago

Works as expected with v1.5.24

sha256sum zotero-unprocessed-min*
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.pyzotero.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.pyzotero.en-US.json
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.requests.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.requests.en-US.json

Thank you for the really quick fix!

mwegnr commented 2 months ago

I encountered an error with the newly added locale support when combining `top()` with `everything()`. I was using v1.5.25.

It seems that when `everything()` makes its second `top()` request, the locale is added again. After the first 100 items have been fetched, this produces the following invalid request URL:
URL: https://api.zotero.org/groups/4533881/items/top?include=bib%2Cbibtex%2Ccsljson%2Cdata&limit=100&linkwrap=1&locale=de-DE&start=100&locale=de-DE
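The duplicated parameter can be confirmed by parsing the query string. A small standard-library sketch (`duplicated_query_keys` is a hypothetical helper name) that lists every query key appearing more than once, applied to the URL from the error above:

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def duplicated_query_keys(url: str) -> list[str]:
    # parse_qsl keeps repeated keys as separate (key, value) pairs
    counts = Counter(key for key, _ in parse_qsl(urlsplit(url).query))
    return [key for key, n in counts.items() if n > 1]


url = (
    "https://api.zotero.org/groups/4533881/items/top"
    "?include=bib%2Cbibtex%2Ccsljson%2Cdata&limit=100&linkwrap=1"
    "&locale=de-DE&start=100&locale=de-DE"
)
print(duplicated_query_keys(url))  # ['locale']
```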

Code reproducing this error:

```python
from pyzotero import zotero, zotero_errors


def fetch_zotero_entries(locale: str) -> list:
    # initialize Text+ library object
    tplus_zotero_library = zotero.Zotero(library_id='4533881', library_type='group', locale=locale)
    try:
        tplus_zotero_library.add_parameters(format='json', include='bibtex,bib,csljson,data', linkwrap='1')
        tplus_entries = tplus_zotero_library.everything(tplus_zotero_library.top())
        return tplus_entries
    except zotero_errors.HTTPError as error:
        print(error)


fetch_zotero_entries(locale="de-DE")
```

urschrei commented 2 months ago

Can you install master and try now? The new solution is more robust about not adding the locale when it is already present, but I want to make sure it works before I push a new release.
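The general idea of adding a parameter only when it is not already there can be sketched as follows. This is an illustration under the assumption that follow-up page URLs may already carry the locale; `with_param` is a hypothetical helper, not pyzotero's actual implementation:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url: str, key: str, value: str) -> str:
    # Return url with key=value appended, unless key is already present
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(k == key for k, _ in query):
        return url  # already set: leave the URL untouched
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because the call is idempotent, it can be applied to every page URL produced during pagination without duplicating the parameter.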

mwegnr commented 2 months ago

I no longer get an error when using `top()` inside `everything()`, but with the code from the original report the locale seems to be ignored again (the en-US and de-DE JSON files have the same hash).

I also noticed that the JSON fetched via pyzotero is missing the `bib`, `bibtex`, and `csljson` fields, although they were requested via the `include` parameter. That is why its hash differs from that of the JSON file generated with requests.
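Missing fields like these can be detected by checking each returned item for the keys named in `include`. A minimal sketch, assuming the Zotero API returns every format requested via `include` as a top-level key of each item object; `missing_include_fields` is a hypothetical name:

```python
def missing_include_fields(items: list[dict], include: str = "bibtex,bib,csljson,data") -> dict[str, int]:
    # For each requested field, count how many items lack that key
    fields = [f.strip() for f in include.split(",") if f.strip()]
    missing = {f: sum(1 for item in items if f not in item) for f in fields}
    # Keep only the fields that are missing from at least one item
    return {f: n for f, n in missing.items() if n}


sample = [{"key": "A", "data": {}, "bib": "formatted citation"}, {"key": "B", "data": {}}]
print(missing_include_fields(sample))  # {'bibtex': 2, 'bib': 1, 'csljson': 2}
```

Applied to the parsed contents of a pyzotero output file (for example via `json.load`), an empty result means every requested format is present in every item.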

sha256sum zotero-unprocessed*

cf775c6bf78158d780d91665e9cc55a1089795f61ba1ba0dfc67f566380b15db  zotero-unprocessed-min.pyzotero.de-DE.json
cf775c6bf78158d780d91665e9cc55a1089795f61ba1ba0dfc67f566380b15db  zotero-unprocessed-min.pyzotero.en-US.json
f0ff0dc6df7906bb3bfcd32606fd6b2b0f7ebce18d1f3626109e146676ac286f  zotero-unprocessed-min.requests.de-DE.json
6df7f88aa966ad47c1cb43e33d87d60f0bdf5ac0f6ead382141cc558513a84af  zotero-unprocessed-min.requests.en-US.json

pip list
Package            Version
------------------ --------------------
bibtexparser       1.4.1
certifi            2024.8.30
charset-normalizer 3.3.2
feedparser         6.0.11
idna               3.10
pip                24.2
pyparsing          3.1.4
pytz               2024.2
pyzotero           1.5.26.dev4+g12896b5
requests           2.32.3
sgmllib3k          1.0.0
urllib3            2.2.3

Here is also my complete script for generating the full JSON files. Feel free to use it for testing.

Complete Code

```python
# This script collects entries from Zotero using pyzotero and stores them in a local JSON file without any further processing
# This is needed since querying the API directly slows down every Hugo build
# Should run every night, scheduled by GitLab CI, to keep the JSON up to date
import json
import sys
import time

from pyzotero import zotero, zotero_errors


def fetch_zotero_entries(locale: str) -> list:
    # init some variables
    retry_request_max = 3
    # initialize Text+ library object
    tplus_zotero_library = zotero.Zotero(library_id='4533881', library_type='group', locale=locale)
    for _ in range(retry_request_max):
        # this loop is needed because the Zotero library is big and API timeouts occur often
        try:
            # add required formats to request
            tplus_zotero_library.add_parameters(format='json', include='bibtex,bib,csljson,data', linkwrap='1')
            # request top-level items and wrap them in zotero.everything,
            # since a single top() request only returns up to 100 items
            tplus_entries = tplus_zotero_library.everything(tplus_zotero_library.top())
            return tplus_entries
        except zotero_errors.HTTPError:
            # wait for 180 seconds, since the Zotero API sometimes needs time to generate the answer
            print("Zotero API timeout. Trying again in 180 seconds")
            time.sleep(180)
    raise TimeoutError(f"Zotero API did not respond in time after {retry_request_max} retries")


def write_zotero_json(data, locale: str):
    path = f"zotero-unprocessed.{locale}.json"
    with open(path, 'w') as output_file:
        json.dump(data, output_file, indent=2)


def refresh_json(locale: str):
    zotero_items = fetch_zotero_entries(locale=locale)
    if zotero_items is not None:
        print(f"Successfully fetched entries for {locale}")
        write_zotero_json(zotero_items, locale)
    else:
        sys.exit(1)


refresh_json("de-DE")
refresh_json("en-US")
```
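The retry loop in this script could also be factored into a small reusable helper. A hedged sketch: `retry_call` is a hypothetical name, the defaults of three attempts and a 180-second delay mirror the script's values, and `sleep` is injectable so the logic can be tested without waiting. Unlike the script, it does not sleep after the final failed attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], T],
    exceptions: tuple[type[BaseException], ...],
    attempts: int = 3,
    delay: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # Call func up to `attempts` times, waiting `delay` seconds between failures
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as error:
            print(f"Attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                sleep(delay)
    raise TimeoutError(f"Giving up after {attempts} attempts")
```

In the script above, `func` would be a function that calls `add_parameters(...)` and then returns `everything(top())`, with `exceptions=(zotero_errors.HTTPError,)`, since the parameters are re-added on each attempt there.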

urschrei / pyzotero

Locale seems to get ignored #183