urschrei / pyzotero

Pyzotero: a Python client for the Zotero API
https://pyzotero.readthedocs.org

pyzotero.zotero_errors.HTTPError: Code: 500 #134

Open fabbra opened 3 years ago

fabbra commented 3 years ago

I have a script running every night which automatically creates various bibliographies from my Zotero library. To do so I get the HTML citation for every library item individually (more than 200 items) using the example code below (where i corresponds to a zotero item and the second line is executed multiple times, i.e. for each of the >200 items individually):

zot = zotero.Zotero(library_id, library_type, api_key)
html = zot.item(i['key'], content='bib', style='ieee')

This often works fine. However, sometimes it fails (without me or anyone else having changed anything in the library or the code) with an HTTP Error 500 and the following traceback:

  File "/builds/div-e/tools/zotero_service/src/zotero_service/bibliography_converter.py", line 134, in _add_item_to_biblio
    html = self._zot.item(i['key'], content='bib', style='ieee')[0]
  File "/usr/local/lib/python3.7/site-packages/pyzotero/zotero.py", line 204, in wrapped_f
    retrieved = self._retrieve_data(func(self, *args))
  File "/usr/local/lib/python3.7/site-packages/pyzotero/zotero.py", line 439, in _retrieve_data
    error_handler(self, self.request)
  File "/usr/local/lib/python3.7/site-packages/pyzotero/zotero.py", line 1653, in error_handler
    raise ze.HTTPError(err_msg(req))
pyzotero.zotero_errors.HTTPError: 
Code: 500
URL: https://api.zotero.org/groups/2334655/items/NFRNSS9H?content=bib&style=ieee&format=atom&limit=100
Method: GET
Response: An error occurred

If I then restart the script (without changing anything in the code or in the library) it often works.

Therefore I suspect that this is an issue either with timings or the Zotero server blocking too many requests. Could this be possible? Is there an option of setting a number of retries or something to avoid this problem?

Besides, it would probably make sense to query the citation of all library items at once instead of querying item by item, right?
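
A minimal sketch of the retry idea asked about above. Nothing in this thread confirms that Pyzotero has a built-in retry option, so this wraps the call from the outside. The helper name call_with_retries and its parameters (retries, base_delay, retry_on, sleep) are hypothetical and chosen only for illustration.

```python
import time


def call_with_retries(func, *args, retries=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper, not part of Pyzotero. `retry_on` is the tuple of
    exception types that should trigger another attempt.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

With the call from the original post this might be used as call_with_retries(zot.item, i['key'], content='bib', style='ieee', retry_on=(zotero_errors.HTTPError,)), after from pyzotero import zotero_errors. Retrying only helps if the 500 is transient; it does not reduce the number of requests.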

urschrei commented 3 years ago

Unfortunately the 500 error isn't particularly informative, and you'll receive a specific error if you're being rate-limited, but there's no reason for you to call zot.item individually. If you want to retrieve every item in your library, you can use zot.everything(zot.item(content='bib', style='ieee')) which should be a lot faster and less error prone, and far less likely to get you rate-limited.

fabbra commented 3 years ago

I guess you meant zot.everything(zot.items(content='bib', style='ieee')) (with itemS); at least that is the only thing that works for me. However, in my scenario I do not need all library items, only a subset of them. Is there a way to make this request for a specific set of library items (e.g., using their keys as an index in the request)? Or is there at least a way of associating the output of the aforementioned command (which returns the citations in HTML format) with the keys of the individual library items? One could use the itemKey parameter to pass a list of keys for the items of interest, but this is limited to 50 items, right?

urschrei commented 3 years ago

As you say, the itemKey parameter is the only way to do this with arbitrary keys. You can either put them into a group, or manually split your list into sub-lists of 50 (with the last one potentially being shorter) and pass each one to your items call. That would still be more efficient than what you're doing now.
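
The manual splitting described above could be sketched as follows. The helper names chunked and fetch_bibs are hypothetical; the items call uses only the itemKey, content and style parameters that appear elsewhere in this thread, and the batch size of 50 is the limit mentioned above.

```python
def chunked(keys, size=50):
    """Yield successive sub-lists of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def fetch_bibs(zot, keys, style="ieee", size=50):
    """Collect bibliography entries for `keys`, one request per batch.

    `zot` is assumed to be a pyzotero Zotero instance (or any object
    with a compatible items() method).
    """
    results = []
    for batch in chunked(keys, size):
        # itemKey takes a comma-separated string of item keys.
        results.extend(
            zot.items(itemKey=",".join(batch), content="bib", style=style)
        )
    return results
```

For 320 keys this issues 7 requests instead of 320.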

fabbra commented 3 years ago

Actually this seems to work fine even for more than 50 items (tested with 320 items): zot.everything(zot.items(itemKey=','.join(keys), content='bib', style='ieee')) Where keys is a list with the keys of all items of interest.

fabbra commented 3 years ago

Thanks!

fabbra commented 3 years ago

Here I am stuck with another problem.

Even though the command html = zot.everything(zot.items(itemKey=','.join(keys), content='bib', style='ieee')) works for >50 items, the order in which the items are returned is not the order in which they were requested. In other words, the items in html are ordered differently than in keys, so it is impossible to link the generated HTML to specific keys if I only want to extract a subset of items.

If we could sort the result by itemKey the problem would be resolved but I couldn't find this in the API documentation.

Any other idea how this could be fixed?

Sorting them by dateAdded might result in issues when two items were added exactly at the same moment...
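
One possible way around the ordering problem, sketched under assumptions: if each returned item carries both its key and its formatted citation, the results can be re-keyed locally instead of relying on server-side sorting. The Zotero Web API can return such objects in JSON format when asked to include the bib format; whether Pyzotero forwards an include parameter unchanged is not confirmed in this thread and is an assumption here. The function below is hypothetical and works on plain dicts with 'key' and 'bib' entries.

```python
def order_by_keys(items, keys):
    """Return (key, bib) pairs in the order given by `keys`.

    `items` is assumed to be a list of dicts that each contain a 'key'
    and a 'bib' entry. Keys with no matching item map to None.
    """
    by_key = {item["key"]: item["bib"] for item in items}
    return [(key, by_key.get(key)) for key in keys]
```

Because the lookup is by key, the order of the server response no longer matters, and ties in dateAdded are not an issue.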

beastraban commented 1 year ago

Hi, I am getting the same 500 error. My use case is the following: I am making some items with: item=zot.item_template('videoRecording')

I then fill in the fields with info, and finally check the item and try to upload it: zot.check_items([item]) zot.create_items([item],parentID=ID)

However, I get the following error:

pyzotero.zotero_errors.HTTPError: 
Code: 500
URL: https://api.zotero.org/users/2999351/items/4VMRHBV2
Method: PATCH
Response: An error occurred

I believe the key at the end is created automatically. Anyway - any insight would be much appreciated.

urschrei commented 1 year ago

It's hard to know what's going on here. A 500 error is (of course) a server-side application error, so unless Pyzotero is sending malformed input (which seems unlikely, but not impossible) there isn't much I can do. Are you creating large numbers of items in sequence? Can you reproduce the error by trying to recreate the item in question?

beastraban commented 1 year ago

Yes, so first I tried it with ~200 items and got the ~50 limit prompt. I tried with 25 items and the limit prompt went away, but I got the above 500 error. Now I am trying it with a single item (i.e. a list with a single Zotero item) and again I get the 500 error...

import pandas as pd
from pyzotero import zotero

zot = zotero.Zotero(USER_ID, 'user', key)
items = zot.top(limit=5)

collectionShlishiKey='MNV4UAR3'
collectionMotzashKey='32INT4JY'

df = pd.read_csv(filename ) # can also index sheet by name or fetch all sheets
df.insert(1, "DATE",0)
for index, row in df.iterrows():
    DATE=extractDate(row['link'])
    df['DATE'][index]=DATE

TITLES=[]
for index,row in df.iterrows():
    TYPE=zot.item_template('videoRecording')
    TYPE['title']=row['name']
    TYPE['url']=Reisha+row['link']+'v'
    TYPE['extra']= row['rawDate']
    TYPE['runningTime']= row['dur']
    TYPE['language']= 'Hebrew'
    TYPE['place']= 'ירושלים'

    try:
        year=5760+int(row['DATE'][0])
        month=int(row['DATE'][1:3])
        day=int(row['DATE'][3:])

        hebdate=dates.HebrewDate(year,month,day)
        TYPE['date']=str(hebdate.to_greg())

    except Exception as e:
        print('error')
        TYPE['date']='error'
    TITLES.append(TYPE)

while TITLES:
    tit=[TITLES.pop()]
    zot.check_items(tit)
    zot.create_items(tit,parentid=collectionMotzashKey)

Where I replaced the user number with the string USER_ID and the key with the string 'key' in the Zotero object instantiation for privacy reasons.

Output:

HTTPError: 
Code: 500
URL: https://api.zotero.org/users/USER_ID/items/XNCRQEAA
Method: PATCH
Response: An error occurred

Maybe I am doing something wrong?

urschrei commented 1 year ago

Can you paste the output of TITLES.pop() that triggers the error?

beastraban commented 1 year ago

Sure:

{'itemType': 'videoRecording',
  'title': 'הלכות הימים הנוראים ותפילותיהן',
  'creators': [{'creatorType': 'director', 'firstName': '', 'lastName': ''}],
  'abstractNote': '',
  'videoRecordingFormat': '',
  'seriesTitle': '',
  'volume': '',
  'numberOfVolumes': '',
  'place': 'ירושלים',
  'studio': '',
  'date': '2004-03-12',
  'runningTime': 'כ-75 דקות',
  'language': 'Hebrew',
  'ISBN': '',
  'shortTitle': '',
  'url': 'http://maran1.com/winmedia/41219YsuccaW.wmv',
  'accessDate': '',
  'archive': '',
  'archiveLocation': '',
  'libraryCatalog': '',
  'callNumber': '',
  'rights': '',
  'extra': 'ליל ה אלול תשס"ד ',
  'tags': [],
  'collections': [],
  'relations': {}}

When I run the check_items() method with [TITLES.pop()] there is no error, so according to the docs it should be ok right?

urschrei commented 1 year ago

Check only ensures that there are no missing or extra fields, it doesn't check content. I'll see whether I can reproduce the error here.

urschrei commented 1 year ago

The error occurs because you're using the create_items call incorrectly: the parentid parameter is intended for attaching a new item to an existing parent item by its id, but you're passing it a collection id. Instead, add the collection id directly to the item when you create it.

Add a line to your for index,row in df.iterrows(): loop as follows:

TYPE['collections'] = [collectionMotzashKey]

(collections is a list of collection keys, as the empty list in the item template shows), then modify zot.create_items(tit,parentid=collectionMotzashKey) to omit the parentid parameter.
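
The fix above can be sketched without any network calls. The helper prepare_item is hypothetical; it only shows the shape of the item dict before upload, with the collection key placed in the item's collections list instead of being passed as parentid.

```python
def prepare_item(template, title, collection_key):
    """Return a copy of `template` filed under one collection.

    `template` is assumed to be a dict like the one returned by
    zot.item_template('videoRecording').
    """
    item = dict(template)  # shallow copy so the template stays reusable
    item["title"] = title
    # 'collections' holds a list of collection keys, not a bare string.
    item["collections"] = [collection_key]
    return item
```

The prepared item would then be uploaded with zot.create_items([item]), with no parentid argument.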

beastraban commented 1 year ago

Thanks!

It works. I still need to work out the kinks with regard to creating items in the correct directory/subdirectory, but at least it's uploaded correctly :)