urschrei / pyzotero

Pyzotero: a Python client for the Zotero API
https://pyzotero.readthedocs.org

.txt files in attachment are dumped in an unreadable format #166

Closed beastraban closed 1 year ago

beastraban commented 1 year ago

I have an attachment item that is a .txt file (in Hebrew). Viewing it in Zotero works fine.

However, the problem appears when I either read the file using:

    with open(zot.file(key)) as f:
        text = f.read()

or dump it into a local directory with:

    zot.dump(key)

In the first case I get the following error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xab in position 10: invalid start byte

When I open the dumped .txt file in notebook, the text is obviously garbled (it LOOKS like UTF-8 text), and when I open it in Mozilla Firefox and try to switch the encoding, Firefox can't do so.

pyzotero version 1.5.16
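
For anyone hitting a similar UnicodeDecodeError, one way to narrow things down is to work on the raw bytes and try a few candidate encodings explicitly. The sketch below is only an illustration: `decode_attachment` is a hypothetical helper, not part of pyzotero, and the candidate list (UTF-8, then two Hebrew code pages) is a guess, since the thread does not say how the original file was encoded. It also cannot recover bytes that were already corrupted before they reached disk.

```python
def decode_attachment(raw, encodings=("utf-8", "cp1255", "iso8859_8")):
    """Try each candidate encoding in order; return (text, encoding_used).

    Hypothetical helper for diagnosis only. The default candidates are
    assumptions for a Hebrew text file, not something stated in the thread.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this encoding does not fit; try the next one
    raise ValueError("none of the candidate encodings could decode the data")


if __name__ == "__main__":
    # Bytes that are valid Windows-1255 but not valid UTF-8.
    sample = "שלום".encode("cp1255")
    text, used = decode_attachment(sample)
    print(used, text)
```

To apply this to a file on disk, read it in binary mode (`open(path, "rb")`) and pass the resulting bytes to the helper. If every candidate fails, the bytes themselves are probably damaged rather than merely mislabelled.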

urschrei commented 1 year ago

Can you put a copy of the file somewhere so I can try to reproduce this?

beastraban commented 1 year ago

https://www.dropbox.com/scl/fi/2xb7wlf0lekekt8i784ug/1.txt?rlkey=0o5stj7n218g9m6ow32k0ymob&dl=0 That's the original file

https://www.dropbox.com/scl/fi/oob4lkgivs5vsi228kq2j/.txt?rlkey=257njnsf3f948wozy1dzk17xv&dl=0 That's the resulting file after zot.dump()

https://www.dropbox.com/scl/fi/2xb7wlf0lekekt8i784ug/1.txt?rlkey=0o5stj7n218g9m6ow32k0ymob&dl=0 That's the zotero item itself (exported)

Thanks :)

urschrei commented 1 year ago

Fixed in v1.5.17, on PyPI.

beastraban commented 1 year ago

Thanks. However, for some reason the "dump" and "file" calls now can't find files. They yield the following output:

>>zot.dump('SM7TPZLG')

Traceback (most recent call last):

  File ~\anaconda3\lib\site-packages\pyzotero\zotero.py:433 in _retrieve_data
    self.request.raise_for_status()

  File ~\anaconda3\lib\site-packages\requests\models.py:1021 in raise_for_status
    raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: https://api.zotero.org/%5Cusers%5C2999351%5Citems%5CSM7TPZLG

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In[12], line 1
    zot.dump('SM7TPZLG')

  File ~\anaconda3\lib\site-packages\pyzotero\zotero.py:741 in dump
    file = self.file(itemkey)

  File ~\anaconda3\lib\site-packages\pyzotero\zotero.py:237 in wrapped_f
    item = self._retrieve_data(fixed_path)

  File ~\anaconda3\lib\site-packages\pyzotero\zotero.py:435 in _retrieve_data
    error_handler(self, self.request, exc)

  File ~\anaconda3\lib\site-packages\pyzotero\zotero.py:1667 in error_handler
    raise error_codes.get(req.status_code)(err_msg(req)) from exc

ResourceNotFound: 
Code: 404
URL: https://api.zotero.org/%5Cusers%5C2999351%5Citems%5CSM7TPZLG
Method: GET
Response: <h1>Not Found</h1>
<p>The page you requested could not be found.</p>

Whereas in the previous version, 1.5.16, it does find them. The 'item' call works fine.

Thanks
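
A note on the URL in the traceback above: `%5C` is the percent-encoded form of a backslash, so the request path appears to have been `\users\2999351\items\SM7TPZLG` rather than `/users/2999351/items/SM7TPZLG`. The sketch below only illustrates how that kind of URL can arise when path segments are joined with the Windows separator; it uses the standard library alone and does not claim to show what pyzotero does internally.

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules, always joins with "/"
from urllib.parse import quote

segments = ("2999351", "items", "SM7TPZLG")

# Joining with Windows rules inserts backslashes between the segments.
windows_style = ntpath.join("\\users", *segments)
# Joining with POSIX rules inserts forward slashes, which is what a URL needs.
url_style = posixpath.join("/users", *segments)

# quote() leaves "/" alone by default but percent-encodes "\" as %5C.
print(quote(windows_style))  # %5Cusers%5C2999351%5Citems%5CSM7TPZLG
print(quote(url_style))      # /users/2999351/items/SM7TPZLG
```

The first printed value matches the path in the 404 response, which is consistent with a platform-dependent separator ending up in the request URL on Windows.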