tteck / Proxmox

Proxmox VE Helper-Scripts
https://Helper-Scripts.com
MIT License

Paperless-ngx update to 2.11.6 - NLTK Redownload #3643

Closed: Kh3nsu closed this issue 2 months ago

Kh3nsu commented 2 months ago

Please verify that you have read and understood the guidelines.

yes

A clear and concise description of the issue.

Hello tteck,

I just updated Paperless-ngx to 2.11.6. The release notes for this version say that all NLTK data must be re-downloaded; see the release: https://github.com/paperless-ngx/paperless-ngx/releases/tag/v2.11.6

Since the update I get an error when uploading any file (see the error below). I fixed it by re-downloading the NLTK data, as described in the release notes:

python3 -m nltk.downloader -d /usr/share/nltk_data all

I'm not sure whether you would like to include this in your update script. Feel free to close this issue if you think it's not relevant.

What settings are you currently utilizing?

Advanced Settings

Which Linux distribution are you employing?

Debian 12

If relevant, including screenshots or a code block can be helpful in clarifying the issue.


Scan2024-09-02_164630.pdf: The following error occurred while storing document Scan2024-09-02_164630.pdf after parsing: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/german/

  Searched in:
    - PosixPath('/usr/share/nltk_data')
**********************************************************************

Please provide detailed steps to reproduce the issue.

Update Paperless-ngx via the script from 2.11.4 to 2.11.6.

tteck commented 2 months ago

Thank you for bringing this to our attention.

You need to re-download the NLTK data to complete the update to v2.11.6. After this, you won't need to do it for future upgrades. I'm not sure whether it would be better to add this step to the script or just point users to the v2.11.6 release notes.

Alternatively, I could leave this issue open for everyone to read for a few days. I think that's what I'll do.
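
For anyone landing here, the workaround from this thread can be sketched as a small check-then-download helper. This is only an illustration, not part of the update script: the function names are hypothetical, and it assumes the downloader unpacks the resource to `tokenizers/punkt_tab` inside the data directory shown in the error message (`/usr/share/nltk_data`). The download command itself is the one quoted in the issue.

```python
from pathlib import Path
import subprocess
import sys

# Assumption: the data directory listed under "Searched in" in the error above.
NLTK_DATA_DIR = Path("/usr/share/nltk_data")


def punkt_tab_missing(data_dir: Path = NLTK_DATA_DIR) -> bool:
    """Return True if <data_dir>/tokenizers/punkt_tab does not exist."""
    return not (data_dir / "tokenizers" / "punkt_tab").is_dir()


def download_command(data_dir: Path = NLTK_DATA_DIR) -> list[str]:
    """Build the downloader call from the issue, using the current interpreter."""
    return [sys.executable, "-m", "nltk.downloader", "-d", str(data_dir), "all"]


def ensure_nltk_data(data_dir: Path = NLTK_DATA_DIR) -> bool:
    """Re-download the NLTK data only if punkt_tab is missing.

    Returns True if a download was started, False if nothing was needed.
    """
    if not punkt_tab_missing(data_dir):
        return False
    # check=True raises CalledProcessError if the downloader exits non-zero.
    subprocess.run(download_command(data_dir), check=True)
    return True
```

Calling `ensure_nltk_data()` inside the Paperless-ngx container, with the same Python environment that has `nltk` installed and with write access to the data directory, should be equivalent to running the command from the issue once. It does nothing on later runs once `punkt_tab` is present.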