File not found after try to load metadata from amazon, where amazon server responds error 503

Valkryst commented 1 year ago

Is your feature request related to a problem? Please describe.

I have a handful of excessively large PDF files (500MB to 5.5GB) which I would like to store and access with calibre-web.

While attempting to upload these files, I ran into a number of issues which I assumed were caused by Nginx. After combing through the logs, changing reverse-proxy settings, and troubleshooting as much as I could, I realized the issue was with calibre-web. There is a hardcoded _max_buffersize which is around 200MB and which prevents larger files from being uploaded.

Though this is entirely reasonable for 99% of the files that I've uploaded thus far, it's preventing me from uploading these files.

Describe the solution you'd like

I would like a new option to be added to the admin UI, allowing the user to select a higher max_buffer_size, and for any relevant code to be updated to work with larger files.

Describe alternatives you've considered

I have manually edited the hardcoded _max_buffersize to allow for files up to 6GB, but it seems as though there are still issues with files around 800MB and up.

The upload will appear to succeed, but any changes to the book's metadata will fail to save. It appears as though the file can't be found in the expected location, and calibre-web silently fails.

I could also upload the files manually via FTP, if there were instructions on how to add them to calibre-web.

OzzieIsaacs commented 1 year ago

Please install gevent from the optional requirements file and the limit is gone

Valkryst commented 1 year ago

Thanks for responding so quickly!

I gave it a shot, but I may have done something incorrectly as the issue persists. Here are the steps that I followed:

cd into the folder that I usually run cps from.
pip install gevent
Run cps and attempt to upload a large (~800MB) file.

If it helps, here is some additional information:

I'm running calibre-web on Ubuntu 22.04.2 LTS, under its own user account.
It was installed with pip install calibreweb calibreweb[metadata] calibreweb[comics].
I'm running the latest nightly version.

OzzieIsaacs commented 1 year ago

You need to install gevent into the same virtual environment that calibre-web is installed in. This could be one problem. And of course you need to restart calibre-web afterwards. You can check whether gevent is used in the About section: gevent should be listed there with a version number if it is recognized.
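One way to check the "same virtual environment" condition from the command line is to ask the interpreter that runs calibre-web which modules it can see. This is a minimal sketch, not part of calibre-web: the helper name describe_module is made up for illustration, and cps is assumed to be the importable package name because that is the prefix shown in the debug log.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (found, location) for a top-level module as seen by this interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    return True, spec.origin


if __name__ == "__main__":
    # Run this with the same python executable that starts calibre-web.
    print("interpreter:", sys.executable)
    print("environment:", sys.prefix)
    for name in ("gevent", "cps"):  # "cps" is an assumption taken from the log prefix
        found, location = describe_module(name)
        print(name, "->", location if found else "NOT FOUND")
```

If gevent and cps resolve to paths under different environments, or gevent is reported as not found, gevent was most likely installed with a different pip than the one belonging to calibre-web's environment.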

Valkryst commented 1 year ago

Looks like it's present in the About section, listed as gevent 22.10.2. I'm wondering if it might be worth doing a quick re-install to ensure everything's clean and up to date?

OzzieIsaacs commented 1 year ago

Normally this should not be necessary

OzzieIsaacs commented 1 year ago

Okay, I didn't fully read the post. If the upload succeeds, then gevent is not the problem; the metadata save problem points towards a permission problem (this should be independent of the upload file size). Just to make sure: you are NOT using Docker? If you are using Docker, it's a permission problem, independent of what you think the problem is. Otherwise, please enable debug-level logging and try to change the metadata of a file. (Uploading should not have anything to do with it in this case.) Do you see the covers of the files? If you change the "series" of a book, does this also not work? (This is only related to metadata.db and nothing more.) Does changing the title result in a no longer visible cover?

Valkryst commented 1 year ago

To confirm, I'm not using Docker. calibre-web is installed directly on the system, under its own user account, using pip 👌

I'm glad we had the same thought, I just finished running a test with the debug logs enabled:

[2023-03-15 17:18:02,853] DEBUG {cps.uploader:261} Temporary file: /tmp/calibre_web/f02abd0047ada6536f58c3860c819b85
[2023-03-15 17:18:03,850] DEBUG {cps.uploader:170} Can not read PDF DocumentInfo 'PdfFileReader' object has no attribute 'metadata'
[2023-03-15 17:18:03,850] DEBUG {cps.uploader:122} Can not read PDF XMP metadata 'PdfFileReader' object has no attribute 'xmp_metadata'
[2023-03-15 17:18:03,851]  WARN {cps.uploader:234} Pdf extraction forbidden by Imagemagick policy: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/421
[2023-03-15 17:18:03,856] DEBUG {cps.helper:547} Moving title: /tmp/calibre_web/f02abd0047ada6536f58c3860c819b85 to /home/calibre/library/Unknown/completeelfquestvolume7 (1045)/completeelfquestvolume7 - Unknown
[2023-03-15 17:18:04,026] DEBUG {cps.services.worker:91} Add Task for user: Valkryst - Upload completeelfquestvolume7
[2023-03-15 17:18:04,028] DEBUG {cps.services.worker:91} Add Task for user: System - Add Cover Thumbnails for Book 1045
[2023-03-15 17:18:22,813] ERROR {cps.metadata_provider.amazon:129} 503 Server Error: Service Unavailable for url: https://www.amazon.com/s?k=completeelfquestvolume7&i=digital-text&sprefix=completeelfquestvolume7%2Cdigital-text&ref=nb_sb_noss
[2023-03-15 17:18:26,766] DEBUG {cps.helper:552} Moving title: /home/calibre/library/Unknown/completeelfquestvolume7 (1045) to /home/calibre/library/Richard Pini/The Complete ElfQuest Volume 7 (1045)
[2023-03-15 17:18:27,339] DEBUG {cps.services.worker:91} Add Task for user: System - Replace/Delete Cover Thumbnails for book 1045
[2023-03-15 17:18:27,342] DEBUG {cps.services.worker:91} Add Task for user: System - Add Cover Thumbnails for Book 1045
[2023-03-15 17:18:40,287] DEBUG {cps.helper:552} Moving title: /home/calibre/library/Unknown/completeelfquestvolume7 (1045) to /home/calibre/library/Unknown/The Complete ElfQuest - Volume 7 (1045)
[2023-03-15 17:18:40,288] ERROR {cps.helper:564} Rename title from /home/calibre/library/Unknown/completeelfquestvolume7 (1045) to /home/calibre/library/Unknown/The Complete ElfQuest - Volume 7 (1045) failed with error: [Errno 2] No such file or directory: '/home/calibre/library/Unknown/completeelfquestvolume7 (1045)'

Some additional notes:

When using fetched metadata, the cover is not applied.
When uploading a cover from my local disk, the cover is applied.
I am able to change the series.
Changing the title does clear the cover (i.e. it displays "Cover not available").
When downloading the file, a 404 page is displayed.
When running ls -la on various folders, I don't see any obvious issues with file/folder ownership.

OzzieIsaacs commented 1 year ago

The error is this one: [2023-03-15 17:18:40,288] ERROR {cps.helper:564} Rename title from /home/calibre/library/Unknown/completeelfquestvolume7 (1045) to /home/calibre/library/Unknown/The Complete ElfQuest - Volume 7 (1045) failed with error: [Errno 2] No such file or directory: '/home/calibre/library/Unknown/completeelfquestvolume7 (1045)'

For whatever reason the folder /home/calibre/library/Unknown/completeelfquestvolume7 (1045) could not be found, could you please check if the file is there. Once you have tried to rename the file and the cover is gone, the file/folder is lost to calibre-web (the file structure on your hard disk and the information in the metadata.db file no longer match). In this case I recommend using a new file for experiments.
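The debug log posted earlier shows that the missing folder had already been used as the source of a move at 17:18:26 (to the Richard Pini folder), and the failing rename at 17:18:40 used the same old source path again. The following is a rough sketch for spotting that pattern in a calibre-web debug log. It assumes the "Moving title: A to B" message format seen in this thread; the function name find_stale_moves is hypothetical and the script is not part of calibre-web.

```python
import re

# Matches the "Moving title: <src> to <dst>" debug messages seen in this thread.
MOVE_RE = re.compile(r"Moving title: (?P<src>.+?) to (?P<dst>/.+)$")

# Message parts copied from the log above (timestamps omitted).
LOG_LINES = [
    "Moving title: /tmp/calibre_web/f02abd0047ada6536f58c3860c819b85 to "
    "/home/calibre/library/Unknown/completeelfquestvolume7 (1045)/completeelfquestvolume7 - Unknown",
    "Moving title: /home/calibre/library/Unknown/completeelfquestvolume7 (1045) to "
    "/home/calibre/library/Richard Pini/The Complete ElfQuest Volume 7 (1045)",
    "Moving title: /home/calibre/library/Unknown/completeelfquestvolume7 (1045) to "
    "/home/calibre/library/Unknown/The Complete ElfQuest - Volume 7 (1045)",
]


def find_stale_moves(log_lines):
    """Return (src, earlier_dst) pairs where src was already moved away earlier."""
    moved = {}  # source path -> destination of the first move that used it
    stale = []
    for line in log_lines:
        match = MOVE_RE.search(line)
        if not match:
            continue
        src, dst = match.group("src"), match.group("dst")
        if src in moved:
            stale.append((src, moved[src]))
        else:
            moved[src] = dst
    return stale


if __name__ == "__main__":
    for src, earlier_dst in find_stale_moves(LOG_LINES):
        print("source reused after move:", src)
        print("  it was already moved to:", earlier_dst)
```

For the three lines above, the sketch flags the Unknown/completeelfquestvolume7 (1045) folder as a source that had already been moved, which is consistent with the Errno 2 in the log.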

This is caused by this one: {cps.metadata_provider.amazon:129} 503 Server Error: Service Unavailable for url: https://www.amazon.com/s?k=completeelfquestvolume7&i=digital-text&sprefix=completeelfquestvolume7%2Cdigital-text&ref=nb_sb_noss Sometimes Amazon is a bit picky; you might have tried to download too many files, so they blocked your IP for a while.

The metadata from the file could not be extracted because of this: {cps.uploader:234} Pdf extraction forbidden by Imagemagick policy: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/421 Here is a solution for this: https://github.com/janeczku/calibre-web/wiki/FAQ#what-to-do-if-cover-pictures-are-not-extracted-from-pdf-files

When downloading the file, a 404 page is displayed.

Before or after renaming the title/author? If before then calibre-web can't access the file.

Could you try changing all file and folder permissions to "777"? Calibre-web is run by user "calibre"?

OzzieIsaacs commented 1 year ago

Could you please do the following: Upload a cover to book "X" and make sure the cover is visible in calibre-web. Please check if the folder of book "X" has a file named cover.jpg; this cover.jpg is the cover you uploaded. Please check that cover.jpg and the book file for book "X" have the exact same owner, group and permissions. Then please try to download the book "X" file.
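The owner, group and permission comparison can also be done with a short script instead of reading ls -la output by eye. This is only a sketch: the function names are made up for illustration, and the book folder is passed as a command-line argument rather than assumed.

```python
import os
import stat
import sys


def ownership_summary(path):
    """Return (uid, gid, mode string) for one file, e.g. (1001, 1001, '-rw-r--r--')."""
    st = os.stat(path)
    return (st.st_uid, st.st_gid, stat.filemode(st.st_mode))


def compare_files(book_dir):
    """Map each regular file in book_dir to its ownership summary."""
    result = {}
    for entry in sorted(os.listdir(book_dir)):
        full = os.path.join(book_dir, entry)
        if os.path.isfile(full):
            result[entry] = ownership_summary(full)
    return result


def all_match(summary):
    """True if every file has the same owner, group and permissions."""
    return len(set(summary.values())) <= 1


if __name__ == "__main__":
    if len(sys.argv) > 1:
        summary = compare_files(sys.argv[1])  # path to the folder of book "X"
        for name, info in summary.items():
            print(name, info)
        print("all match:", all_match(summary))
```

Run it as the user that runs calibre-web and pass the folder of book "X"; if cover.jpg and the book file show different tuples, the two files were written with different ownership or permissions.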

OzzieIsaacs commented 1 year ago

And please do all experiments without nginx in between, so we can be sure that nothing else is also causing trouble.

Valkryst commented 1 year ago

It seems as though the issue is caused by attempting to apply metadata from Amazon, while they have my IP blocked.

And please do all experiments without nginx in between

I'm not sure if this is possible. It's running on a headless, remote server, so the only way (that I know of) to access the UI is via the Nginx reverse proxy 😓

Open to trying any suggestions that you might have though

When downloading the file, a 404 page is displayed.

Before or after renaming the title/author?

I just re-tested this to be sure. It happens only after renaming the title

Calibre-web is run by user "calibre"?

Yup!

For whatever reason the folder /home/calibre/library/Unknown/completeelfquestvolume7 (1045) could not be found, could you please check if the file is there.

I verified that the folder does not exist

The metadata from the file could not be extracted because of this:

I applied your fix and the Imagemagick issues disappeared, thanks!

I ran a test with the following steps:

Uploaded a new ~800MB PDF file. I did not change the metadata at all.
Verified that the book has a folder in ./library/Unknown and that the folder contains cover.jpg and the .pdf file.
Uploaded a cover from my local disk.
Verified that cover.jpg is visible in calibre-web.
Verified that cover.jpg is present in the ./library/Unknown/somerandombook (1046) folder.
Verified that cover.jpg has the correct group and owner permissions.

I ran another test with the following steps:

Uploaded a new ~800MB PDF file.
Fetched metadata, but not from Amazon.
Applied all metadata changes.
Verified that all metadata was applied, including the cover.
Verified that the download worked.

I ran another test with the following steps:

Uploaded the original ~800MB PDF.
Fetched metadata, but not from Amazon.
Applied all metadata changes.
Verified that all metadata was applied, including the cover.
Verified that the download worked.

OzzieIsaacs commented 1 year ago

Okay, so the problem is: "File not found after try to load metadata from amazon, where amazon server responds error 503". I will change the title of the issue accordingly and have a look at it.

Valkryst commented 1 year ago

Sounds good, and thank you for all of your help in debugging this. Especially with how quickly you responded to it!

ptsteadman commented 1 year ago

I'm also having this issue. Server is hanging or crashing after doing a metadata search. I am on version 0.6.20. Given that the metadata providers seem to often have issues (which makes sense given that they're fetching data from third-party services), besides trying to fix the crashing issues, perhaps it'd make sense to make it easily possible to disable fetching metadata from given providers?
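As a rough illustration of the suggestion above, a search routine could skip disabled providers and treat a failing provider as "no results" rather than letting the error propagate. This is a hypothetical sketch, not calibre-web's actual provider code: the function search_all, the disabled parameter and the "provider is a callable taking a query" interface are all assumptions made for the example.

```python
import logging

log = logging.getLogger("metadata_sketch")


def search_all(providers, query, disabled=()):
    """Query each enabled provider; a failing provider contributes no results.

    providers: dict mapping a provider name to a callable(query) -> list
    disabled:  names of providers the admin has switched off (hypothetical option)
    """
    results = []
    for name, search in providers.items():
        if name in disabled:
            continue
        try:
            results.extend(search(query))
        except Exception as exc:  # e.g. an HTTP 503 raised inside the provider
            log.warning("metadata provider %s failed: %s", name, exc)
    return results


if __name__ == "__main__":
    def working(query):
        return [{"title": query, "source": "working"}]

    def blocked(query):
        raise RuntimeError("503 Server Error: Service Unavailable")

    print(search_all({"working": working, "blocked": blocked}, "elfquest"))
```

The broad except is deliberate in this sketch, since third-party providers can fail in many ways; a real implementation would probably narrow it to the network and parsing errors it expects.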

GooRoo commented 3 months ago

My server doesn't hang or crash, but the files are indeed moved while the database doesn't get updated.
