I tried to upload a PDF whose metadata was in ASCII.
That resulted in a 500 error.
Environment (please complete the following information):
OS: Debian 11
Python version: 3.10
Calibre-Web version: 0.6.21
Docker container: LinuxServer
Special Hardware: Odroid HC-4
Browser: N/A
Additional context
I'm discovering calibre-web and I'm not sure about the dev philosophy yet, so no PR from me. I have, however, debugged the issue.
Cause: the title is not sanitized when PDF metadata is processed and its type is never checked, so the call to value.replace in helper.get_valid_filename raises an error when the title is a bytes object.
Log:
[2024-03-22 17:23:01,817] DEBUG {cps.editbooks:665} b'\x00N\x00a\x00i\x00n\x00s\x00 \x00-\x00 \x00T\x002\x003\x00 \x00-\x00 \x00A\x00r\x00a\x00r\x00u\x00n\x00 \x00e\x00t\x00 \x00l\x00a\x00 \x00r\x00a\x00g\x00e\x00 \x00b\x00l\x00e\x00u\x00e\x00\x00'
[2024-03-22 17:23:01,826] ERROR {cps:1414} Exception on /upload [POST]
Traceback (most recent call last):
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/app/calibre-web/cps/usermanagement.py", line 35, in decorated_view
    return login_required(func)(*args, **kwargs)
  File "/lsiopy/lib/python3.10/site-packages/flask_login/utils.py", line 290, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
  File "/app/calibre-web/cps/editbooks.py", line 62, in inner
    return f(*args, **kwargs)
  File "/app/calibre-web/cps/editbooks.py", line 257, in upload
    db_book, input_authors, title_dir, renamed_authors = create_book_on_upload(modify_date, meta)
  File "/app/calibre-web/cps/editbooks.py", line 669, in create_book_on_upload
    author_dir = helper.get_valid_filename(db_author.name, chars=96)
  File "/app/calibre-web/cps/helper.py", line 237, in get_valid_filename
    value = value.replace("/", "").replace(":", "_").strip('\0')
TypeError: a bytes-like object is required, not 'str'
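The TypeError at the end of the traceback can be reproduced outside Calibre-Web. This is a minimal sketch: replace_like_helper is a hypothetical stand-in that only copies the chain of calls shown in the traceback line, and the title is a shortened copy of the bytes from the DEBUG line.

```python
# Shortened copy of the title bytes from the DEBUG log line.
title = b"\x00N\x00a\x00i\x00n\x00s\x00\x00"


def replace_like_helper(value):
    # Hypothetical stand-in: same chain of calls as the traceback line,
    # not the real helper.get_valid_filename.
    return value.replace("/", "").replace(":", "_").strip("\0")


# A str value goes through without trouble.
ok_result = replace_like_helper("a/b:c")

# A bytes value fails, because bytes.replace() only accepts bytes arguments.
error_message = None
try:
    replace_like_helper(title)
except TypeError as exc:
    error_message = str(exc)

print(ok_result)
print(error_message)
```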
Suggested fix:
I can see at least 3 (quick) solutions and cannot decide which one is the best fix according to the development philosophy. IMHO the best one would be a new metadata_sanitization function, called on all metadata whatever the file type, to ensure validity (type, charset, content and so on), but that's a bit too much code to write in a ticket.
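For illustration only, here is a rough sketch of what such a sanitization step could look like. Everything in it is an assumption: to_text and sanitize_metadata are hypothetical names that do not exist in Calibre-Web, and the encoding guess (a leading NUL byte means BOM-less UTF-16 big-endian) is a heuristic based on the bytes in the log, not a general rule.

```python
import codecs


def to_text(value, fallback=""):
    """Return value as a clean str, or fallback if nothing usable remains."""
    if isinstance(value, bytes):
        if value.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
            # The "utf-16" codec reads the BOM and drops it.
            encoding = "utf-16"
        elif value.startswith(b"\x00"):
            # Heuristic (assumption): NUL-interleaved bytes like those in
            # the log look like UTF-16 big-endian without a BOM.
            encoding = "utf-16-be"
        else:
            encoding = "utf-8"
        value = value.decode(encoding, errors="replace")
    elif not isinstance(value, str):
        # None, numbers, or any other unexpected type.
        return fallback
    # Remove NUL characters and surrounding whitespace.
    value = value.replace("\x00", "").strip()
    return value or fallback


def sanitize_metadata(fields, fallbacks=None):
    """Apply to_text to every field of a metadata dict (hypothetical helper)."""
    fallbacks = fallbacks or {}
    return {key: to_text(val, fallbacks.get(key, "")) for key, val in fields.items()}


raw = {
    "title": b"\x00N\x00a\x00i\x00n\x00s\x00\x00",
    "author": "Some Author",
    "subject": None,
}
clean = sanitize_metadata(raw, fallbacks={"title": "original_file_name"})
print(clean)
```

Every field comes out as a str, so later calls such as value.replace("/", "") would no longer depend on what the PDF parser returned.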
First one:
cps/helper.py
235. + value = value.decode('utf-8') if isinstance(value, bytes) else value
Second one:
cps/uploader.py
192. - title = doc_info.title if doc_info.title else original_file_name
192. + title = doc_info.title if doc_info.title and isinstance(doc_info.title, str) else original_file_name
Third one:
cps/uploader.py
193. + if isinstance(title, bytes):
194. +     title = title.decode('utf-8')
Edit:
Solution 1 does not work, as it fails while recording into the database. This raises the question of the other data grabbed from the metadata (author and subject). Tags work fine, as that case is already handled.
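One detail that may be worth checking when comparing the fixes: decoding the logged bytes as UTF-8 succeeds, but every NUL byte stays in the resulting string. This is only an observation about the bytes in the log, not a confirmed explanation of the database failure. The sketch below uses a shortened copy of the logged title.

```python
# Shortened copy of the title bytes from the DEBUG log line.
logged_title = b"\x00N\x00a\x00i\x00n\x00s\x00\x00"

# UTF-8 accepts NUL bytes, so this does not raise, but the NULs remain.
as_utf8 = logged_title.decode("utf-8")

# Read as UTF-16 big-endian, each byte pair becomes one character.
as_utf16 = logged_title.decode("utf-16-be")

print(repr(as_utf8))
print(repr(as_utf16))
```

Either way a trailing NUL remains, so the decoded value would probably still need a cleanup step before being used as a title.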
Please provide the book in question (via the private email address from my profile). I'd like to check the problem myself before making any changes to the code.