Error uploading with ASCII metadatas on PDF

I tried to upload a PDF whose metadata is in ASCII.

That resulted in a 500 error.

Environment (please complete the following information):

OS: Debian 11
Python version: 3.10
Calibre-Web version: 0.6.21
Docker container: LinuxServer
Special Hardware: Odroid HC-4
Browser: N/A

Additional context

I'm new to calibre-web and not yet sure about the development philosophy, so I'm not submitting a PR. I have, however, debugged the issue.

Cause : the title is not sanitized when the PDF metadata is processed and its type is never checked, so the later call to value.replace in helper.get_valid_filename raises an error.

Log :

[2024-03-22 17:23:01,817] DEBUG {cps.editbooks:665} b'\x00N\x00a\x00i\x00n\x00s\x00 \x00-\x00 \x00T\x002\x003\x00 \x00-\x00 \x00A\x00r\x00a\x00r\x00u\x00n\x00 \x00e\x00t\x00 \x00l\x00a\x00 \x00r\x00a\x00g\x00e\x00 \x00b\x00l\x00e\x00u\x00e\x00\x00'
[2024-03-22 17:23:01,826] ERROR {cps:1414} Exception on /upload [POST]
Traceback (most recent call last):
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/lsiopy/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/app/calibre-web/cps/usermanagement.py", line 35, in decorated_view
    return login_required(func)(*args, **kwargs)
  File "/lsiopy/lib/python3.10/site-packages/flask_login/utils.py", line 290, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
  File "/app/calibre-web/cps/editbooks.py", line 62, in inner
    return f(*args, **kwargs)
  File "/app/calibre-web/cps/editbooks.py", line 257, in upload
    db_book, input_authors, title_dir, renamed_authors = create_book_on_upload(modify_date, meta)
  File "/app/calibre-web/cps/editbooks.py", line 669, in create_book_on_upload
    author_dir = helper.get_valid_filename(db_author.name, chars=96)
  File "/app/calibre-web/cps/helper.py", line 237, in get_valid_filename
    value = value.replace("/", "").replace(":", "_").strip('\0')
TypeError: a bytes-like object is required, not 'str'
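The TypeError at the bottom of the traceback can be reproduced outside calibre-web. This is only a minimal sketch: replace_like_helper is a hypothetical stand-in that copies the single statement quoted from cps/helper.py line 237, and raw_title is a shortened sample of the bytes in the DEBUG line.

```python
def replace_like_helper(value):
    # Same chain of calls as the statement quoted in the traceback.
    # The arguments are str, so `value` must be str as well.
    return value.replace("/", "").replace(":", "_").strip('\0')


# Shortened sample of the logged title (only the first five characters).
raw_title = b'\x00N\x00a\x00i\x00n\x00s'

try:
    replace_like_helper(raw_title)
    error_message = None
except TypeError as exc:
    # bytes.replace() refuses str arguments.
    error_message = str(exc)

print(error_message)
```

With a str input the same function works, which suggests the problem is the type of the value coming out of the PDF metadata rather than get_valid_filename itself.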

Suggested fix :

I can see at least 3 quick solutions and cannot decide which one is the best fix according to the development philosophy. IMHO the best one would be a new metadata_sanitization function called on all metadata, whatever the file type, to ensure validity (type, charset, content and so on), but that's a bit too much code to write in a ticket.

First one

cps/helper.py

235. + value = value.decode('utf-8') if isinstance(value, bytes) else value

Second one :

cps/uploader.py

192. - title = doc_info.title if doc_info.title else original_file_name
192. + title = doc_info.title if doc_info.title and isinstance(doc_info.title, str) else original_file_name

Third one :

cps/uploader.py

193. + if isinstance(title, bytes):
194. +     title = title.decode('utf-8')
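One caveat on decoding with 'utf-8', sketched below on a shortened sample of the logged bytes. The NUL byte before every character looks like UTF-16BE rather than plain ASCII; that reading is an assumption based only on the log line, not something confirmed in the calibre-web code.

```python
# Shortened sample of the logged title, including the trailing NUL pair.
raw_title = b'\x00N\x00a\x00i\x00n\x00s\x00\x00'

# Decoding as UTF-8 does not fail, because NUL is a valid code point,
# but every NUL stays in the resulting string.
as_utf8 = raw_title.decode('utf-8')

# Assumption: the bytes are really UTF-16BE (two bytes per character).
as_utf16 = raw_title.decode('utf-16-be').rstrip('\x00')

print(repr(as_utf8))
print(repr(as_utf16))
```

So a plain utf-8 decode would avoid the TypeError but could still leave NUL characters in the title, which may matter when the value is later written to disk or to the database.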

Edit:

Solution 1 does not work, as it fails when recording into the database. This raises the question of the other data grabbed from the metadata (author and subject). Tags work fine, as that case is already handled.
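Since title, author and subject are all affected, here is a rough sketch of what the sanitization function mentioned under "Suggested fix" could look like. The name sanitize_metadata_value, its fallback parameter and the encoding heuristic are all hypothetical; nothing here is taken from the calibre-web code base.

```python
def sanitize_metadata_value(value, fallback=""):
    """Hypothetical helper: return a clean str for any metadata field."""
    if value is None:
        return fallback
    if isinstance(value, bytes):
        # Assumption: a leading NUL byte and an even length hint at UTF-16BE;
        # anything else is tried as UTF-8.
        looks_utf16 = value[:1] == b"\x00" and len(value) % 2 == 0
        encoding = "utf-16-be" if looks_utf16 else "utf-8"
        try:
            value = value.decode(encoding)
        except UnicodeDecodeError:
            # latin-1 maps every byte to a character, so this cannot fail.
            value = value.decode("latin-1")
    elif not isinstance(value, str):
        value = str(value)
    # Drop NUL characters and surrounding whitespace in every case.
    cleaned = value.replace("\x00", "").strip()
    return cleaned or fallback


print(sanitize_metadata_value(b'\x00N\x00a\x00i\x00n\x00s\x00\x00', "file.pdf"))
```

Calling such a helper once on each field right after the metadata is read (with the original file name as fallback for the title) would keep the type check in one place instead of patching get_valid_filename and the database layer separately.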

janeczku / calibre-web

Error uploading with ASCII metadatas on PDF #3026