thombashi / pathvalidate

A Python library to sanitize/validate a string such as filenames/file-paths/etc.
https://pathvalidate.rtfd.io/
MIT License
210 stars 12 forks source link

Consider filesystem encoding for length calculations #26

Closed virlos closed 1 year ago

virlos commented 1 year ago

The maximum length for filenames/pathnames - e.g. 255/4096 for linux - is really a byte length constraint. Thus, given a typical scenario, where sys.getfilesystemencoding() returns utf-8, a path can pass validation that cannot be actually created in the system.

>>> import sys, pathvalidate as pv
>>> sys.getfilesystemencoding()
'utf-8'
>>> good = '\u2013' * 85
>>> long = '\u2013' * 86
>>> len(good), len(long)
(85, 86)
>>> pv.validate_filename(good)
>>> pv.validate_filename(long)
>>> pv.validate_filename('a' * 256)
Traceback (most recent call last):
...
pathvalidate.error.InvalidLengthError: filename is too long: expected<=255, actual=256, reason=INVALID_LENGTH

>>> with open(good, 'w') as handle:
...     handle.write('good')
... 
4
>>> with open(long, 'w') as handle:
...     handle.write('long')
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 36] File name too long: '––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––'
thombashi commented 1 year ago

Thank you for your report. The problem was fixed at pathvalidate 3.