thombashi / pathvalidate

A Python library to sanitize/validate a string such as filenames/file-paths/etc.
https://pathvalidate.rtfd.io/
MIT License

sanitize_filepath does not produce valid length filepaths #49

Open 7x11x13 opened 3 months ago

7x11x13 commented 3 months ago

Code to reproduce:

from pathvalidate import sanitize_filepath, validate_filepath

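# 30 components of 200 characters each, joined by "/"
path = "/".join("a"*200 for _ in range(30))
print("length:", len(path))
sanitized = sanitize_filepath(path)
print("sanitized length:", len(sanitized))
# Raises ValidationError: the sanitized path is still too long
validate_filepath(sanitized)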

Output:

length: 6029
sanitized length: 6029
Traceback (most recent call last):
  ...
pathvalidate.error.ValidationError: [PV1101] found an invalid string length: file path is too long: expected<=260 bytes, actual=6029 bytes, platform=universal, fs_encoding=utf-8, byte_count=6,029

To be honest, I'm not sure what the expected behavior should be, but at the very least I think it should raise an error if it can't successfully sanitize the filepath.

7x11x13 commented 3 months ago

After some thought, I think the default behavior should be to truncate the last component of the path so that the entire path length falls within the accepted range; if this is not possible, it should raise an error.
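
A minimal sketch of that proposal, for illustration only. The helper name truncate_last_component, its parameters, and the use of ValueError are assumptions made for this example; nothing here is part of pathvalidate. It assumes "/" as the separator and measures length in UTF-8 bytes.

```python
def truncate_last_component(path: str, max_len: int = 260, encoding: str = "utf-8") -> str:
    """Hypothetical helper: shorten only the final path component so that
    the whole path fits within max_len bytes, or raise if that is impossible."""
    if len(path.encode(encoding)) <= max_len:
        return path  # already short enough, nothing to do

    # Split into "everything up to and including the last separator" and the filename.
    head, sep, tail = path.rpartition("/")
    prefix_len = len((head + sep).encode(encoding))
    budget = max_len - prefix_len  # bytes left over for the filename

    if not tail or budget < 1:
        raise ValueError(
            f"cannot fit path into {max_len} bytes: "
            f"the directory part alone is {prefix_len} bytes"
        )

    # Cut at the byte budget, then drop any partially cut multi-byte character.
    shortened = tail.encode(encoding)[:budget].decode(encoding, errors="ignore")
    if not shortened:
        raise ValueError(f"cannot keep any of the filename within {max_len} bytes")
    return head + sep + shortened


if __name__ == "__main__":
    fits = truncate_last_component("dir/" + "a" * 300)
    print(len(fits.encode("utf-8")))

    try:
        truncate_last_component("/".join("a" * 200 for _ in range(30)))
    except ValueError as exc:
        print("error:", exc)
```

With the path from the report above, the directory part alone exceeds the limit, so this sketch raises instead of returning an invalid path.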

thombashi commented 2 months ago

Thank you for your feedback. I have considered this in the past. My conclusion at the time was that it can be difficult to sanitize the length of the filepath, so I just added the validate_after_sanitize argument.

For example, if the maximum filepath length is 260 bytes and the directory path is already 259 bytes long, a 200-byte filename would have to be truncated to 1 byte. Most of the filename would be lost, which is not what users would expect in many cases. That is why sanitize_filepath does not truncate the filepath length. However, when the validate_after_sanitize argument is True, an exception is raised in such cases.
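
The arithmetic behind this example can be checked with a few lines of plain Python. The function name filename_budget is hypothetical, and the sketch assumes that the 259 bytes include the trailing separator and that lengths are counted in UTF-8 bytes.

```python
def filename_budget(directory: str, max_len: int = 260) -> int:
    """Hypothetical helper: bytes left for the filename once the directory
    part (including its trailing separator) has been accounted for."""
    return max(max_len - len(directory.encode("utf-8")), 0)


if __name__ == "__main__":
    directory = "d" * 258 + "/"  # 259 bytes, as in the example above
    filename = "f" * 200         # 200 bytes

    budget = filename_budget(directory)
    print("bytes available for the filename:", budget)
    print(f"share of the filename that survives: {budget / len(filename):.1%}")
```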

The default value of validate_after_sanitize is False for now to keep backward compatibility. It would be good to set the default value to True in pathvalidate v4.