scrapy / protego

A pure-Python robots.txt parser with support for modern conventions.
BSD 3-Clause "New" or "Revised" License
54 stars, 28 forks

Colons in file names prevent installation on NTFS #13

Status: Closed (tjlaboss closed this issue 3 years ago)

tjlaboss commented 3 years ago

Installing protego 0.1.16 via Conda produced the following error:

InvalidArchiveError("Error with archive /home/share/conda/miniconda3/pkgs/protego-0.1.16-py_0.tar.bz2.  
You probably need to delete and re-download or re-create this file.  Message from libarchive was:\n\nCan't create 'info/test/tests/test_data/www.weather.info:443'")

It turned out the file protego-0.1.16-py_0.tar.bz2 couldn't be extracted on NTFS due to the colons in the file names:

tar -xjf protego-0.1.16-py_0.tar.bz2 
tar: info/test/tests/test_data/www.weather.info\:443: Cannot open: Invalid argument
tar: info/test/tests/test_data/www.bmf.gv.at\:443: Cannot open: Invalid argument
tar: info/test/tests/test_data/www.nd.edu\:443: Cannot open: Invalid argument
...
tar: info/test/tests/test_data/www.airarabia.com\:443: Cannot open: Invalid argument
tar: info/test/tests/test_data/www.broadcom.com\:443: Cannot open: Invalid argument
tar: info/test/tests/test_data/www.pakwheels.com\:443: Cannot open: Invalid argument
tar: Exiting with failure status due to previous errors

No issues were encountered in a Conda environment on an ext4 filesystem, on the same machine.
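For anyone who wants to check an archive before extracting it onto NTFS, here is a minimal sketch that lists the members whose names contain a colon. The function name `members_with_colons` is hypothetical, and it only checks for colons, the one character reported in this issue; other characters may also be rejected depending on how the filesystem is mounted.

```python
import tarfile


def members_with_colons(archive_path):
    """Return the names of archive members whose path contains a colon.

    Hypothetical helper: it inspects member names only and extracts nothing.
    """
    # Mode "r" lets tarfile detect the compression (e.g. bzip2) on its own.
    with tarfile.open(archive_path, "r") as tar:
        return [name for name in tar.getnames() if ":" in name]
```

Calling `members_with_colons("protego-0.1.16-py_0.tar.bz2")` on the package from the report should list the `test_data` entries that tar refused to create.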

Gallaecio commented 3 years ago

Good find. I would simply replace the colons with underscores (_), or we could go the Unicode route.
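A rough sketch of the underscore option, assuming the test data files sit directly in one directory. The helper names `sanitized_name` and `rename_colon_files` are made up for illustration, and this is not necessarily what the eventual fix does.

```python
from pathlib import Path


def sanitized_name(name, replacement="_"):
    """Replace every colon in a file name with the given replacement."""
    return name.replace(":", replacement)


def rename_colon_files(directory):
    """Rename files in `directory` whose names contain a colon.

    Returns a list of (old_name, new_name) pairs. Hypothetical helper:
    it does not recurse into subdirectories and refuses to overwrite.
    """
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        if ":" not in path.name:
            continue
        target = path.with_name(sanitized_name(path.name))
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Any test code that maps a file name back to a host and port would need the matching reverse substitution, so the replacement character should be one that does not otherwise occur in the names.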

Gallaecio commented 3 years ago

Fixed by https://github.com/scrapy/protego/pull/14