jjjake / internetarchive

A Python and Command-Line Interface to Archive.org
GNU Affero General Public License v3.0

Downloads Fail When IA Attempts To Create Folder Ending With Dots #330

Open jdl96 opened 4 years ago

jdl96 commented 4 years ago

Hello,

I actually emailed Jake about this issue a few days ago, although at that point I had no idea what the problem was. I think I have figured it out so I might as well just post it here instead.

If IA attempts to download files from an item, and those files are located in a folder whose name ends with one or more dots, the download fails. The folder is created, but its name lacks the trailing dot(s), which I assume is what causes the problem: Windows automatically strips trailing dots from folder names for some reason.
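To check whether a given download path is affected, the components that end in a dot can be listed before downloading. The following is a minimal sketch, not part of internetarchive; the helper name `components_ending_in_dot` is hypothetical, and it only inspects the path string without touching the file system.

```python
import re

def components_ending_in_dot(path):
    """Return the path components that end in one or more dots.

    '.' and '..' are skipped because they are ordinary relative-path markers.
    """
    # Split on both separators, since the failing path mixes '\\' and '/'.
    parts = [p for p in re.split(r"[\\/]", path) if p]
    return [p for p in parts if p.endswith(".") and p not in (".", "..")]

# Path taken from the error message in this report.
path = (
    "raocow_Archive\\raocow/"
    "20081009_Vip2- Meadow, Iggy's Castle and... uhm.../"
    "Vip2- Meadow, Iggy's Castle and... uhm....jpg"
)
print(components_ending_in_dot(path))
```

Only the folder component is reported here; the file name ends in `.jpg`, so its inner dots are not trailing.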

Currently I am running Windows 7, Python 3.8.2, and InternetArchive Version 1.9.0.

An example command I am running in Command Prompt to produce the problem is the following:

ia download raocow_Archive "raocow/20081009_Vip2- Meadow, Iggy's Castle and... uhm.../Vip2- Meadow, Iggy's Castle and... uhm....jpg"

When logging the command, I receive the following error message:

2020-03-12 14:04:57,508 - internetarchive.files - ERROR - error downloading file raocow_Archive\raocow/20081009_Vip2- Meadow, Iggy's Castle and... uhm.../Vip2- Meadow, Iggy's Castle and... uhm....jpg, exception raised: [Errno 2] No such file or directory: b"raocow_Archive\\raocow/20081009_Vip2- Meadow, Iggy's Castle and... uhm.../Vip2- Meadow, Iggy's Castle and... uhm....jpg"

Here's the traceback when running the same command in python instead:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\internetarchive\api.py", line 379, in download
    r = item.download(files=files,
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\internetarchive\item.py", line 546, in download
    r = f.download(path, verbose, silent, ignore_existing, checksum, destdir,
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\internetarchive\files.py", line 294, in download
    raise exc
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\site-packages\internetarchive\files.py", line 272, in download
    fileobj = open(file_path.encode('utf-8'), 'wb')
FileNotFoundError: [Errno 2] No such file or directory: b"raocow_Archive\\raocow/20081009_Vip2- Meadow, Iggy's Castle and... uhm.../Vip2- Meadow, Iggy's Castle and... uhm....jpg"

jjjake commented 4 years ago

This works for me, FWIW:

» ia download raocow_Archive "raocow/20081009_Vip2- Meadow, Iggy's Castle and... uhm.../Vip2- Meadow, Iggy's Castle and... uhm....jpg"
raocow_Archive: d - success

I'm guessing this is a Windows issue. I don't have a Windows machine available at the moment to troubleshoot, but I'll see what I can do about resolving this.

Thanks for the report!

jdl96 commented 4 years ago

Just temporarily installed Ubuntu on my laptop today (first time using something that isn't Windows so I have no idea what I'm doing) and I tried using the same example as before. It actually worked this time, so yes this seems to be a Windows issue. Any news on when this may be fixed?

Thanks a bunch for all the hard work you put into this useful tool!

EDIT: After more thorough testing, this time with Ubuntu in a Virtual Machine, the issue still persists when attempting to download the entire item that was mentioned previously, instead of just that single image file.

mjturner commented 3 years ago

Just as a data point, doing an os.mkdir("d:/temp/testing123...") in a Windows 10 VM creates d:/temp/testing123, i.e. the trailing ... is ignored. I suspect this is Windows API behaviour rather than something Python does, as trying the same via Explorer or Command Prompt does the same thing.
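The same check can be scripted so it runs on any platform without touching a fixed drive path. This is a minimal sketch using only the standard library; the helper name `mkdir_keeps_name` is made up for illustration, and it uses a temporary scratch directory instead of d:/temp.

```python
import os
import tempfile

def mkdir_keeps_name(name):
    """Create `name` in a scratch directory and report whether the
    directory listing shows exactly that name afterwards."""
    with tempfile.TemporaryDirectory() as scratch:
        os.mkdir(os.path.join(scratch, name))
        # If the OS altered the name (e.g. dropped trailing dots),
        # the listing will not match what was requested.
        return os.listdir(scratch) == [name]

print(mkdir_keeps_name("testing123..."))
```

Going by the behaviour described above, this would be expected to print False on Windows, while on a typical Linux file system the name is kept as given.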

If no-one else is looking at this, I'd be happy to take a stab at a fix.

jjjake commented 3 years ago

That'd be really helpful, thanks for the offer @mjturner! I don't believe anybody is working on resolving this at the moment.

mjturner commented 3 years ago

No problem @jjjake. I have whipped up an initial patch and just need to test it on a Windows system (a bit painful, as I never do development on Windows). It only handles the trailing dot case (with special-case handling of . and ..); I'm not sure whether other characters are silently dropped by mkdir on Windows.

Will try and get the testing done in the next few days and will then submit a PR for review.
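For readers following along, the trailing-dot handling described above could look roughly like the sketch below. This is not the patch mentioned in this comment, which is not shown in the thread; the function name `strip_trailing_dots` is hypothetical, and the approach (rewrite each path component before creating directories or opening files) is an assumption about one possible fix.

```python
import re

def strip_trailing_dots(path):
    """Remove trailing dots from each path component.

    '.' and '..' are left alone, as is any component made up only of
    dots, because stripping those would leave an empty name.
    """
    # Capture the separators so the original mix of '\\' and '/' survives.
    pieces = re.split(r"([\\/])", path)
    cleaned = []
    for piece in pieces:
        if piece in ("", ".", "..", "\\", "/"):
            cleaned.append(piece)
        else:
            cleaned.append(piece.rstrip(".") or piece)
    return "".join(cleaned)

print(strip_trailing_dots("raocow_Archive\\raocow/uhm.../uhm....jpg"))
```

One trade-off to keep in mind: two distinct remote folders such as `foo` and `foo.` would map to the same local folder after stripping. Microsoft's file-naming guidance also warns against names ending in a space, so trailing spaces may need similar treatment; this sketch does not handle them.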

jdl96 commented 3 years ago

@mjturner Any update on your patch for this issue?