mjbright / futurelearn-dl

A script to download materials from the FutureLearn website (for enrolled courses)
GNU General Public License v3.0
34 stars, 20 forks

FATAL:downloadURLInPage: Unhandled escape sequence in filename (how to sort this out?) #17

Status: Open. zenny opened this issue 4 years ago

zenny commented 4 years ago

Hi,

I would appreciate any input on getting past this specific issue, which has been haunting me for a long time (see also #14):

$ ./futurelearn-dl.py EMAIL PASSWORD instructional-methods-in-health-professions-education 1
Downloading 8-week course 'instructional-methods-in-health-professions-education'
FATAL:downloadURLInPage: Unhandled escape sequence in filename <1.4-Weekly-Overview_Philosophy_of_Adult_Education_Inventory_%281%29.pdf>
Look for new files with - find /home/zenny/Downloads/Education/FUTURELEARN/instructional-methods-in-health-professions-education -type f -exec ls -altr {} \;

This issue reappeared when the referenced link contains `()` characters:

The link in question is:

https://pbea.agron.iastate.edu/files/Philosophy%20of%20Adult%20Education%20Inventory%20%281%29.pdf

I manually downloaded the PDF into the week 1 folder and also copied it under several names, such as 1.4-Weekly-Overview_Philosophy_of_Adult_Education_Inventory_%281%29.pdf, but the script still fails!

Related code

Searching the futurelearn-dl.py script leads to the following lines, which are responsible for the fatal error: https://github.com/mjbright/futurelearn-dl/blob/3eb42fb5b257716e3a5a90292c1c5556827aa508/futurelearn-dl.py#L448-L449

How can a Python script handle this when upstream links contain `%` escapes?
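A minimal sketch of one possible answer, using only the standard library: take the last path segment of the URL, percent-decode it with `urllib.parse.unquote` (so `%20`, `%28` and `%29` become a space, `(` and `)`), then replace characters that are unsafe in filenames. `local_filename_for` and its `prefix` argument are hypothetical; the underscore joining only mimics the filename shown in the error message and is not taken from futurelearn-dl.py.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Characters that are awkward or illegal in filenames on common filesystems.
# Parentheses are deliberately allowed.
_UNSAFE = re.compile(r'[\\/:*?"<>|]')


def local_filename_for(url: str, prefix: str) -> str:
    """Hypothetical helper: build a readable local filename from a download URL.

    The last path segment is percent-decoded (%20 -> space, %28 -> '(',
    %29 -> ')'), spaces become underscores, and any unsafe character
    (including a '/' produced by decoding %2F) is replaced with '_'.
    """
    segment = posixpath.basename(urlsplit(url).path)
    name = unquote(segment).replace(" ", "_")
    name = _UNSAFE.sub("_", name)
    return f"{prefix}_{name}"


if __name__ == "__main__":
    url = ("https://pbea.agron.iastate.edu/files/"
           "Philosophy%20of%20Adult%20Education%20Inventory%20%281%29.pdf")
    print(local_filename_for(url, "1.4-Weekly-Overview"))
    # -> 1.4-Weekly-Overview_Philosophy_of_Adult_Education_Inventory_(1).pdf
```

Splitting off the last segment before decoding means an encoded `%2F` cannot turn into a directory separator; it is decoded to `/` and then replaced by the sanitiser.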

Cheers and stay safe, /z

zenny commented 4 years ago

@mjbright I commented out the two lines with `#` and it seemed to work, though not without introducing other issues, described below.

 # if '%' in filename: 
 #    fatal("downloadURLInPage: Unhandled escape sequence in filename <{}>".format(filename))

However, after doing so the script fails to download any MP4 files; instead, each video is saved as an M3U playlist (instructional-methods-in-health-professions-education/week1/8.2-Large-Lecture-Format_adaptive.m3u.mp4) without any media:

$ file 8.2-Large-Lecture-Format_adaptive.m3u.mp4
8.2-Large-Lecture-Format_adaptive.m3u.mp4: M3U playlist, ASCII text
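The `file` output shows that the saved `.mp4` is actually a text playlist. To find every affected file after a run, checking each file's first bytes is enough. This sketch assumes the playlists start with the `#EXTM3U` header used by extended M3U (and HLS) playlists; `is_m3u_playlist` and `find_fake_videos` are hypothetical helpers, not part of futurelearn-dl.py.

```python
from pathlib import Path
from typing import List

# Header line of an extended M3U playlist (also used by HLS .m3u8 files).
M3U_MAGIC = b"#EXTM3U"
UTF8_BOM = b"\xef\xbb\xbf"


def is_m3u_playlist(path: Path) -> bool:
    """Return True if the file starts with the '#EXTM3U' header.

    A leading UTF-8 byte-order mark is tolerated. Plain M3U files that
    lack the header are NOT detected by this check.
    """
    with path.open("rb") as fh:
        head = fh.read(len(UTF8_BOM) + len(M3U_MAGIC))
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return head.startswith(M3U_MAGIC)


def find_fake_videos(course_dir: Path) -> List[Path]:
    """List *.mp4 files under course_dir that are really M3U playlists."""
    return sorted(
        p for p in course_dir.rglob("*.mp4")
        if p.is_file() and is_m3u_playlist(p)
    )
```

For example, `find_fake_videos(Path("instructional-methods-in-health-professions-education"))` would list files like `week1/8.2-Large-Lecture-Format_adaptive.m3u.mp4`. A playlist only points at the media (the `_adaptive` name suggests an adaptive-bitrate stream), so the video itself would still have to be fetched from the URLs listed inside it.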

With https://github.com/mjbright/futurelearn-dl/blob/3eb42fb5b257716e3a5a90292c1c5556827aa508/futurelearn-dl.py#L448-L449 uncommented again, the download dies with the same error on the same filename:

Downloading 8-week course 'instructional-methods-in-health-professions-education'
FATAL:downloadURLInPage: Unhandled escape sequence in filename <1.4-Weekly-Overview_Philosophy_of_Adult_Education_Inventory_%281%29.pdf>
Look for new files with - find /xtbmr/HOMEPOOL/HOME/zenny/Downloads/Education/FUTURELEARN/instructional-methods-in-health-professions-education -type f -exec ls -altr {} \;

Is there a parameter to make futurelearn-dl.py skip a specific file? How can I get past this? Help from any Pythonista is welcome. ;-)
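Nothing in this thread shows such an option in futurelearn-dl.py. If editing the script is acceptable, one hedged sketch is a hand-maintained list of glob patterns to skip, plus a check that warns and skips `%`-containing names instead of calling `fatal()`. `SKIP_PATTERNS`, `should_skip` and `check_filename` are hypothetical names; wiring them in at lines 448-449 would mean skipping the download whenever `check_filename` returns `None`.

```python
import fnmatch
import sys
from typing import Optional

# Hypothetical, hand-maintained list of glob patterns for files to skip.
SKIP_PATTERNS = [
    "1.4-Weekly-Overview_Philosophy_of_Adult_Education_Inventory_*.pdf",
]


def should_skip(filename: str) -> bool:
    """True if filename matches any SKIP_PATTERNS entry (case-sensitive)."""
    return any(fnmatch.fnmatchcase(filename, pat) for pat in SKIP_PATTERNS)


def check_filename(filename: str) -> Optional[str]:
    """Hypothetical replacement for the fatal '%' check.

    Returns the filename to download to, or None if the caller should
    skip this file. Names containing '%' are skipped with a warning
    instead of aborting the whole course download.
    """
    if should_skip(filename):
        print(f"WARN: skipping {filename} (matches SKIP_PATTERNS)", file=sys.stderr)
        return None
    if "%" in filename:
        print(f"WARN: skipping {filename} (unhandled escape sequence)", file=sys.stderr)
        return None
    return filename
```

Skipping keeps the rest of the course downloading; the skipped file can then be fetched by hand, as was already done for the PDF above.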

Any input would be highly appreciated.

Cheers.

zenny commented 4 years ago

Bump!