mjbright / futurelearn-dl

A script to download materials from the FutureLearn website (for enrolled courses)
GNU General Public License v3.0

what am I doing wrong? #5

Open colablikje opened 7 years ago

colablikje commented 7 years ago

I downloaded futurelearn-dl.py to C:\Python27\Scripts. From that folder I ran:

futurelearn-dl.py myemail mypw big-data-visualisation 2 (with myemail and mypw replaced by the actual email address and pw)

This yields the following error:

Traceback (most recent call last):
  File "C:\Python27\Scripts\futurelearn-dl.py", line 585, in <module>
    OP_DIR = os.getenv('OP_DIR', default=os.getenv('HOME') + '/Education/FUTURELEARN')
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' <<<<<<<<<<<<<<<<<

What is going wrong? I am on Windows 10 (x64) with Python 2.7.11, and I am enrolled in this particular course. The same error appears when running under Python 3.5.2.

Thank you.

mjbright commented 7 years ago

I didn't test on Windows, but if you have the HOME environment variable set it should work.

I think that's the problem.

I should really allow you to set the root download directory. I'll look into that when I have a moment.

In the meantime, if you can set your HOME directory, that should do the trick (not sure whether you will need forward slashes or backslashes in your path, though).
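A note on the traceback above: the TypeError means os.getenv('HOME') returned None in the process that ran the script. Python also evaluates the default= expression before os.getenv is called, so the concatenation runs even when OP_DIR is set. The following is a minimal sketch of a more tolerant default, not the script's actual code; the helper name default_output_dir and the USERPROFILE fallback are assumptions made for illustration.

```python
import os


def default_output_dir(env=None):
    """Pick a download root without assuming HOME exists (hypothetical helper)."""
    env = os.environ if env is None else env

    # An explicit OP_DIR always wins, as in the original script.
    explicit = env.get('OP_DIR')
    if explicit:
        return explicit

    # Assumption: fall back to USERPROFILE (commonly set on Windows) and then
    # to expanduser('~') when HOME is missing, instead of concatenating None.
    home = env.get('HOME') or env.get('USERPROFILE') or os.path.expanduser('~')

    # os.path.join uses the separator of the platform the script runs on.
    return os.path.join(home, 'Education', 'FUTURELEARN')


print(default_output_dir())
```

Passing a plain dict as env makes the lookup order easy to check without touching the real environment.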


colablikje commented 7 years ago

Thanks for your response. Unfortunately, it doesn't work: HOME was already set, so something else seems to be going wrong. Perhaps it is something specific to Windows? Windows uses backslashes, so perhaps the forward slashes that the current code adds to the path are the problem.

mjbright commented 7 years ago

Yes, it's probably a backslashes problem.

Looking at the code as it is, I see:

    TMP_DIR = os.getenv('TMP_DIR', default='/tmp/FUTURELEARN_DL')
    OP_DIR = os.getenv('OP_DIR', default=os.getenv('HOME') + '/Education/FUTURELEARN')

So you could set TMP_DIR and OP_DIR to acceptable paths. I'm not sure what format you'd need.

You could set both to '.' to start with, or try things like 'C:\tmp\FutureLearn' and 'C:\FutureLearn' respectively.
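On the question of path format: Python's Windows path rules generally accept either slash direction, which can be checked on any platform with the standard ntpath module. The sketch below also shows a temp-directory default that does not hard-code /tmp. The helper name default_tmp_dir is hypothetical, and nothing here is taken from the script itself.

```python
import ntpath
import os
import tempfile


def default_tmp_dir(env=None):
    """Use TMP_DIR if set, else a folder under the platform temp dir (hypothetical helper)."""
    env = os.environ if env is None else env
    return env.get('TMP_DIR') or os.path.join(tempfile.gettempdir(), 'FUTURELEARN_DL')


# ntpath applies Windows path rules even when run on Linux or macOS.
# Forward slashes are normalised to backslashes:
print(ntpath.normpath('C:/tmp/FutureLearn'))  # C:\tmp\FutureLearn

# Joining and then normalising gives a consistent Windows-style path.
print(ntpath.normpath(ntpath.join('C:/FutureLearn', 'week1')))  # C:\FutureLearn\week1

print(default_tmp_dir())
```

If the values are set as environment variables rather than written into the source, either 'C:/tmp/FutureLearn' or 'C:\tmp\FutureLearn' should therefore be a reasonable first attempt.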


colablikje commented 7 years ago

Thanks, progress! This works, but it now produces some other error messages. Let me copy the output here; can you see whether I should make any further changes to the code?

Overall, the HTML file and the (few) videos are downloaded, so most (if not all) of the material is there.

[C:\Python35\Scripts]futurelearn-dl.py myUN myPW big-data-visualisation 2
Downloading 2-week course 'big-data-visualisation'
ERROR: UnicodeEncodeError - 'charmap' codec can't encode character '\u2003' in position 30609: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode character '\u2003' in position 30609: character maps to <undefined>
Downloading urlhttps://view.vzaar.com/7051757/video to file <big-data-visualisation/week1/1.1-Welcome-to-the-course_7051757.mp4> ... type=mp4, content.len=8898117
ERROR: UnicodeEncodeError - 'charmap' codec can't encode characters in position 27757-27758: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode characters in position 27757-27758: character maps to <undefined>
Downloading urlhttp://squidspot.com/Periodic_Table_of_Typefaces.html to file <big-data-visualisation/week1/1.12-Choosing-the-right-form-of-visualisation_Periodic_Table_of_Typefaces.html> ... type=html, content.len=19614
Downloading urlhttps://view.vzaar.com/7051801/video to file <big-data-visualisation/week1/1.16-Interactive-art_7051801.mp4> ... type=mp4, content.len=11625800
ERROR: UnicodeEncodeError - 'charmap' codec can't encode character '\u2003' in position 25988: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode character '\u2003' in position 25988: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode character '\u2211' in position 23368: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode character '\u2211' in position 23368: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode characters in position 30036-30037: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode characters in position 30036-30037: character maps to <undefined>
Downloading urlhttps://view.vzaar.com/7087555/video to file <big-data-visualisation/week2/2.2-The-visualisation-process_7087555.mp4> ... type=mp4, content.len=7176196
Downloading urlhttp://www.htmlwidgets.org/showcase_plotly.html to file <big-data-visualisation/week2/2.7-Getting-started-with-MATLAB_showcase_plotly.html> ... type=html, content.len=60943
ERROR: UnicodeEncodeError - 'charmap' codec can't encode characters in position 25825-25826: character maps to <undefined>
ERROR: UnicodeEncodeError - 'charmap' codec can't encode characters in position 25825-25826: character maps to <undefined>
Downloading urlhttp://www.htmlwidgets.org/showcase_plotly.html to file <big-data-visualisation/week2/2.17-Putting-it-all-together_showcase_plotly.html> ... type=html, content.len=60943
Downloading urlhttp://localhost:8080/australia.html to file <big-data-visualisation/week2/2.23-Create-an-interactive-map-using-D3.js_australia.html> ...
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\connection.py", line 142, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\util\connection.py", line 91, in create_connection
    raise err
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\util\connection.py", line 81, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 578, in urlopen
    chunked=chunked)
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 362, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Python35\lib\http\client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "C:\Python35\lib\http\client.py", line 1151, in _send_request
    self.endheaders(body)
  File "C:\Python35\lib\http\client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "C:\Python35\lib\http\client.py", line 934, in _send_output
    self.send(msg)
  File "C:\Python35\lib\http\client.py", line 877, in send
    self.connect()
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\connection.py", line 167, in connect
    conn = self._new_conn()
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\connection.py", line 151, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x03DF9530>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\requests\adapters.py", line 403, in send
    timeout=timeout
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 623, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Python35\lib\site-packages\requests\packages\urllib3\util\retry.py", line 281, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /australia.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x03DF9530>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python35\Scripts\futurelearn-dl.py", line 627, in <module>
    getCourseWeekStepPage(course_id, week_id, step_id, week_num, title)
  File "C:\Python35\Scripts\futurelearn-dl.py", line 232, in getCourseWeekStepPage
    downloadURLsInPage(course_id, week_id, step_id, week_num, content, DOWNLOAD_TYPE, page_title)
  File "C:\Python35\Scripts\futurelearn-dl.py", line 386, in downloadURLsInPage
    downloadURLInPage(url, download_dir, DOWNLOAD_TYPE, page_title)
  File "C:\Python35\Scripts\futurelearn-dl.py", line 452, in downloadURLInPage
    downloadURLToFile(url, ofile, DOWNLOAD_TYPE)
  File "C:\Python35\Scripts\futurelearn-dl.py", line 405, in downloadURLToFile
    response = session.get(url, headers=headers)
  File "C:\Python35\lib\site-packages\requests\sessions.py", line 487, in get
    return self.request('GET', url, **kwargs)
  File "C:\Python35\lib\site-packages\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python35\lib\site-packages\requests\sessions.py", line 585, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python35\lib\site-packages\requests\adapters.py", line 467, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /australia.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x03DF9530>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))

mjbright commented 7 years ago

I really don't have time to look into this now, but thanks for reporting back.

That's bad news if the remote machine is actively refusing connections ('No connection could be made because the target machine actively refused it'), especially after a MaxRetryError. This happened after a few downloads had already succeeded. If you rerun, I wonder whether it fails immediately. I'm not sure whether backing off and retrying later would help (catching the exception and retrying the session.get after a sleep?).
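The catch-and-retry idea can be sketched as below. Note that the failing URL in the log is http://localhost:8080/australia.html, a link that points at the downloader's own machine, so the sketch also skips such links rather than retrying them. This is an illustration under stated assumptions, not code from the script: fetch_with_retry, is_local_url, and their parameters are hypothetical names, and the fetch callable stands in for session.get so the example needs only the standard library.

```python
import time
from urllib.parse import urlparse

# Hosts that refer to the local machine; a course link to one of these
# cannot be downloaded unless a local server happens to be running.
LOCAL_HOSTS = {'localhost', '127.0.0.1', '::1'}


def is_local_url(url):
    """Return True when the URL points at the local machine."""
    return urlparse(url).hostname in LOCAL_HOSTS


def fetch_with_retry(fetch, url, retry_on=(OSError,), attempts=3,
                     delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on the given exceptions; return None on failure."""
    if is_local_url(url):
        return None  # skip instead of aborting the whole course download
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts:
                return None  # give up on this URL and let the caller continue
            sleep(delay * attempt)  # simple linear back-off between attempts
    return None


print(is_local_url('http://localhost:8080/australia.html'))
```

With the requests library, the call could look like fetch_with_retry(lambda u: session.get(u, headers=headers), url, retry_on=(requests.exceptions.ConnectionError,)), with the caller skipping the file whenever None comes back.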

For the Unicode errors, what a pain ... You might like to look at all content handling. In general there is

    response = session.get(url, headers=headers)

followed by

    content = response.content.decode('utf8', 'ignore')

If the 'ignore' parameter isn't present it might help to add that ...
I never really grokked Unicode handling ...
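One observation on the log: the messages are UnicodeEncodeError from the 'charmap' codec, which is an encoding failure (text to bytes), so an 'ignore' argument on decode is unlikely to change them. A plausible cause, stated here as an assumption because the thread shows no traceback for these errors, is that page text is written or printed using the Windows default code page. The sketch below reproduces the error with cp1252 and shows two common workarounds; save_text and console_safe are hypothetical helper names.

```python
import os
import tempfile


def save_text(path, text):
    """Write text as UTF-8 regardless of the platform's default encoding."""
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


def console_safe(text, encoding='cp1252'):
    """Replace characters the target encoding cannot represent with '?'."""
    return text.encode(encoding, errors='replace').decode(encoding)


# U+2003 (em space) and U+2211 (summation sign) both appear in the log above.
sample = 'Total\u2003\u2211'

try:
    sample.encode('cp1252')  # assumption: cp1252 stands in for the console code page
except UnicodeEncodeError as exc:
    print('ERROR: UnicodeEncodeError -', exc)

# Workaround 1: always name the encoding when writing files.
target = os.path.join(tempfile.mkdtemp(), 'step.html')
save_text(target, sample)

# Workaround 2: degrade gracefully when printing to a limited console.
print(console_safe(sample))
```

Writing files as UTF-8 keeps the downloaded pages intact, while the replacement approach only affects what is shown on screen.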


colablikje commented 7 years ago

I'll patiently wait until you find the time to solve this issue. Let me know if you need me to try anything specific.

Update: In the meantime, I have tried downloading other courses as well. They all work, but the Unicode errors remain. The original course (big-data-visualisation, v2) still gives the errors shown above, so there appears to be something specific to that course.