rojter-tech / pluradl.py

Automated download of Pluralsight courses
MIT License

Frequent loss of downloaded files #36

Closed: Taranis01 closed this issue 4 years ago.

Taranis01 commented 4 years ago

When plura-dl encounters a problem, for whatever reason, the current downloads are moved to the _failed folder. I then move them back to the _inprogress folder and restart plura-dl. However, sometimes files are lost:

I will give further information if I can replicate the first two issues.

[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] plura-dl version 1.0.0b4
[debug] Python version 3.7.7 (CPython) - Windows-10-10.0.18362-SP0
[debug] exe versions: none
[debug] Proxy map: {}
[pluralsight:course] django-angularjs-web-development: Downloading JSON metadata
[pluralsight:course] django-angularjs-web-development: Downloading JSON metadata
ERROR: Unable to download JSON metadata: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on https://github.com/rojter-tech/pluradl.py/issues . Make sure you are using the latest version; see https://github.com/rojter-tech/pluradl.py/wiki on how to update. Be sure to call plura-dl with the --verbose flag and include its complete output.
  File "C:\Users\Test\Desktop\Python\pluradl4.py\plura_dl\extractor\common.py", line 627, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "C:\Users\Test\Desktop\Python\pluradl4.py\plura_dl\PluraDL.py", line 2238, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "C:\Users\Test\Anaconda3\envs\Test1\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\Test\Anaconda3\envs\Test1\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\Test\Anaconda3\envs\Test1\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\Test\Anaconda3\envs\Test1\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\Test\Anaconda3\envs\Test1\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Something went wrong. The download request for 'django-angularjs-web-development' was forced to terminate. Double check that https://app.pluralsight.com/library/courses/django-angularjs-web-development exists or that your subscription is valid for accessing its content.

prikhi commented 4 years ago

I noticed this too. It happens if you copy more than just the next course into the _inprogress folder. I made these quick tweaks to fix it:

@@ -90,11 +95,12 @@ def move_content(pdl, course_id, coursepath, completionpath):
     pdl.to_stdout("Moving content to " + finalpath)
     set_directory(completionpath)
     try:
-        if os.path.exists(finalpath):
-            shutil.rmtree(finalpath)
-        shutil.move(coursepath,finalpath)
-        if os.path.exists(INPROGRESSPATH):
-            shutil.rmtree(INPROGRESSPATH)
+        os.makedirs(finalpath)
+        for f in os.listdir(coursepath):
+            if os.path.exists(finalpath + "/" + f):
+                shutil.rmtree(finalpath+"/"+f)
+            shutil.move(coursepath + "/" + f,finalpath + "/" + f)
+        shutil.rmtree(coursepath)
     except PermissionError:
         print("Directory still in use, leaving it. Will be fixed in future releases.")
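The patch above still has two rough edges. First, `os.makedirs(finalpath)` raises `FileExistsError` if the course folder already exists at the destination. Second, `shutil.rmtree` raises an error when the existing entry is a plain file rather than a directory. Below is a minimal sketch of the same merge-style move that handles both cases. It assumes a hypothetical helper named `merge_move`, which is not part of plura-dl, and the folder names in the demo (`course-a`, `course-b`, `completed`) are placeholders.

```python
import os
import shutil
import tempfile


def merge_move(coursepath, finalpath):
    """Move each entry of coursepath into finalpath, then delete coursepath.

    Hypothetical helper, not plura-dl's API. Only the one course directory
    is touched, so other courses sitting in _inprogress survive.
    """
    # exist_ok=True: merge into a destination left over from an earlier run
    os.makedirs(finalpath, exist_ok=True)
    for name in os.listdir(coursepath):
        src = os.path.join(coursepath, name)
        dst = os.path.join(finalpath, name)
        # Replace whatever is already at dst; rmtree only accepts directories
        if os.path.isdir(dst) and not os.path.islink(dst):
            shutil.rmtree(dst)
        elif os.path.lexists(dst):
            os.remove(dst)
        shutil.move(src, dst)
    # rmdir refuses to delete a non-empty directory, so nothing is lost silently
    os.rmdir(coursepath)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    course = os.path.join(root, "_inprogress", "course-a")
    sibling = os.path.join(root, "_inprogress", "course-b")
    os.makedirs(os.path.join(course, "module-1"))
    os.makedirs(sibling)
    merge_move(course, os.path.join(root, "completed", "course-a"))
    print(sorted(os.listdir(os.path.join(root, "_inprogress"))))  # ['course-b']
    shutil.rmtree(root)
```

The final `os.rmdir` is used instead of `shutil.rmtree(coursepath)` on purpose. If anything unexpectedly remains in the course folder, the call fails loudly rather than deleting it.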

I also commented out every move_content call in the invoke_download function except those for finished downloads, so everything stays in the _inprogress folder until it has finished. My internet is meh, and I have to restart the script every time it drops out for a couple of seconds, so this makes it less work for me.
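The workaround described above amounts to one rule: only a finished download leaves _inprogress. Here is a minimal sketch of that rule as a pure function. It assumes hypothetical `Outcome` values and a placeholder completion folder name `done`; neither comes from plura-dl, whose real status handling may differ.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical download outcomes; plura-dl's own statuses may differ."""
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


def target_dir(outcome, inprogress_dir, completion_dir):
    """Return the folder a course directory should end up in.

    Only a finished download leaves _inprogress. Failed and cancelled
    downloads stay put, so restarting the script resumes them without
    anyone moving folders back by hand.
    """
    if outcome is Outcome.COMPLETED:
        return completion_dir
    return inprogress_dir


if __name__ == "__main__":
    for outcome in Outcome:
        print(outcome.value, "->", target_dir(outcome, "_inprogress", "done"))
```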

Taranis01 commented 4 years ago

Thanks @prikhi. I had been running multiple instances with the same course folder (via symbolic links), so no wonder this happened frequently.
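Two instances sharing one _inprogress folder can delete each other's files, because each one cleans up the folder on its own schedule. One common safeguard is an exclusive lock file. The sketch below assumes a hypothetical `folder_lock` context manager and lock file name `.pluradl.lock`; plura-dl does not provide either. `os.open` with `O_CREAT | O_EXCL` creates the file atomically, so the second instance fails immediately. This still works when both instances reach the folder through different paths such as a symbolic link.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def folder_lock(folder, name=".pluradl.lock"):
    """Hold an exclusive lock file inside folder while the block runs.

    Hypothetical safeguard, not a plura-dl feature. O_CREAT | O_EXCL makes
    creation atomic, so a second instance using the same folder (even via a
    symbolic link) fails fast instead of touching the first one's files.
    """
    lock_path = os.path.join(folder, name)
    # Raises FileExistsError if another instance already holds the lock
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the owner's PID to help diagnose a stale lock after a crash
        os.write(fd, str(os.getpid()).encode())
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    with folder_lock(workdir):
        try:
            with folder_lock(workdir):
                pass
        except FileExistsError:
            print("second instance refused")
    os.rmdir(workdir)
```

If a run is killed hard, the lock file stays behind and has to be deleted by hand. The PID written into it helps confirm that no instance is still running.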

treeshateorcs commented 4 years ago

So I just lost a week's worth of downloaded courses because I moved all the courses it had (incompletely) downloaded into _inprogress. I mean, what's even the point of these directories? _cancelled, _failed: why are they even there? I just want to download, not sort them based on some nonsense. Every time I ran it, I had to move _failed and _cancelled into _inprogress. I wish I had not found this software.