Closed travis-south closed 4 years ago
@travis-south I will fix the issue after reproducing it.
I'm having this same issue. Seems related to the number prefixing.
Maybe prefix with the format [chapter]-[auto increment]?
This idea seems good to me. I will check it; thanks for the suggestion.
Or just number every lesson per chapter from 001 again
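A minimal sketch of what per-chapter numbering could look like. The names `lecture_filename` and `name_course` and the exact `CC-NNN Title.ext` layout are made up for illustration; this is not how udemy-dl names files today.

```python
def lecture_filename(chapter_no, lecture_no, title, ext=".mp4"):
    """Build a name like '02-003 Title.mp4': chapter number, then a per-chapter counter."""
    return f"{chapter_no:02d}-{lecture_no:03d} {title}{ext}"


def name_course(chapters):
    """chapters: list of chapters, each a list of lecture titles in course order."""
    names = []
    for chapter_no, lectures in enumerate(chapters, start=1):
        # The counter restarts at 001 for every chapter.
        for lecture_no, title in enumerate(lectures, start=1):
            names.append(lecture_filename(chapter_no, lecture_no, title))
    return names
```

With this scheme a lesson added to chapter 1 leaves every name in chapter 2 and later unchanged. Lessons after the new one inside the same chapter still get a new number, so it narrows the problem rather than removing it.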
What I did was use a script to download and remove duplicates:
import os

PATH_OF_DATA = "/Volumes/PenDriveWithCourses/"
CMD_PRE = "/usr/local/bin/python3 /Users/YourWsp/Documents/GIT/udemy-dl/udemy-dl.py https://yourcompanysthing.udemy.com/"
CMD_POST = "/ -k /Users/YourWsp/Documents/cookies.txt --skip-sub -q 480 -o "
CMD_CLEANUP = "fdupes -N -i -r -d "

# Go through the drive and iterate over the downloaded course folders
for foldername in os.listdir(PATH_OF_DATA):
    print("+++++++++++++++++++++++++++ Directory is " + foldername + " ++++++++++++++++++++++++++++++++++++++")
    # Re-run the download for this course; the folder name doubles as the course name in the URL
    CMD = CMD_PRE + foldername + CMD_POST + PATH_OF_DATA
    os.system(CMD)
    # Remove duplicates
    CMD_NEXT = CMD_CLEANUP + PATH_OF_DATA + foldername + "/"
    os.system(CMD_NEXT)
Once every couple of months I check for updates and delete the old (duplicate) content.
I have another solution:
In _shared.py, use this function to check whether the file has already been downloaded:
def check_if_already_exists(filepath):
    dirname = os.path.dirname(filepath)
    dlfilename = os.path.basename(filepath)
    # Drop the first four characters (the number prefix) before comparing
    shortened_filename = dlfilename[4:]
    for filename in os.listdir(dirname):
        local_filename = filename[4:]
        if local_filename == shortened_filename:
            return True
    print("Not found : " + shortened_filename + ", Downloading ...")
    return False
Usage: just before downloading, check whether the file already exists under a different number:
+    if check_if_already_exists(filepath):
+        retVal = {"status" : "True", "msg" : "already downloaded, with a different name"}
+        return retVal
     if os.path.isfile(filepath):
         retVal = {"status": "True", "msg": "already downloaded"}
         return retVal
Then use the fdupes tool to delete dupes:
fdupes -N -i -r -d /path/to/files
It's lame but it works. The right way would be to compare the file sizes as well (not just the name)
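A rough sketch of the size-aware variant, assuming the downloader can learn the expected size before fetching (for example from a Content-Length header; that part is an assumption and is not shown). `already_downloaded`, `expected_size` and `PREFIX_LEN` are names made up for this example.

```python
import os

# Assumption: names start with a 3-digit number plus a space, matching filename[4:] above.
PREFIX_LEN = 4


def already_downloaded(filepath, expected_size):
    """Return True if a file with the same un-prefixed name and the same size exists."""
    dirname = os.path.dirname(filepath)
    wanted = os.path.basename(filepath)[PREFIX_LEN:]
    if not os.path.isdir(dirname):
        return False
    for filename in os.listdir(dirname):
        if filename[PREFIX_LEN:] != wanted:
            continue
        # Same title; only treat it as a match when the byte size agrees too.
        if os.path.getsize(os.path.join(dirname, filename)) == expected_size:
            return True
    return False
```

A re-uploaded video that happens to have exactly the same size would still slip through, so this is a cheap filter, not a guarantee.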
Well, your solution has issues: a teacher can re-upload a new file under the old name, and you will miss it. From what I have seen, Udemy currently doesn't deliver any hashes, so a better solution would be to create a "manifest", store lecture counts and maybe hashes in that file, and check whether any chapter has been altered. If one has, redownload its files and check whether they match.
But even that would have to re-download all lectures to make sure you don't have old ones. It would, however, make it easier to check for dupes right away.
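A minimal sketch of such a manifest, assuming one sub-folder per chapter on disk. Since Udemy doesn't deliver hashes, these are SHA-256 hashes of the local files only; the function names and the JSON layout are made up for illustration.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(course_dir):
    """Map each chapter folder to its lecture count and per-file hashes."""
    manifest = {}
    for chapter in sorted(os.listdir(course_dir)):
        chapter_path = os.path.join(course_dir, chapter)
        if not os.path.isdir(chapter_path):
            continue  # skip loose files such as the manifest itself
        files = {}
        for name in sorted(os.listdir(chapter_path)):
            file_path = os.path.join(chapter_path, name)
            if os.path.isfile(file_path):
                files[name] = sha256_of(file_path)
        manifest[chapter] = {"lecture_count": len(files), "files": files}
    return manifest


def save_manifest(manifest, path):
    """Write the manifest as JSON so a later run can compare against it."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)


def altered_chapters(old_manifest, new_manifest):
    """Return chapters that were added, removed, or whose entry differs."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

Comparing the stored manifest with a fresh one narrows the re-check to the chapters that differ. It cannot tell on its own whether the remote file changed; that still needs a download, as noted above.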
I have some ideas in mind; I will check and push the updates soon.
Does this issue still exist?
yes
in coursera-dl they have a "--resume" option. Maybe you can reuse the same approach: https://github.com/coursera-dl/coursera-dl#resuming-downloads
Resume capability is already there.
The issue arises when a course chapter or a lecture gets updated in any way (rename, new video file with the same name, chapter addition).
I have managed to tackle the case of a new chapter or lecture being added, but for existing ones (rename, new video with the same name) I'm still checking whether we can implement it.
I will also check whether the Udemy API provides some sort of date that says when something was updated.
@all I will re-open the issue when I plan to work on it with some tricks. Currently I don't see anything in the Udemy API that can be used to keep track of whether videos, chapters, or a course have been updated. I'm closing the issue as a future enhancement and will see whether I can implement some sort of fix for it. In the meantime, PRs and suggestions are welcome.
I downloaded a course last month. Say, for example, the course got updated yesterday and I wanted to download the updates. When I tried to download it, it correctly identified which lessons were already downloaded, but when it came to the newly added lesson, instead of downloading just that new lesson, it also downloaded all the lessons after it.
I think this happens because already downloaded lessons are compared with new lessons by filename only, and since there's a number prefix on the downloaded lessons, the number is bumped up for every lesson from the newly added one to the end.
I'm not sure what the best approach is to avoid or fix this. Maybe remove the prefix when comparing filenames/titles?
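Something like this is what I have in mind; just a sketch, not udemy-dl's actual code. The helper names are made up, and it assumes the prefix is a run of digits followed by a space, dot, underscore or dash.

```python
import re

# Assumption: the number prefix looks like "001 ", "12. " or "03-".
_PREFIX = re.compile(r"^\d+[\s._-]+")


def strip_number_prefix(filename):
    """Return the filename without its leading number prefix."""
    return _PREFIX.sub("", filename, count=1)


def missing_lectures(remote_names, local_names):
    """List remote lectures whose un-prefixed name is not on disk yet."""
    have = {strip_number_prefix(name) for name in local_names}
    return [name for name in remote_names if strip_number_prefix(name) not in have]
```

For example, with `001 Intro.mp4` and `002 Setup.mp4` on disk and a remote list of `001 Intro.mp4`, `002 New lesson.mp4`, `003 Setup.mp4`, only `002 New lesson.mp4` is reported as missing. Two lessons with the same title would be treated as one, so it is not bulletproof.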
Thanks.