ikeboy / pluralsight-scraper

Pluralsight video downloader
https://www.knyz.org/blog/post/pluralsight-scraper-released/
GNU General Public License v2.0

Directory not found (path too long; Windows) #14

Open Suisse00 opened 4 years ago

Suisse00 commented 4 years ago

Titles can be very long, and Windows still limits paths to 260 characters (Windows 10 has an option to raise the limit, but you have to opt in by editing the registry).

I recommend the CWD to be <= 50 characters.

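As for future modification/parameters:

Using the course url (bonus it is human friendly)

In extreme cases would be to export the GUID + a mapping file(s) with the title so the user could rename it as they want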

vezaynk commented 4 years ago

I recommend the CWD to be <= 50 characters.

Are there any courses that actually exceed this limit?

Using the course url (bonus it is human friendly)

It's already using the course url to download. Or am I misunderstanding?

In extreme cases would be to export the GUID + a mapping file(s) with the title so the user could rename it as they want

What is the purpose of this?

Suisse00 commented 4 years ago

Are there any courses that actually exceed this limit?

Assuming you download everything to the root of your drive, I think you should be fine. But once you start running this script from somewhere else, it could get you in trouble fast.

For example, I have one that looks like this:

Building an Enterprise Grade Distributed Online Analytics Platform\4 Introducing Distributed Computation with Apache Storm\6 Demo - Downloading, Configuring, and Running Apache Storm.mp4

This is 187 characters (and it isn't the longest one I have).

Using the course url (bonus it is human friendly)

It's already using the course url to download. Or am I misunderstanding?

I meant the building-enterprise-distributed-online-analytics-platform part of https://app.pluralsight.com/library/courses/building-enterprise-distributed-online-analytics-platform
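A minimal sketch of that idea, assuming the last path segment of the course URL is used as the directory name. The helper name `courseSlug` is hypothetical and is not taken from the scraper's code.

```javascript
// Hypothetical helper: returns the last path segment of a course URL,
// which could serve as a shorter directory name than the course title.
function courseSlug(courseUrl) {
  const { pathname } = new URL(courseUrl);
  // Drop empty segments so a trailing slash does not produce an empty slug.
  const segments = pathname.split("/").filter(Boolean);
  if (segments.length === 0) {
    throw new Error(`No path segment found in ${courseUrl}`);
  }
  return decodeURIComponent(segments[segments.length - 1]);
}

const slug = courseSlug(
  "https://app.pluralsight.com/library/courses/building-enterprise-distributed-online-analytics-platform"
);
// Print the slug and its length to compare against the full course title.
console.log(slug, slug.length);
```

The slug only shortens the course directory; module and clip titles would still need their own handling.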

In extreme cases would be to export the GUID + a mapping file(s) with the title so the user could rename it as they want

What is the purpose of this?

It just moves the problem (the script crashing on Windows with an unhelpful error) into the user's hands. We let the user download the videos even when the file system would otherwise prevent it.

Then, if the user wants to rename them (because they will be named by GUID), the "mapping file" tells them the original titles, and Windows Explorer will prevent them from creating a path of 260 characters or more.
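The GUID-plus-mapping idea could look roughly like the sketch below. The clip shape (`id`, `title`), the function name `planGuidDownload`, the JSON format, and the sample GUID are all assumptions made for illustration; the thread does not specify any of them.

```javascript
// Hypothetical sketch: name each file by its clip ID and keep the titles
// in a separate mapping that can be written next to the videos.
function planGuidDownload(clips, extension = ".mp4") {
  const rows = clips.map(({ id, title }) => ({
    fileName: `${id}${extension}`,
    title,
  }));
  // One JSON object: file name -> original title.
  const mapping = Object.fromEntries(rows.map((r) => [r.fileName, r.title]));
  return {
    fileNames: rows.map((r) => r.fileName),
    mappingJson: JSON.stringify(mapping, null, 2),
  };
}

// The GUID below is made up for the example.
const plan = planGuidDownload([
  {
    id: "0f8fad5b-d9cb-469f-a165-70867728950e",
    title: "6 Demo - Downloading, Configuring, and Running Apache Storm",
  },
]);
console.log(plan.fileNames);
console.log(plan.mappingJson);
```

Every file name then has a fixed, short length regardless of the title, at the cost of a less readable directory listing.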

vezaynk commented 4 years ago

What a ridiculous limitation :/

How does wget do it?

Suisse00 commented 4 years ago

What a ridiculous limitation :/

Yep... Welcome to 2020 Windows!

How does wget do it?

I haven't used wget in a while, but it most likely does it the same way as everyone else: it uses the Content-Disposition HTTP response header or the "filename" part of the request URL (the value after the last / (slash) but before the query string or # anchor).

If you use one of the CDN files from /viewclip, the "filename" extracted from the URL would be something like 1280x720.mp4. If you download one of the HLS stream files the browser asks for, the URL looks something like blablabla/hls_1280x720.ts?token, so wget would probably save it as hls_1280x720.ts. In that last case, since the filename is always the same regardless of the course and part, either wget would overwrite it (and you wouldn't get the full video) or you would end up with a lot of hls_1280x720 (X).ts files.
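The naming rule described above can be sketched as follows. This is an illustration of the general approach, not wget's actual implementation; the function names and the `cdn.example.com` host are hypothetical, and the header parser is deliberately small (it ignores the extended `filename*=` form).

```javascript
// Last path segment of the URL, ignoring the query string and #fragment.
function fileNameFromUrl(url) {
  const { pathname } = new URL(url);
  return decodeURIComponent(pathname.slice(pathname.lastIndexOf("/") + 1));
}

// Minimal Content-Disposition parser: handles filename="x" and filename=x only.
function fileNameFromContentDisposition(header) {
  const match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(header || "");
  if (!match) return null;
  return match[1] !== undefined ? match[1] : match[2];
}

// Prefer the header when the server sends one, otherwise fall back to the URL.
function pickFileName(url, contentDisposition) {
  return fileNameFromContentDisposition(contentDisposition) || fileNameFromUrl(url);
}

console.log(pickFileName("https://cdn.example.com/blablabla/hls_1280x720.ts?token=abc", null));
```

Because every HLS segment URL ends in the same segment name, a URL-derived name collides across clips, which is the overwrite problem described in the comment.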

vezaynk commented 4 years ago

I meant to ask how it deals with filenames longer than 50 characters

Suisse00 commented 4 years ago

I was talking about the "Current Working Directory" (CWD), which should be less than 50 characters, just as a kind of warning that you may be asking for trouble otherwise.

For example, if you run the script from /mount/whatever/long base path for my collection of courses/ you have already used 61 characters out of the 250. The script will then create this sub-directory: Building an Enterprise Grade Distributed Online Analytics Platform/4 Introducing Distributed Computation with Apache Storm/ (+123, for a total of 184). You will then run wget from /mount/whatever/long base path for my collection of courses/Building an Enterprise Grade Distributed Online Analytics Platform/4 Introducing Distributed Computation with Apache Storm/ with a URL like http://something/6 Demo - Downloading, Configuring, and Running Apache Storm.mp4 (the filename adds 64 characters, for a total of 247).

The actual path on disk is /mount/whatever/long base path for my collection of courses/Building an Enterprise Grade Distributed Online Analytics Platform/4 Introducing Distributed Computation with Apache Storm/6 Demo - Downloading, Configuring, and Running Apache Storm.mp4, which in this case leaves you 3 characters before Windows throws an error when trying to create the file.

How does wget deal with that? By not working, and returning the error I saw (DirectoryNotFoundException in C#): 6 Demo - Downloading, Configuring, and Running Apache Storm.mp4: No such file or directory
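The running total in this walk-through can be reproduced with a few lines. In this sketch the lengths are computed from the strings rather than hard-coded, and the limit is a parameter because the thread mentions both 260 and 250; `pathBudget` and `WINDOWS_MAX_PATH` are hypothetical names.

```javascript
// Classic Windows MAX_PATH value; pass another limit to model a different budget.
const WINDOWS_MAX_PATH = 260;

// Sums the lengths of the path parts and reports what is left under the limit.
function pathBudget(parts, limit = WINDOWS_MAX_PATH) {
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  return { total, remaining: limit - total };
}

const parts = [
  "/mount/whatever/long base path for my collection of courses/",
  "Building an Enterprise Grade Distributed Online Analytics Platform/",
  "4 Introducing Distributed Computation with Apache Storm/",
  "6 Demo - Downloading, Configuring, and Running Apache Storm.mp4",
];

// Show how much of the budget each additional part consumes.
parts.forEach((_, i) => {
  console.log(i + 1, pathBudget(parts.slice(0, i + 1), 250));
});
```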

EDIT: Reformulating everything because I suck

vezaynk commented 4 years ago

Makes sense. I’m somewhat leaning towards keeping it as is and letting it fail. I don’t know how I feel about babysitting bad configuration options.

But if we do fix it, I think we could just truncate the folder and file names so that the path doesn't go over the limit. I only need 6 characters for each filename. The rest can be chopped off as needed.

I really want to keep it simple; I don't want half the codebase to be workarounds for the operating system.
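A minimal sketch of the truncation idea, assuming the extension must survive and the leading clip number is what matters most. Neither `truncateFileName` nor `fitFileName` exists in the scraper; both are illustrative.

```javascript
// Shorten a file name to at most maxLength characters, keeping its extension.
function truncateFileName(fileName, maxLength) {
  if (fileName.length <= maxLength) return fileName;
  const dot = fileName.lastIndexOf(".");
  const ext = dot > 0 ? fileName.slice(dot) : "";
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName;
  const keep = maxLength - ext.length;
  if (keep < 1) {
    throw new Error(`Cannot fit "${fileName}" into ${maxLength} characters`);
  }
  // trimEnd avoids a stem that ends in a space after the cut.
  return stem.slice(0, keep).trimEnd() + ext;
}

// Give the file name whatever is left of the limit after the directory part.
function fitFileName(directory, fileName, limit) {
  return truncateFileName(fileName, limit - directory.length);
}

console.log(
  truncateFileName("6 Demo - Downloading, Configuring, and Running Apache Storm.mp4", 6)
);
```

Truncation can make two names identical when they share a long prefix, so the numeric prefix is what keeps clips distinct in this scheme.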

Fares92 commented 4 years ago

Hi, I ran npm run get -- "https://app.pluralsight.com/library/courses/docker-getting-started" but I got the error "Something went wrong. Double check the URL and try logging in again." The login command works.

Suisse00 commented 4 years ago

Hence the simple >= 50 characters kind of validation: about two lines of code at the beginning.

Though you could also add a catch(DirectoryNotFound) for Windows to write a warning.

BTW, does Linux trim directory names? I had one case where the course title ends with a space. When calling Directory.CreateDirectory (C#), Windows trimmed the directory name, so File.Create failed because the path didn't match.
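The start-up check suggested here might look like the sketch below. The 50-character threshold comes from this thread and is not a Windows constant; `cwdWarning` and `MAX_CWD_LENGTH` are hypothetical names.

```javascript
// Threshold proposed in the discussion; adjust to taste.
const MAX_CWD_LENGTH = 50;

// Returns a warning string when the working directory is long enough to
// risk hitting the path limit, or null when it is short enough.
function cwdWarning(cwd, maxLength = MAX_CWD_LENGTH) {
  if (cwd.length < maxLength) return null;
  return (
    `Warning: the working directory is ${cwd.length} characters long ` +
    `(>= ${maxLength}). Long course titles may exceed the Windows path limit.`
  );
}

// At start-up under Node.js, check the real working directory.
if (typeof process !== "undefined" && typeof process.cwd === "function") {
  const warning = cwdWarning(process.cwd());
  if (warning) console.warn(warning);
}
```

A warning rather than a hard failure keeps the tool usable on systems where long paths are not a problem.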
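One way to sidestep the mismatch is to normalize each name before creating anything, so the directory and the files inside it are addressed by the same string. Windows drops trailing spaces and periods from names, whereas Linux file systems generally keep them. The sketch below assumes that stripping them up front is acceptable; `normalizeSegment` is a hypothetical helper.

```javascript
// Remove trailing spaces and periods, which Windows drops when creating a name.
function normalizeSegment(name) {
  const cleaned = name.replace(/[ .]+$/, "");
  // Fall back to a placeholder if nothing is left.
  return cleaned.length > 0 ? cleaned : "_";
}

// Build a path from segments that have all been normalized the same way.
function joinNormalized(segments, separator = "/") {
  return segments.map(normalizeSegment).join(separator);
}

console.log(JSON.stringify(joinNormalized(["My Course ", "1 Intro.", "1 Welcome.mp4"])));
```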

Suisse00 commented 4 years ago

^

Don't Google this tool; it looks suspicious. When you Google it, you find the exact same post (and it is always from a new account). You can also find threads where their bots are spamming each other (quite funny to see).

User reported; let's see where it goes.