ikeboy / pluralsight-scraper

Pluralsight video downloader
https://www.knyz.org/blog/post/pluralsight-scraper-released/
GNU General Public License v2.0

Subtitles .srt to be downloaded as well #33

Open prostopasta opened 4 years ago

prostopasta commented 4 years ago

What should I add to your code, and where, so that .srt subtitles (ENG/RUS) are downloaded as well?

vezaynk commented 4 years ago

Captions can be retrieved from the following route:

https://app.pluralsight.com/transcript/api/v1/caption/webvtt/<clipId>/<versionId>/<language>/

clipId: Already used within the script, so nothing new is needed.

versionId: This can be retrieved from the /viewClip endpoint, but the script will need some refactoring. Instead of getVideoUrl, we will need getVideoData or something similar, because we absolutely cannot hit that endpoint more than necessary.

The response contains version in the root of the object (second-to-last property).
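A minimal sketch of that refactoring idea, written in Python for illustration rather than in the script's own language. The names `get_video_data`, `get_version_id`, and `fetch_view_clip` are hypothetical; the only detail taken from the response itself is that `version` sits at the root of the object. The cache keeps each clip to a single /viewClip request.

```python
from typing import Any, Callable, Dict

# Hypothetical cache: clip ID -> parsed /viewClip response.
_clip_cache: Dict[str, Dict[str, Any]] = {}


def get_video_data(
    clip_id: str,
    fetch_view_clip: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the parsed /viewClip response, requesting it at most once per clip.

    fetch_view_clip is a stand-in for whatever function actually performs
    the request; it is an assumption, not part of the existing script.
    """
    if clip_id not in _clip_cache:
        _clip_cache[clip_id] = fetch_view_clip(clip_id)
    return _clip_cache[clip_id]


def get_version_id(video_data: Dict[str, Any]) -> str:
    # "version" is read from the root of the response object.
    return video_data["version"]
```

Both the video URL and the versionId can then be read from the same cached object, so adding captions does not add extra /viewClip requests.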

language: The language code comes from a set of hard-coded values in the embedded player itself. The values are:

[{"name":"Afrikaans","code":"af"},{"name":"Albanian","code":"sq"},{"name":"Amharic","code":"am"},{"name":"Arabic","code":"ar"},{"name":"Armenian","code":"hy"},{"name":"Azerbaijani","code":"az"},{"name":"Basque","code":"eu"},{"name":"Belarusian","code":"be"},{"name":"Bengali","code":"bn"},{"name":"Bosnian","code":"bs"},{"name":"Bulgarian","code":"bg"},{"name":"Catalan","code":"ca"},{"name":"Cebuano","code":"ceb"},{"name":"Chinese (Simplified)","code":"zh-CN"},{"name":"Chinese (Traditional)","code":"zh-TW"},{"name":"Corsican","code":"co"},{"name":"Croatian","code":"hr"},{"name":"Czech","code":"cs"},{"name":"Danish","code":"da"},{"name":"Dutch","code":"nl"},{"name":"English","code":"en"},{"name":"Esperanto","code":"eo"},{"name":"Estonian","code":"et"},{"name":"Finnish","code":"fi"},{"name":"French","code":"fr"},{"name":"Frisian","code":"fy"},{"name":"Galician","code":"gl"},{"name":"Georgian","code":"ka"},{"name":"German","code":"de"},{"name":"Greek","code":"el"},{"name":"Gujarati","code":"gu"},{"name":"Haitian Creole","code":"ht"},{"name":"Hausa","code":"ha"},{"name":"Hawaiian","code":"haw"},{"name":"Hebrew","code":"iw"},{"name":"Hindi","code":"hi"},{"name":"Hmong","code":"hmn"},{"name":"Hungarian","code":"hu"},{"name":"Icelandic","code":"is"},{"name":"Igbo","code":"ig"},{"name":"Indonesian","code":"id"},{"name":"Irish","code":"ga"},{"name":"Italian","code":"it"},{"name":"Japanese","code":"ja"},{"name":"Javanese","code":"jw"},{"name":"Kannada","code":"kn"},{"name":"Kazakh","code":"kk"},{"name":"Khmer","code":"km"},{"name":"Korean","code":"ko"},{"name":"Kurdish","code":"ku"},{"name":"Kyrgyz","code":"ky"},{"name":"Lao","code":"lo"},{"name":"Latin","code":"la"},{"name":"Latvian","code":"lv"},{"name":"Lithuanian","code":"lt"},{"name":"Luxembourgish","code":"lb"},{"name":"Macedonian","code":"mk"},{"name":"Malagasy","code":"mg"},{"name":"Malay","code":"ms"},{"name":"Malayalam","code":"ml"},{"name":"Maltese","code":"mt"},{"name":"Maori","code":"mi"},{"name":"Marathi","code":"mr"},{"name":"Mongolian","code":"mn"},{"name":"Myanmar (Burmese)","code":"my"},{"name":"Nepali","code":"ne"},{"name":"Norwegian","code":"no"},{"name":"Nyanja (Chichewa)","code":"ny"},{"name":"Pashto","code":"ps"},{"name":"Persian","code":"fa"},{"name":"Polish","code":"pl"},{"name":"Portuguese (Portugal, Brazil)","code":"pt"},{"name":"Punjabi","code":"pa"},{"name":"Romanian","code":"ro"},{"name":"Russian","code":"ru"},{"name":"Samoan","code":"sm"},{"name":"Scots Gaelic","code":"gd"},{"name":"Serbian","code":"sr"},{"name":"Sesotho","code":"st"},{"name":"Shona","code":"sn"},{"name":"Sindhi","code":"sd"},{"name":"Sinhala (Sinhalese)","code":"si"},{"name":"Slovak","code":"sk"},{"name":"Slovenian","code":"sl"},{"name":"Somali","code":"so"},{"name":"Spanish","code":"es"},{"name":"Sundanese","code":"su"},{"name":"Swahili","code":"sw"},{"name":"Swedish","code":"sv"},{"name":"Tagalog (Filipino)","code":"tl"},{"name":"Tajik","code":"tg"},{"name":"Tamil","code":"ta"},{"name":"Telugu","code":"te"},{"name":"Thai","code":"th"},{"name":"Turkish","code":"tr"},{"name":"Ukrainian","code":"uk"},{"name":"Urdu","code":"ur"},{"name":"Uzbek","code":"uz"},{"name":"Vietnamese","code":"vi"},{"name":"Welsh","code":"cy"},{"name":"Xhosa","code":"xh"},{"name":"Yiddish","code":"yi"},{"name":"Yoruba","code":"yo"},{"name":"Zulu","code":"zu"},{"name":"Chinese (Simplified)","code":"zh"},{"name":"Hebrew","code":"he"}]
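A small sketch of how that list could be turned into a name-to-code lookup, in Python for illustration. Only a few entries are copied here for brevity, and `codes_by_name` is a hypothetical helper. Some names appear twice in the list with different codes (Hebrew as both "iw" and "he", Chinese (Simplified) as both "zh-CN" and "zh"), so each name maps to a list of codes rather than a single value.

```python
import json

# A few entries copied from the list above; the full list is much longer.
LANGUAGES_JSON = (
    '[{"name":"English","code":"en"},'
    '{"name":"Russian","code":"ru"},'
    '{"name":"Hebrew","code":"iw"},'
    '{"name":"Hebrew","code":"he"}]'
)


def codes_by_name(languages_json: str) -> dict:
    """Map each language name to every code listed for it, in list order."""
    result = {}
    for entry in json.loads(languages_json):
        result.setdefault(entry["name"], []).append(entry["code"])
    return result


CODES = codes_by_name(LANGUAGES_JSON)
```

For the ENG/RUS case asked about in this issue, the relevant codes would be `en` and `ru`.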

Example when everything is put together: https://app.pluralsight.com/transcript/api/v1/caption/webvtt/b8648aa4-41aa-432f-a37e-b51b8ae361fc/345312ba-591d-4216-8846-477fb7d51459/en
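Putting the three parts together could look roughly like this (Python for illustration). The function name `caption_url` and the default language are assumptions; the base route is the one given above.

```python
from urllib.parse import quote

# Route taken from the comment above.
CAPTION_BASE = "https://app.pluralsight.com/transcript/api/v1/caption/webvtt"


def caption_url(clip_id: str, version_id: str, language: str = "en") -> str:
    """Build the caption URL from clipId, versionId, and a language code."""
    # Percent-encode each path segment so unexpected characters cannot
    # alter the structure of the URL.
    parts = [quote(part, safe="") for part in (clip_id, version_id, language)]
    return CAPTION_BASE + "/" + "/".join(parts)
```

Calling it with the IDs from the example and `"ru"` as the language should give the Russian variant of the same route.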

The format is not SRT, but WebVTT.

Summary of tasks:

Bonus tasks:

prostopasta commented 4 years ago

OK, I will try. Thank you for the very detailed task steps.