rojter-tech / pluradl.py

Automated download of Pluralsight courses
MIT License

Issue while downloading the content of plural sight #19

Closed ashwani-dumca closed 4 years ago

ashwani-dumca commented 4 years ago

[debug] Proxy map: {}
[pluralsight:course] administering-elasticsearch-cluster: Downloading JSON metadata
[download] Downloading playlist: Administering an Elasticsearch Cluster
[pluralsight:course] playlist Administering an Elasticsearch Cluster: Collected 22 video ids (downloading 22 of them)
[download] Downloading video 1 of 22
[pluralsight] Downloading login page
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on https://github.com/rojter-tech/pluradl.py/issues . Make sure you are using the latest version; see https://github.com/rojter-tech/pluradl.py/wiki on how to update. Be sure to call plura-dl with the --verbose flag and include its complete output.
  File "/home/delhivery/Videos/Performance/pluradl.py/plura_dl/extractor/common.py", line 627, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/delhivery/Videos/Performance/pluradl.py/plura_dl/PluraDL.py", line 2238, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default

geffchang commented 4 years ago

Did you open the PluralSight site? Has your account been blocked? The default settings may have blocked your account.

ashwani-dumca commented 4 years ago

Yes, they blocked my account. Can you please tell me how to download courses without getting blocked?

JuanIgnacioBarrranco commented 4 years ago

Hi, it's me again, try downloading courses with this explanation: https://github.com/rojter-tech/pluradl.py

geffchang commented 4 years ago

@JuanIgnacioBarrranco What do you suggest? There is still a comment about getting blocked for scraping: https://github.com/ytdl-org/youtube-dl/issues/24008#issuecomment-609154082 It looks like there is no solution to this.

rojter-tech commented 4 years ago

@9250 If I get more reports like this, I would consider modifying the download-related request parameters:

SLEEP_INTERVAL = 40 # minimum sleep time between video downloads
SLEEP_OFFSET = 120 # random sleep time up to 120 seconds between video downloads

defined in pluradl.py. However, because this is purely open source, no one is stopping you from changing those parameters and trying them out yourself.
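
For readers unsure how the two values interact, here is a minimal sketch. It assumes the offset is a uniformly distributed random amount added on top of the minimum, which is only a reading of the comments above and has not been checked against pluradl.py. The helper name `pick_delay` is hypothetical.

```python
import random

SLEEP_INTERVAL = 40   # minimum sleep time between video downloads (seconds)
SLEEP_OFFSET = 120    # upper bound of the extra random sleep time (seconds)


def pick_delay(minimum=SLEEP_INTERVAL, offset=SLEEP_OFFSET, rng=random):
    """Hypothetical helper: return a delay in [minimum, minimum + offset].

    Assumption: the random offset is drawn uniformly and added to the minimum.
    """
    return minimum + rng.uniform(0, offset)


if __name__ == "__main__":
    delay = pick_delay()
    print(f"would sleep {delay:.1f} s before the next video")
```

Under this reading, the default values would give a pause of somewhere between 40 and 160 seconds between videos.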

ashwani-dumca commented 4 years ago

But I have already tried passing these values as well, and I was able to download only a few files (~9) using this:

youtube-dl --sleep-interval 120 "https://app.pluralsight.com/library/courses/administering-elasticsearch-cluster" --cookies ~/Videos/Performance/cookies.txt

Out of 22 videos, I downloaded 9 files, but after that they flagged some suspicious activity and blocked my account, and the downloads started failing.

Do you want me to try both parameters?

geffchang commented 4 years ago

@9250 Did you get unblocked? Did you also open the PluralSight site while the script was downloading? I think there may be some checks on their side.

ashwani-dumca commented 4 years ago

I created another account and increased the sleep time to around 5 minutes. As of now it is downloading fine with youtube-dl, but with pluradl it is not working at all. I am not sure what check they have introduced for those calls.

I can tell you the names of the courses that have the problem, so you can try downloading them with any dummy account just to check.

Course list I have in courselist.txt:
python-big-picture
aws-automating-cloudformation
apache-spark-fundamentals
microservices-architecture
microservices-fundamentals
docker-getting-started
rabbitmq-by-example
mongodb-big-data-reporting
mongodb-administration
mongodb-introduction
centralized-logging-elastic-stack
elasticsearch-for-dotnet-developers
administering-elasticsearch-cluster
elasticsearch-indexing-data
elasticsearch-designing-schema
elasticsearch-analyzing-data

geffchang commented 4 years ago

Please let us know if it fails.

rojter-tech commented 4 years ago

Reports on any settings that are proven to work or fail on any system, for any time interval, would be great. Including relevant system information, versions, commit SHA-1 and so on is a great way of communicating the context in which the failing or succeeding procedure was executed.

I can tell from the output that you are using some Linux distro, which could be problematic if it happens to be Ubuntu. In that scenario it has already been reported that authentication between client and server has a high probability of failing. Furthermore, in that specific scenario another problem (fixed by now by forcing a 60-second sleep) was occurring: login requests between course downloads were triggered too quickly, which ultimately led to a temporary or permanent ban.

This might not be your case; it is hard to tell based on the information provided.

Jordhan-Carvalho commented 4 years ago

I got banned on my first account and made another one, but I am still getting 403 Forbidden. It isn't an IP ban, because I can use the second account on their website. Any idea why?

rojter-tech commented 4 years ago

@Jordhan-Carvalho In general, if the attack shows DDoS-like or spamming behaviour, an IP ban makes sense. If it is more a case of "just some dude misbehaving by breaking the terms of use", an account ban makes sense. Both lead to a 403. This kind of security response is not specific to Pluralsight.

We have blocked your account because our security systems have flagged your Pluralsight account for an unusual amount activity. This does mean a high volume of requests that are in the realm of a request every 10-30 seconds for a prolonged period of time. Please note that this high volume of activity is in violation of our terms of service [https://www.pluralsight.com/terms].

403
Your account has been blocked due to suspicious activity. Please contact support@pluralsight.com if you believe this was in error.

Jordhan-Carvalho commented 4 years ago

I mean, it should be working with my second account, since I can log in and watch the videos as usual on their website, but I'm getting the same Forbidden error as with my banned account... Maybe it has something to do with the cookies?

rojter-tech commented 4 years ago

Ah, so this issue is essentially the same as #12. Closing this.

dcCMPY commented 4 years ago

Hi, is there a way to 'pause' the download?

ehsansajjad-synergy commented 4 years ago

@rojter-tech Is this fixed, or is there a specific setting to apply locally for this?