spotDL / spotify-downloader

Download your Spotify playlists and songs along with album art and metadata (from YouTube if a match is found).
https://spotdl.readthedocs.io/en/latest/
MIT License

HTTP Error 403 and Youtube Error 403 (exceeded quota). Program halts without downloading. #246

Closed CemraJC closed 6 years ago

CemraJC commented 6 years ago

What is the purpose of your issue?

Description

When attempting to download music from a previously-parsed spotify playlist, the program exits with an error about "403: The request cannot be completed because you have exceeded your quota".

If you need any further information, please let me know!

Log

$ python ./spotdl.py -l ./playlist.txt -f /d/Music\ 02/ --log-level DEBUG

DEBUG: Python version: 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
DEBUG: Platform: Windows-10-10.0.14393-SP0
DEBUG: {'album': None,
     'avconv': False,
     'config': None,
     'download_only_metadata': False,
     'dry_run': False,
     'file_format': '{artist} - {track_name}',
     'folder': 'D:/Music 02/',
     'input_ext': '.m4a',
     'list': './playlist.txt',
     'log_level': 10,
     'manual': False,
     'music_videos_only': False,
     'no_metadata': False,
     'no_spaces': False,
     'output_ext': '.mp3',
     'overwrite': 'prompt',
     'playlist': None,
     'song': None,
     'username': None}
INFO: Preparing to download 3 songs

DEBUG: Fetching metadata for given track URL
DEBUG: Fetching lyrics
<<(Truncated Meta Output)>>
DEBUG: query: {'part': 'snippet', 'maxResults': 50, 'type': 'video', 'q': 'Arctic Monkeys - R U Mine?'}
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pafy\util.py", line 34, in call_gdata
    data = g.opener.open(url).read().decode('utf-8')
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./spotdl.py", line 185, in <module>
    download_list(text_file=const.args.list)
  File "./spotdl.py", line 79, in download_list
    download_single(raw_song, number=number)
  File "./spotdl.py", line 116, in download_single
    content = youtube_tools.go_pafy(raw_song, meta_tags)
  File "C:\Users\Jason Cemra\Downloads\spotify-downloader-master\core\youtube_tools.py", line 22, in go_pafy
    track_url = generate_youtube_url(raw_song, meta_tags)
  File "C:\Users\Jason Cemra\Downloads\spotify-downloader-master\core\youtube_tools.py", line 84, in generate_youtube_url
    data = pafy.call_gdata('search', query)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pafy\util.py", line 42, in call_gdata
    raise GdataError(errmsg)
pafy.util.GdataError: Youtube Error 403: The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.
aedm commented 6 years ago

I get the same error with the Docker image.

ritiek commented 6 years ago

A temporary fix would be to generate your own YouTube API key (from https://developers.google.com/youtube/registering_an_application) and replace the one in here: https://github.com/ritiek/spotify-downloader/blob/f943080edb7706cdfbc1478954c396dd803a0a35/core/youtube_tools.py#L10-L11

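For reference, the search request that pafy's call_gdata() issues (visible in the DEBUG query in the log above) can be reproduced with your own key. A minimal stdlib-only sketch; the key value is a placeholder, and actually fetching the URL requires network access:

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3/"
MY_KEY = "YOUR_OWN_API_KEY"  # placeholder: paste the key you generated

def gdata_search_url(query, key=MY_KEY):
    # Mirrors the query shown in the DEBUG log above.
    params = {"part": "snippet", "maxResults": 50, "type": "video",
              "q": query, "key": key}
    return API_BASE + "search?" + urlencode(params)

print(gdata_search_url("Arctic Monkeys - R U Mine?"))
```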

@linusg, @vn-ki (and anyone reading this) what do you guys think? There are a couple of things we could do:

vn-ki commented 6 years ago

@ritiek I never thought we would exceed the quota limit! mpsyt doesn't have any issues regarding quota. (Maybe it's because mpsyt uses an old API key which has 50 times more quota points.)

Quoting Stack Overflow:

You can use 30,000 units/second/user and 1,000,000 per day.

Our most quota-intensive operation is search (100 quota points per operation), which means each user can download a maximum of 300 songs per day and all users combined can download a maximum of 10,000 songs per day.
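The arithmetic behind those numbers, assuming the quoted quotas and the 100-unit cost of a search call:

```python
DAILY_QUOTA = 1_000_000   # project-wide units per day (quoted above)
PER_USER_QUOTA = 30_000   # per-user units (quoted above)
SEARCH_COST = 100         # units per YouTube API search call

songs_per_user = PER_USER_QUOTA // SEARCH_COST   # 300
songs_total = DAILY_QUOTA // SEARCH_COST         # 10,000
print(songs_per_user, songs_total)
```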

Now, my take on the proposed solutions,

What do you think?

linusg commented 6 years ago

@ritiek First of all, let's narrow down the source of the problem. I believe that because of the rising popularity of spotify-downloader, we now hit the quota limit when too many people use the tool in a single day. Right?

If so, option 1 seems a fairly poor workaround: how long until both keys exceed their daily limit? A few weeks, a few months? You see where I'm going here: I don't think maintaining a list of API keys is what we want. Option 2 is in my opinion the most reliable one, as users will have their own quotas. Option 3 seems the most practical to me.

I would go for 3 and make 2 optional. That would require a good level of abstraction, but I believe we can do it, and it would still bring speed and reliability improvements to users who have their own key.

linusg commented 6 years ago

So I basically agree with @vn-ki here 👍 Any chance we can monitor the quota point level? A slow decrease would indicate heavy but regular usage; a sharp drop at one point would indicate abuse or heavy overuse by an individual.

vn-ki commented 6 years ago

Any chance we can monitor the quota point level?

@linusg, I believe @ritiek can, in his project console.

ritiek commented 6 years ago

Any chance we can monitor the quota point level?

I don't think I remember which account I used to generate the API keys anymore, but even if we could monitor it, we couldn't have done much since, as @vn-ki mentioned:

Our most quota intensive operation is search (100 quota points per operation) which means each user can download a maximum of 300 songs and every user combined can download a maximum of 10,000 songs.

300 songs per user per day is not really abuse/overuse.

So far, scraping YouTube by default and additionally offering an option for users to set an API key for better stability/faster response times seems like a good idea to me.

ritiek commented 6 years ago

As of #250, the tool scrapes YouTube by default (using API quota only when fetching video details with pafy.new()), and users may switch over to the YouTube API completely (for making searches as well) by setting their API key in config.yml.
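The post-#250 dispatch can be sketched roughly like this (a minimal sketch; the function names are illustrative placeholders, not spotdl's actual internals):

```python
def search_with_youtube_api(query, api_key):
    # Placeholder: a real implementation would call the Data API
    # search endpoint (100 quota units per call).
    return f"api-result-for:{query}"

def search_by_scraping(query):
    # Placeholder: a real implementation would scrape the YouTube
    # results page (no quota cost).
    return f"scrape-result-for:{query}"

def generate_youtube_url(query, api_key=None):
    """Scrape by default; only use the API when a key is configured."""
    if api_key:
        return search_with_youtube_api(query, api_key)
    return search_by_scraping(query)

print(generate_youtube_url("Arctic Monkeys - R U Mine?"))
```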

CemraJC commented 6 years ago

Thanks everyone for your replies! I have to say, this project is run much better than many others I know of - I hardly expected a response, let alone a full conversation, a fix and a new release not more than 3 days after I posted this issue!

@ritiek I'll go generate my own API key and test it all out when I get back home on Monday. Either way, I would say that the issue has been appropriately addressed and the solution should scale as intended.

Thanks again! You guys are awesome 😄🎉

ritiek commented 6 years ago

@CemraJC Yep thanks, let us know how it goes. We'll close this issue then.

CemraJC commented 6 years ago

@ritiek Just ran a test with a 50 song playlist after generating my own API key. Speeds are good, no problems with the API quota!

Thanks again!

linusg commented 6 years ago

@CemraJC thanks for letting us know! Glad it works smoothly. @ritiek really good work, and nice to finally see you around here again :smiley:

SaeedSheikhi commented 5 years ago

I know the issue is closed :D but I think this is the most relevant thread.

What is the default behavior of the script: web scraping or API calls?

I have a doubt about the script's method. I currently have an extra Node layer on top of the spotdl CLI, using a queue system and streaming files, but even after 5 attempts with a 15-30 sec backoff/delay the script still returns a 403 error. I assumed the script was using web scraping, but it seems it is using the YouTube API and has exceeded the quota.

ritiek commented 5 years ago

@SaeedSheikhi I'll explain a bit. We default to web-scraping for making video searches on YouTube (what this issue was originally about). Once the best matching video is selected, we use pafy to get the audiostream off this video. It is pafy which makes use of the YouTube API and relies on youtube-dl to fetch the audiostream.
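That pipeline can be outlined in three stages (a sketch with placeholder stand-ins; the URLs and function names are illustrative, not spotdl's actual code):

```python
def scrape_search(raw_song):
    # Stage 1: web-scrape YouTube search results (no API quota used).
    return "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder

def get_video_details(video_url):
    # Stage 2: this is where pafy.new(video_url) runs, the one API call.
    return {"audiostream": "https://r1---sn-example.googlevideo.com/videoplayback?expire=..."}

def best_audiostream(video):
    # Stage 3: youtube-dl resolves the actual audiostream URL.
    return video["audiostream"]

video = get_video_details(scrape_search("Arctic Monkeys - R U Mine?"))
print(best_audiostream(video))
```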

And as I tried a few days ago, the problem doesn't seem to be related to an exceeded quota. It seems the problem is with YouTube itself, which somehow generates an audiostream URL that returns a 403 Forbidden when we try to access it. By audiostream URL, I mean the big URL which looks something like:

 https://r1---sn-8pgbpohxqp5-ac5l.googlevideo.com/videoplayback?expire=1557779330&ei=IX_ZXIjqNcfjVMiIgIgI&i...

Also see #550 for more info.

SaeedSheikhi commented 5 years ago

@ritiek Thanks for your explanation. I did some investigating, walked through both #550 and #246, and finally ended up here to post a comment. As a temporary fix, I solved it by implementing a queue layer with a 30 sec backoff and up to 15 attempts; luckily, most of the time the script can reach the media by the 12th-14th attempt :D But in the end it took some hard work to make the CLI calls and retries automatic for most users.
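The retry layer described above can be sketched generically. A minimal sketch: the real setup wraps the spotdl CLI from Node, so `download` here stands in for a call that shells out to spotdl and raises on a non-zero exit code:

```python
import time

def download_with_retries(download, attempts=15, backoff=30):
    """Call `download` until it succeeds, sleeping `backoff` seconds
    between failed attempts; re-raise after the final failure."""
    for attempt in range(1, attempts + 1):
        try:
            return attempt, download()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff)
```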