mvabdi / vsco-scraper

Easily allows for scraping a VSCO
MIT License

Multiple profile images: avoid rate limit for more than 40 users? #35

Open lazytownfan opened 2 years ago

lazytownfan commented 2 years ago

After issues #31 and #32 were fixed, I noticed there is rate limiting if I use $ vsco-scraper -mp vsco-list.txt for downloading multiple profile images with a text file when the list of usernames is longer than 40 (but shorter than 100). (Side note: the command flags have to strictly be placed before the text file name in version 0.7.0.)

After the 40th user has been checked, every single username starting with the 41st will crash.

However, I am able to run $ vsco-scraper -m vsco-list.txt to download multiple gallery images from the same list with more than 40 usernames with no rate limiting issues.

I don't know where I saw or read this, but I think there was some measure to avoid rate limiting implemented when downloading multiple galleries/journals/collections of users IIRC - which is really helpful when you use a text file. Can the same be implemented for downloading multiple profile images?

intothevoid33 commented 2 years ago

I have noticed limiting with the updated script as well.

I've never used the text list, however. I have a bash script that goes one-by-one through the list. I added a 5-second delay between each profile and haven't had any issues since.

If there's no ability to rate-limit between profiles with the current script and text-file, that would be a great addition.
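The one-by-one approach described above could look roughly like the following. This is a minimal sketch, not the script from this comment: `scrape_each` and its parameters are hypothetical names, and the only vsco-scraper flag it relies on is the `-p` form quoted elsewhere in this thread. The pause runs after every profile whether or not the scrape succeeded; chaining the steps with `&&` instead would skip the pause after a failure.

```shell
#!/bin/sh
# scrape_each DELAY LISTFILE COMMAND [ARGS...]
# Runs "COMMAND ARGS <username>" for every non-empty line of LISTFILE and
# sleeps DELAY seconds after each one, whether or not the command succeeded.
# (scrape_each is a hypothetical helper, not part of vsco-scraper.)
scrape_each() {
    delay=$1
    userlist=$2
    shift 2
    failed=0
    # The "|| [ -n ... ]" part also picks up a last line without a newline.
    while IFS= read -r user || [ -n "$user" ]; do
        [ -z "$user" ] && continue
        # </dev/null keeps the command from reading the rest of the list.
        if ! "$@" "$user" </dev/null; then
            echo "failed: $user" >&2
            failed=$((failed + 1))
        fi
        sleep "$delay"
    done <"$userlist"
    echo "done, $failed failure(s)"
}

# Usage (not run here): scrape_each 5 vsco-list.txt vsco-scraper -p
```

Counting failures this way assumes the scraper exits with a non-zero status when a profile fails, which I have not verified for every mode of the tool.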

lazytownfan commented 2 years ago

Before version 0.7.0, the main component of my Bash alias contained:

$ vsco-scraper -mp vsco-list.txt && vsco-scraper -m vsco-list.txt

before the rate limiting of multiple profile images came up and kept the latter half from executing.

For me (even with a VPN), running the latter half $ vsco-scraper -m vsco-list.txt for VSCO main galleries seems to be free from rate limiting.

@intothevoid33 Do you have any advice on my loop? I tried the following:

$ while read line; do vsco-scraper -p $line && echo -e && sleep 5; done < vsco-list.txt

but after roughly the 40th user, I still get crashes.

The following error output repeats for every user from the 41st onward:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/bin/vsco-scraper", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.10/site-packages/vscoscrape/vscoscrape.py", line 776, in main
    scraper = Scraper(args.username)
  File "/home/user/.local/lib/python3.10/site-packages/vscoscrape/vscoscrape.py", line 30, in __init__
    self.newSiteId()
  File "/home/user/.local/lib/python3.10/site-packages/vscoscrape/vscoscrape.py", line 60, in newSiteId
    self.siteid = res.json()["sites"][0]["id"]
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 917, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: [Errno Expecting value] <html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
</body>
</html>
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
: 0

In my previous testing, using the text file would only show me that username-here crashed after the 40th user with $ vsco-scraper -mp vsco-list.txt. However, even after this command crashes from rate limiting, I am able to manually enter $ vsco-scraper -p remaining-name-here for remaining users (almost 40 more) after the 40th one with seemingly no rate limiting.
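The traceback shows the server answering with an HTML 403 page where the scraper expects JSON, and the resulting exception is not caught, so a single-user run like this should end with a non-zero exit status. Under that assumption, a wrapper can wait and retry a failed user instead of moving straight on. The sketch below is only an illustration: `retry_with_backoff` is a hypothetical helper, and the retry count and delays are guesses, not values known to satisfy VSCO's limits.

```shell
#!/bin/sh
# retry_with_backoff MAX_TRIES FIRST_DELAY COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_TRIES attempts have been made,
# doubling the wait (in seconds) between attempts.
# (retry_with_backoff is a hypothetical helper, not part of vsco-scraper.)
retry_with_backoff() {
    max_tries=$1
    wait_s=$2
    shift 2
    try=1
    while ! "$@" </dev/null; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempt(s): $*" >&2
            return 1
        fi
        echo "attempt $try failed, waiting ${wait_s}s" >&2
        sleep "$wait_s"
        wait_s=$((wait_s * 2))
        try=$((try + 1))
    done
    return 0
}

# Usage (not run here):
#   retry_with_backoff 4 30 vsco-scraper -p remaining-name-here
```

Doubling the delay is a common way to back off from a rate limit without knowing its exact window; whether it helps here would need testing.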

swaggyP36000 commented 6 months ago

Just in case anyone is still stuck on this, here's a fairly simple .sh script I made with the help of GPT:

# Split usernames into parts. Change users.txt to whatever your text file name is. Works on a per-line basis, won't split a single username in half
split -l 5 users.txt users_part

# Run scrape command. You can choose to use other arguments instead of -ap (e.g. --all)
for file in users_part*; do
    # Process each part file with vsco-scraper
    vsco-scraper "$file" -ap

    # Pause for 30 seconds
    sleep 30s
done

# Delete the part files
rm users_part*

If anyone wants to test this with less sleep time in the for loop, be my guest. If you still get multiple "crashed" outputs, connect to a VPN first and then run it again. Your IP may just be temporarily rate limited.
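For anyone experimenting with different batch sizes and pauses, a parameterized variant of the same idea might look like this. It is a sketch under assumptions: `scrape_in_batches` is a hypothetical name, the part files go into a temporary directory so the cleanup cannot touch other files, and the part file is passed as the last argument (following the side note earlier in the thread that flags had to come before the file name in 0.7.0), whereas the script above passes it first.

```shell
#!/bin/sh
# scrape_in_batches LISTFILE BATCH_SIZE PAUSE COMMAND [ARGS...]
# Splits LISTFILE into parts of BATCH_SIZE lines, runs
# "COMMAND ARGS <part-file>" on each part, and sleeps PAUSE seconds
# after every part.
# (scrape_in_batches is a hypothetical helper, not part of vsco-scraper.)
scrape_in_batches() {
    userlist=$1
    batch_size=$2
    pause=$3
    shift 3
    workdir=$(mktemp -d) || return 1
    # split works per line, so a username is never cut in half.
    split -l "$batch_size" "$userlist" "$workdir/part_"
    for part in "$workdir"/part_*; do
        [ -e "$part" ] || continue   # empty list: the glob matched nothing
        "$@" "$part"
        sleep "$pause"
    done
    rm -rf "$workdir"
}

# Usage (not run here): scrape_in_batches users.txt 5 30 vsco-scraper -ap
```

Lowering the pause or raising the batch size from here is a matter of trial and error, since the thread does not establish what the actual limit is.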

All testing was done in a Linux terminal. Windows users will need a way to run .sh files.