Pr0j3ct opened this issue 4 months ago
Same issue here as #48. I'm using https://github.com/mikf/gallery-dl, which is working fine.
One thing I noticed is that the sub-domain i.vsco.co returns 403,
but a URL of the form vsco.co/i
returns the image without a problem.
I'm no programmer, but when I have some free time I may try to refactor at least one of the modules to support that change and see what happens.
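For anyone who wants to experiment with that observation, here is a minimal sketch of rewriting an `i.vsco.co` URL into the `vsco.co/i` form. The hostname and path layout are taken purely from the comment above, not from any VSCO documentation:

```python
from urllib.parse import urlparse, urlunparse

def rewrite_image_url(url):
    """Rewrite https://i.vsco.co/<path> to https://vsco.co/i/<path>.

    Based only on the observation that the i.vsco.co sub-domain
    returns 403 while vsco.co/i serves the same file.
    """
    parts = urlparse(url)
    if parts.netloc == "i.vsco.co":
        return urlunparse(parts._replace(netloc="vsco.co", path="/i" + parts.path))
    return url  # leave any other URL untouched

print(rewrite_image_url("https://i.vsco.co/abc/123.jpg"))
# https://vsco.co/i/abc/123.jpg
```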
@Pr0j3ct what do you mean?
I put a print statement into the script to see what it was trying to download. What printed out matched what I got when manually going to the gallery page, selecting an image, and then inspecting it.
The API has definitely changed.
Digging through the gallery-dl project, I can see that they're using a different API call:
essentially /api/3.0/, whereas the current version of this project uses /api/2.0/.
Edit: It seems they block the default request headers used by the script.
You could simply set custom headers on your requests to get the images.
1. Create a new entry in constants.py:
```python
# assumes constants.py already imports random and defines the user_agents list
images = {
    'User-Agent': random.choice(user_agents),
    'Accept': 'image/avif,image/webp,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5',
    'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
    'Connection': 'keep-alive',
    'Referer': 'https://vsco.co/',
    'Sec-Fetch-Dest': 'image',
    'Sec-Fetch-Mode': 'no-cors',
    'Sec-Fetch-Site': 'same-site',
    'Priority': 'u=4, i',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}
```
2. Use them in vscoscrape.py:
```python
def download_img_normal(self, lists):
    if lists[2] is False:
        if f"{lists[1]}.jpg" in os.listdir():
            return True
        with open(f"{lists[1]}.jpg", "wb") as file:
            file.write(requests.get(lists[0], headers=constants.images, stream=True).content)
    else:
        if f"{lists[1]}.mp4" in os.listdir():
            return True
        with open(f"{lists[1]}.mp4", "wb") as file:
            for chunk in requests.get(lists[0], headers=constants.images, stream=True).iter_content(
                chunk_size=1024
            ):
                if chunk:
                    file.write(chunk)
    return True
```
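If you'd rather not pass `headers=` on every call, the same header fix can be applied once to a `requests.Session`. This is just a sketch; the real dict lives in constants.py as shown above, and the `IMAGE_HEADERS` name here is a stand-in:

```python
import requests

# Stand-in for the images dict added to constants.py above; the real one
# picks a random User-Agent from user_agents via random.choice().
IMAGE_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://vsco.co/",
    "Sec-Fetch-Dest": "image",
}

session = requests.Session()
session.headers.update(IMAGE_HEADERS)  # every session.get() now sends these headers
# session.get(url, stream=True) can then replace the bare requests.get(...) calls
```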
Alternatively, you could use cloudscraper instead of the Python requests library.

```
pip install cloudscraper
```
```python
import cloudscraper


class Scraper(object):
    def __init__(self, cache, latestCache):
        self.cache = cache
        self.latestCache = latestCache
        self.scraper = cloudscraper.create_scraper()

    def download_img_journal(self, lists):
        """
        Downloads the journal media in specified ways depending on the type of media

        Since journal items can be text files, images, or videos, I had to make 3
        different ways of downloading

        :params: lists - No idea why I named it this, but it's a media item
        :return: a boolean on whether the journal media was able to be downloaded
        """
        if lists[1] == "txt":
            with open(f"{lists[0]}.txt", "w") as file:
                file.write(lists[0])
        if lists[2] == "img":
            if f"{lists[1]}.jpg" in os.listdir():
                return True
            with open(f"{lists[1]}.jpg", "wb") as file:
                file.write(self.scraper.get(lists[0], stream=True).content)
        elif lists[2] == "vid":
            if f"{lists[1]}.mp4" in os.listdir():
                return True
            with open(f"{lists[1]}.mp4", "wb") as file:
                for chunk in self.scraper.get(lists[0], stream=True).iter_content(
                    chunk_size=1024
                ):
                    if chunk:
                        file.write(chunk)
        self.progbarj.update()
        return True

    def download_img_normal(self, lists):
        """
        This function makes sense at least

        The 'if f"...".jpg in os.listdir()' sections skip downloading a file
        that has already been downloaded

        At the time I wrote this, images and videos were the only things allowed,
        so I didn't write an if statement checking for text files; I believe this
        would just skip one if it ever came up and return True

        :params: lists - My naming sense was bad. lists is just a media item.
        :return: a boolean on whether the media item was downloaded successfully
        """
        if lists[2] is False:
            if f"{lists[1]}.jpg" in os.listdir():
                return True
            with open(f"{lists[1]}.jpg", "wb") as file:
                file.write(self.scraper.get(lists[0], stream=True).content)
        else:
            if f"{lists[1]}.mp4" in os.listdir():
                return True
            with open(f"{lists[1]}.mp4", "wb") as file:
                for chunk in self.scraper.get(lists[0], stream=True).iter_content(
                    chunk_size=1024
                ):
                    if chunk:
                        file.write(chunk)
        return True
```
That works perfectly, thank you!
Could someone please explain how to do this? Would like to get this working again. I've tried gallery-dl but prefer vscoscraper.
I've already explained how to do this. Where exactly do you need help?
I can see where to replace the text in the constants.py file, but I'm not sure where to add the text in the vscoscrape.py file.
I've tried adding it at the end, but I get an error message when I run the script.
Cheers
There's nothing to replace in constants.py; just add the images dict,
then add headers=constants.images to the requests,
like he did in the download_img_normal function.
Hey, so I am not a programmer in the least. Where are the two files you are referring to, constants.py and vscoscrape.py, located? And where are those new entries supposed to go in those files? Of course, any help is sincerely appreciated!
Edit: When I look through the git repo for vsco-scraper I see the two files you are talking about, but I am not sure what I am supposed to do with them. I installed vsco-scraper with pip, so in this case do I need to edit the source and perform a build/compile or something along those lines? Forgive me, I only know that vsco-scraper is in the bin folder off of my Linux profile; after that I have zero ideas on what to do... =(
If you installed vscoscrape with pip, the files are located in your Python installation.
Edit: to locate a pip package you can use the command "pip show vsco-scraper". For example, C:\Python310\Lib\site-packages\vscoscrape. You'll find the files (constants.py / vscoscrape.py) there.
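If `pip show` is awkward, the install location can also be found from Python itself. A small sketch; `vscoscrape` is the module name the pip package installs under, per the path above:

```python
import importlib.util
import os

def locate_package(name):
    """Return the directory a module was installed into, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)

# e.g. locate_package("vscoscrape") -> ".../site-packages/vscoscrape"
print(locate_package("json"))  # works for any importable module
```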
No need to build from source; just use the pip package and do the following. Open constants.py in your text editor and paste this at the end of the file:
```python
images = {
    'User-Agent': random.choice(user_agents),
    'Accept': 'image/avif,image/webp,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5',
    'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
    'Connection': 'keep-alive',
    'Referer': 'https://vsco.co/',
    'Sec-Fetch-Dest': 'image',
    'Sec-Fetch-Mode': 'no-cors',
    'Sec-Fetch-Site': 'same-site',
    'Priority': 'u=4, i',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}
```
Now open vscoscrape.py and search for download_img_normal. Select the whole function (down to its final "return True"), then replace it with my version:
```python
def download_img_normal(self, lists):
    if lists[2] is False:
        if f"{lists[1]}.jpg" in os.listdir():
            return True
        with open(f"{lists[1]}.jpg", "wb") as file:
            file.write(requests.get(lists[0], headers=constants.images, stream=True).content)
    else:
        if f"{lists[1]}.mp4" in os.listdir():
            return True
        with open(f"{lists[1]}.mp4", "wb") as file:
            for chunk in requests.get(lists[0], headers=constants.images, stream=True).iter_content(
                chunk_size=1024
            ):
                if chunk:
                    file.write(chunk)
    return True
```
Thank you very much!! Those changes were easy enough. The first attempt gave me an indentation error; I just needed to move the "def download_img_normal(self, lists):" line over a tab to line up with all the others, and it ran without issue! I really appreciate your time! =)
Edit: I tested it for journals, and it still produces the 118-byte files. I tried to sort it out, but the block for journals is very different...
Edit: I figured it out! I looked for the function that downloads journals and added "headers=constants.images" to the jpg and mp4 lines, and it worked like a charm!
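For anyone following along, the same change applied to the journal path looks roughly like this. This is a sketch based on the download_img_journal function posted earlier in this thread, with headers=constants.images added to both requests; it assumes the os, requests, and constants imports already present in vscoscrape.py, and it belongs inside the Scraper class:

```python
# Sketch: download_img_journal with the header fix applied.
# Assumes os, requests, and constants are already imported in vscoscrape.py.
def download_img_journal(self, lists):
    if lists[1] == "txt":
        with open(f"{lists[0]}.txt", "w") as file:
            file.write(lists[0])
    if lists[2] == "img":
        if f"{lists[1]}.jpg" in os.listdir():
            return True
        with open(f"{lists[1]}.jpg", "wb") as file:
            file.write(requests.get(lists[0], headers=constants.images, stream=True).content)
    elif lists[2] == "vid":
        if f"{lists[1]}.mp4" in os.listdir():
            return True
        with open(f"{lists[1]}.mp4", "wb") as file:
            for chunk in requests.get(lists[0], headers=constants.images, stream=True).iter_content(
                chunk_size=1024
            ):
                if chunk:
                    file.write(chunk)
    self.progbarj.update()
    return True
```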
I'm certainly not a Python programmer now...lol, but reading through your code I see that constants.images must refer to the constants.py file, and .images must refer to the images entry you had me add! Thanks for helping me see it! =)
thanks vm @timbo0o1. i know there is gallery-dl, but it doesn't keep the same original filenames, and for updating an old folder it was a pain
Hey, thanks for the previous help. Unfortunately the script doesn't work again: when I run it, it shows '... crashed' for every username in my txt file. Please take a look... thank you.
Maybe take a look at #50.
Approximately two weeks ago the scraper started collecting only 118-byte files.
Does not appear to be IP address related. Has the VSCO API changed?