modestlyOP / ArtStationImageDownloader

Simple tool that scrapes ArtStation Likes pages or project pages for image assets
MIT License

Downloading too many files at once corrupts roughly 50% of the files #1

Open modestlyOP opened 10 months ago

modestlyOP commented 10 months ago

Problem: Downloading (via a Likes page) from many projects that each contain multiple art assets causes some of those assets to be downloaded incorrectly (corrupted). So far there is no consistent number of downloaded assets that triggers the corruption, but in past tests it has appeared at 70 files or more.

Temporary workaround: To help mitigate this problem, a slider has been added to the ASID_CTk app. The slider limits the scraping script to the first n projects, that is, the n most recently "liked" projects. A user can set the slider to a low number of projects (<= 25) for the initial pass, then gradually raise the number in consecutive passes. Users should also consider whether the projects on the chosen Likes page contain large numbers of art assets, since the corruption appears tied to the number of files downloaded rather than the number of projects.
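
For illustration, the slider workaround described above could be sketched roughly as follows. This is not the actual ASID_CTk code: `select_projects` and `ramp_up_limits` are hypothetical names, and the sketch assumes the liked projects are already available as a list ordered from most to least recently liked. The starting value of 25 comes from the suggestion above; the step size of 25 is an assumption.

```python
def select_projects(liked_projects, limit):
    """Return the first `limit` projects (hypothetical helper).

    Assumes `liked_projects` is ordered most recently liked first,
    which mirrors what the slider is described as doing.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return liked_projects[:limit]


def ramp_up_limits(total, start=25, step=25):
    """Yield a growing project limit for each consecutive pass.

    Starts at `start` and grows by `step` (both assumed values),
    never exceeding `total`.
    """
    limit = min(start, total)
    while True:
        yield limit
        if limit >= total:
            break
        limit = min(limit + step, total)


if __name__ == "__main__":
    projects = [f"project-{i}" for i in range(60)]  # placeholder data
    for limit in ramp_up_limits(len(projects)):
        batch = select_projects(projects, limit)
        print(f"pass visits the first {len(batch)} projects")
```

Each pass here revisits the earlier projects as well as the new ones, just as raising the slider would. Whether files that were already downloaded get skipped depends on the scraper and is not assumed here.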

modestlyOP commented 9 months ago

Possible solution: add a queue to the download system, so that newly found art assets are appended to the back of the queue while those at the front are downloaded first. This would bring order to the chaos of running numerous downloads side by side (and probably on the same thread).
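
A minimal sketch of what such a queue might look like, using only Python's standard `queue` and `threading` modules. This is an illustration under assumptions, not the project's implementation: `run_download_queue` and `download_one` are hypothetical names, and `download_one` stands in for whatever function actually fetches and saves a single asset. The small, fixed number of workers is also an assumption, and it is untested whether limiting concurrency this way removes the corruption.

```python
import queue
import threading


def run_download_queue(asset_urls, download_one, num_workers=2):
    """Process assets in FIFO order with a fixed pool of worker threads.

    `download_one(url)` is a hypothetical callable that downloads one
    asset and returns True on success. Returns a dict of url -> success.
    """
    pending = queue.Queue()   # FIFO: new assets go to the back
    results = {}
    lock = threading.Lock()   # guards `results` across workers

    def worker():
        while True:
            url = pending.get()       # take the asset at the front
            if url is None:           # sentinel: no more work
                pending.task_done()
                return
            try:
                ok = bool(download_one(url))
            except Exception:
                ok = False            # record the failure, keep going
            with lock:
                results[url] = ok
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in asset_urls:            # assets can be queued as they are found
        pending.put(url)
    for _ in threads:                 # one sentinel per worker
        pending.put(None)
    pending.join()
    for t in threads:
        t.join()
    return results
```

With `num_workers=1` the assets are downloaded strictly one at a time in the order they were found. Raising the worker count trades that strict ordering for speed while still capping how many downloads run at once. The returned dict also makes it possible to retry only the entries that failed.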