shizmob / arcade-docs

Open arcade documentation repository

Archive links for auction links #16

Open Poliwrath opened 2 years ago

Poliwrath commented 2 years ago

YAJ listing pages don't last very long (unsure how long eBay ones last), and aucfan/the billion other proxy sites that save YAJ pages aren't guaranteed to keep them. This could be extended to everything, but most sources (e.g. iMp95's site) have been on the internet for ages, so the likelihood of them suddenly disappearing is probably low... Some Japanese Twitter users also purge old tweets.

Also, closedsearch is an invaluable resource for finding YAJ pages that are less than 6 months old.

shizmob commented 2 years ago

Yep, agree we need some work here. I'm also a fan of using https://aucview.aucfan.com/yahoo/<auction ID> to be able to check deleted auctions/removed images, but even that has its limitations.
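
For illustration, here is a minimal sketch of turning a YAJ link or bare auction ID into the aucview URL mentioned above. The `aucview_url` helper is hypothetical, and the shape of YAJ listing URLs and auction IDs (a lowercase letter followed by digits) is an assumption, not something confirmed in this thread.

```python
import re

# Assumption: Yahoo! Auctions IDs are a lowercase letter followed by digits,
# and listing URLs contain ".../auction/<ID>" on an auctions.yahoo.co.jp host.
_AUCTION_ID = re.compile(r"^[a-z]\d+$")
_YAJ_URL = re.compile(r"auctions\.yahoo\.co\.jp/(?:jp/)?auction/([a-z]\d+)")

# URL pattern taken from the comment above.
AUCVIEW_TEMPLATE = "https://aucview.aucfan.com/yahoo/{auction_id}"


def aucview_url(link_or_id: str) -> str:
    """Return the aucview mirror URL for a YAJ listing URL or a bare auction ID."""
    text = link_or_id.strip()
    match = _YAJ_URL.search(text)
    if match:
        auction_id = match.group(1)
    elif _AUCTION_ID.match(text):
        auction_id = text
    else:
        raise ValueError(f"not a recognisable YAJ link or auction ID: {link_or_id!r}")
    return AUCVIEW_TEMPLATE.format(auction_id=auction_id)
```

Calling `aucview_url` with either a full listing URL or just the ID should give the same mirror URL, so dead links in the docs can be mapped to a mirror without manual editing.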

Poliwrath commented 2 years ago

Some YAJ sellers go the extra step to delete auction images after the listing is over (before the 6 month auto auction delete or whatever) fwiw. I don't exactly have a known list of sellers that do that on hand but I do know cyberdaioo does it sometimes.

voidderef commented 2 years ago

We already talked about an even more general solution of automatically crawling links regularly and persisting the files, e.g. using a GitHub Actions cron job that runs once a day.

One option to get this idea started might be to focus on the yahoo auction links first, and explore it with the limited scope. There might be a bunch of useful learnings from that before this can/should be scaled further.
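
A rough sketch of what the limited-scope version could look like as a script that a daily scheduled job runs. Everything here is an assumption for illustration: that the docs are Markdown files, the YAJ URL shape, the `archive/` layout, and the helper names `find_auction_ids`, `fetch_page` and `archive_new`. It only saves the raw listing page, not images.

```python
import re
import urllib.request
from pathlib import Path

# Assumption: listing URLs look like https://page.auctions.yahoo.co.jp/jp/auction/<ID>.
YAJ_LINK = re.compile(
    r"https?://[\w.]*auctions\.yahoo\.co\.jp/(?:jp/)?auction/([a-z]\d+)"
)


def find_auction_ids(repo_root: Path) -> set[str]:
    """Collect YAJ auction IDs linked from any Markdown file under repo_root."""
    ids: set[str] = set()
    for path in repo_root.rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="replace")
        ids.update(YAJ_LINK.findall(text))
    return ids


def fetch_page(url: str) -> bytes:
    """Download a page body; network errors surface as OSError subclasses."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def archive_new(repo_root: Path, archive_dir: Path, fetch=fetch_page) -> list[str]:
    """Save a snapshot for every auction not yet archived; return the new IDs."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for auction_id in sorted(find_auction_ids(repo_root)):
        target = archive_dir / f"{auction_id}.html"
        if target.exists():
            continue  # already persisted on an earlier run
        url = f"https://page.auctions.yahoo.co.jp/jp/auction/{auction_id}"
        try:
            target.write_bytes(fetch(url))
        except OSError:
            continue  # listing gone or network issue; retried on the next run
        saved.append(auction_id)
    return saved
```

Because existing snapshots are skipped, running this once a day only fetches listings that were linked since the previous run, which keeps the job cheap and avoids re-requesting pages that may have been deleted in the meantime.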

biggestsonicfan commented 2 years ago

Any updates to this? I am already finding dead links in the Sega boards section.

shizmob commented 2 years ago

Yeah! I've been working on a small Python tool called aucscrape in the meantime. It supports finding Yahoo, eBay and Mercari auction links and retrieving and saving their metadata and media using either the original site or a number of mirror sites. Right now its scraping support is limited to Yahoo auctions, but I'm planning to add eBay and Mercari scraping when I can.
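
To give an idea of the link-finding step, here is a small sketch of sorting links by auction site. This is not aucscrape's actual code: the `classify_link` helper and the hostname suffixes below are assumptions made up for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: hostname suffixes used to recognise each site.
SITE_SUFFIXES = {
    "yahoo": ("auctions.yahoo.co.jp",),
    "ebay": ("ebay.com",),
    "mercari": ("mercari.com",),
}


def classify_link(url: str) -> Optional[str]:
    """Return 'yahoo', 'ebay' or 'mercari' for a known auction host, else None."""
    host = (urlparse(url).hostname or "").lower()
    for site, suffixes in SITE_SUFFIXES.items():
        # Match the suffix itself or any subdomain of it, but not lookalike hosts.
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return site
    return None
```

Matching on whole hostname labels rather than plain substrings avoids treating unrelated domains that merely contain a site's name as auction links.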

I've already run it over the repository and stored all Yahoo auctions locally, so rest assured those are safe right now. More soon!