sselph / scraper

A scraper for EmulationStation written in Go using hashing
MIT License

Batch api support #254

Open J-Swift opened 4 years ago

J-Swift commented 4 years ago

(cc #253)

Here's step 1 of batching. I just wanted to get feedback early on the top-level design while I work on the rest of the pipeline. I'm not a fan of the nesting, but the alternatives (introducing varying levels of goroutines) ended up with deadlocks and race conditions that were harder to work through than the nested imperative code.

All this does is add buffering to the scraper workers and introduce a batch GetGames interface, which for now defers to the existing (now unexported) getGame. The next step is to pass batch support all the way down the chain.
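
For readers following along, the shape described above (a batch GetGames that simply loops over the unexported getGame) might look roughly like the sketch below. This is not the code from the PR: the `Game`, `Result`, `source`, and `ErrNotFound` names and the in-memory map are assumptions made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Game is a hypothetical stand-in for the scraper's real result type.
type Game struct {
	ID   string
	Name string
}

// ErrNotFound is a hypothetical sentinel error for a missing entry.
var ErrNotFound = errors.New("game not found")

// source is a hypothetical data source backed by an in-memory map.
type source struct {
	games map[string]Game
}

// getGame is the single-item lookup, kept unexported.
func (s *source) getGame(id string) (Game, error) {
	g, ok := s.games[id]
	if !ok {
		return Game{}, ErrNotFound
	}
	return g, nil
}

// Result pairs one requested ID with its outcome, so a single miss
// does not fail the whole batch.
type Result struct {
	ID   string
	Game Game
	Err  error
}

// GetGames is the batch entry point. In this sketch it only loops over
// getGame; a later step could replace the loop with one real batch call.
func (s *source) GetGames(ids []string) []Result {
	out := make([]Result, 0, len(ids))
	for _, id := range ids {
		g, err := s.getGame(id)
		out = append(out, Result{ID: id, Game: g, Err: err})
	}
	return out
}

func main() {
	s := &source{games: map[string]Game{"a": {ID: "a", Name: "Alpha"}}}
	for _, r := range s.GetGames([]string{"a", "b"}) {
		fmt.Println(r.ID, r.Game.Name, r.Err)
	}
}
```

Returning one result per input keeps per-item errors visible to the caller, which is one way to preserve fallback behavior once the batch call goes further down the chain.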

sselph commented 4 years ago

Sorry for the delay. LGTM

J-Swift commented 4 years ago

Had to work on some contracting obligations, but finally got back to this.

There were a lot of false starts that ended up getting pretty ugly, but this is an initial full-stack implementation. It's still messy, and it doesn't retain 100% of the original error/fallback behavior, specifically testing multiple files per rom against multiple sources.

At the very least it is really fast again. scraper -thumb_only -use_nointro_name=false -workers=8 runs against my full 1G1R NES collection in ~12 seconds, finding 680 of 854 roms.

NOTE: This needs thorough testing, as I don't use a lot of the options on my system. Nothing seems to have been broken by batching, but I set the global batch size to 100 on a whim. I could see some unforeseen memory issues arising because of that.
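
To make the memory concern concrete, here is a minimal sketch of a worker that buffers items from a channel and flushes them in fixed-size batches. It is an illustration under assumptions, not the PR's implementation: `batchWorker` and its `flush` callback are hypothetical names, and only the batch size of 100 and the 854-rom count come from the comment above.

```go
package main

import "fmt"

// batchSize mirrors the global batch size of 100 mentioned above.
const batchSize = 100

// batchWorker (hypothetical helper) reads items from in and calls flush
// with at most size items at a time. It assumes size >= 1. The buffer is
// reused between flushes, so flush must not retain the slice it is given.
func batchWorker(in <-chan string, size int, flush func([]string)) {
	batch := make([]string, 0, size)
	for item := range in {
		batch = append(batch, item)
		if len(batch) == size {
			flush(batch)
			batch = batch[:0] // keep the backing array, drop the contents
		}
	}
	// Flush whatever is left once the input channel is closed.
	if len(batch) > 0 {
		flush(batch)
	}
}

func main() {
	in := make(chan string)
	go func() {
		defer close(in)
		for i := 0; i < 854; i++ {
			in <- fmt.Sprintf("rom-%03d.nes", i)
		}
	}()

	var sizes []int
	batchWorker(in, batchSize, func(b []string) {
		sizes = append(sizes, len(b))
	})
	fmt.Println(len(sizes), sizes[len(sizes)-1]) // prints "9 54"
}
```

In a layout like this, each worker holds at most one batch of inputs at a time, so peak buffering grows with workers × batch size; whatever is kept per item (for example downloaded images) would multiply that further.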

J-Swift commented 4 years ago

Hey @sselph, just pinging to see if you've had a chance to take a look at this.