hotsapi / Hotsapi.Uploader

Uploads Heroes of the Storm replays to hotsapi.net
MIT License

Parallel processing #17

Open poma opened 7 years ago

poma commented 7 years ago

Uploader performance can be greatly increased by introducing the following optimizations:

poma commented 6 years ago

Using a bulk fingerprint check increases complexity a lot (at least I don't know an easy way to do it), so for now I have only implemented naive multithreading with a single queue.
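For anyone following along, here is a minimal sketch of what "naive multithreading with a single queue" could look like. It is written in Python purely for illustration and is not the uploader's actual code; `process_all`, `handle`, and `workers` are hypothetical names.

```python
import queue
import threading


def process_all(replays, handle, workers=4):
    """Run handle(replay) for every replay using threads fed by one shared queue."""
    pending = queue.Queue()
    for replay in replays:
        pending.put(replay)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                replay = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            outcome = handle(replay)  # hypothetical: fingerprint check + upload
            with lock:
                results.append((replay, outcome))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that results arrive in completion order, not in the order the replays were queued.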

poma commented 6 years ago

It looks like the multithreaded uploader can hit API throttle limits if there is a long list of replays consisting almost entirely of duplicates (for example, when someone has lost or deleted their replay cache). Implementing the bulk check can fix that, so I guess I still need to build it. To keep things simple, I can put the replays found on launch into a Dataflow while still processing replays that appear afterwards with a standard loop.
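A rough sketch of the bulk-check idea, again in Python for illustration only: instead of one request per replay, fingerprints are sent in batches, so a long list of duplicates costs a handful of requests. The `check_bulk` callable, its return shape, and the batch size are assumptions; the thread does not specify what the real endpoint looks like.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_new_replays(fingerprints, check_bulk, batch_size=100):
    """Return the fingerprints the server does not have yet.

    check_bulk is a hypothetical callable: it takes a list of fingerprints
    and returns the ones that already exist on the server.
    """
    new = []
    for batch in chunked(fingerprints, batch_size):
        known = set(check_bulk(batch))  # one request per batch, not per replay
        new.extend(fp for fp in batch if fp not in known)
    return new
```

With 1,000 local replays and a batch size of 100, this makes 10 check requests instead of 1,000, and only the replays that come back as new need an upload request.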

martijnhoekstra commented 4 years ago

heroesprofile mentions in https://discordapp.com/channels/650747275886198815/651068646025592832 that they prefer not changing the client at all. I'm not sure how to link to a specific message on Discord.

Zemill commented 4 years ago

This is where I am coming from.

I would like the replay uploader to upload from Oldest to Newest, and I would like it to do so sequentially.

I am not sure what reason we have to do otherwise, other than to speed up the uploads. The standard user uploads after every game, so it is a non-issue. Even if they have a few games, it really isn't an issue. So we are making an update to resolve an inconvenience for the few who have never uploaded.

I am not opposed to making updates; I am opposed to making updates that make the data harder to use for the developers.

martijnhoekstra commented 4 years ago

> I would like the replay uploader to upload from Oldest to Newest, and I would like it to do so sequentially.

This is going to be best-effort at best. You can't know what other users have uploaded at the time of upload, and there will be uploads that are older than the newest upload. Disallowing that is infeasible.

On the local machine, it's also best effort, depending on where the files are.

Regardless, the meat of the PR is parallelizing the fingerprint checking and doing it in bulk. I'm happy to do uploads fully sequentially, in order of replay time. That does require first parsing all available replays just to find the replay time, though, which would also be nice to do in parallel.
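To make that split concrete, here is a small sketch (Python, for illustration only, not the PR's code) of parsing in parallel and then uploading strictly oldest-first. `parse_time` and `upload` are hypothetical callables standing in for the replay parser and the upload request.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_oldest_first(paths, parse_time, upload, workers=4):
    """Parse replay times in parallel, then upload one at a time, oldest first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so times[i] belongs to paths[i].
        times = list(pool.map(parse_time, paths))

    ordered = sorted(zip(times, paths))  # sort by replay time, then by path
    for _, path in ordered:
        upload(path)  # sequential: the next upload starts only after this one returns
    return [path for _, path in ordered]
```

The ordering guarantee here only covers one client's local files; as noted above, it cannot say anything about what other users upload in the meantime.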