Improved caching, added event whitelist filters, and bugfix

WhatIsACore commented 7 years ago

Added the ability to whitelist game, message, and tracker events based on key-value matches.
Improved archive caching and added a boolean argument to .open that controls whether a cached version should be loaded (if possible).
Fixed an issue where the header was always archived under protocol29406 instead of the replay's actual build.
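
For readers skimming the thread, here is a minimal sketch of what key-value whitelisting of events could look like. It is not taken from the PR diff: `matchesWhitelist`, `filterEvents`, the `_event` and `_gameloop` field names, and the sample events are all hypothetical and exist only to illustrate the idea.

```javascript
// Hypothetical helpers, NOT heroprotocol's API.
// An event passes when every key in the whitelist equals the event's value.
function matchesWhitelist(event, whitelist) {
  return Object.keys(whitelist).every(function (key) {
    return event[key] === whitelist[key];
  });
}

// Returns only the events that match; with no whitelist, returns a copy of all.
function filterEvents(events, whitelist) {
  if (!whitelist) return events.slice();
  return events.filter(function (event) {
    return matchesWhitelist(event, whitelist);
  });
}

// Made-up sample data; field names and values are assumptions for illustration.
const sampleEvents = [
  { _event: 'SUnitBorn', _gameloop: 0 },
  { _event: 'SScoreResult', _gameloop: 15000 },
  { _event: 'SUnitDied', _gameloop: 320 }
];

const scoreEvents = filterEvents(sampleEvents, { _event: 'SScoreResult' });
console.log(scoreEvents.length);
```

Note that a filter like this runs only after each event has been decoded, so it narrows the output without reducing the decoding work.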

jnovack commented 7 years ago

1) Forgive me, what's the purpose of the whitelisting for game, message and tracker events?

2) Nice feature.

3) I think the header was always read with protocol29406 for minimum compatibility with Blizzard's format. If you read the file with protocol29406, you could ALWAYS get the information you needed from the header, including what you need to load a later protocol afterward.

Does this break anything for older versions? I'm not in a position to test at the moment...

WhatIsACore commented 7 years ago

My personal use case was that I wanted to read the SScoreResult tracker event (which holds end-of-game stats) without outputting the thousands of other tracker events. The theory was that if only the needed events were pushed to the output object, it might run faster (since reading tracker events is extremely slow). After a few tests, though, I don't think it actually makes a difference, so functionally the feature is equivalent to simply copying the relevant parts of the object afterward. I'm not sure if there is a better way to pull a single event out of all the other tracker events that could improve the speed.
Currently protocol29406 is used to read the header and obtain the build for the replay, which is then saved within the archive object along with its relevant protocol. However, because of this initial read, the results from protocol29406 are cached within the archive. When headers are requested for the replay later, the code checks the archive, notices that header data has been read before, and simply serves back the cached version. The result is that it will always return protocol29406's header.
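
The caching problem described above can be reproduced in a small, self-contained sketch. Everything here is a hypothetical stand-in (`decodeHeader`, `createArchive`, `getHeaderBuggy`, `getHeaderFixed`, and the `cache` layout are not the library's real internals), and the "fixed" variant is only one possible approach, not necessarily what this PR does.

```javascript
// Hypothetical stand-ins; none of these names come from heroprotocol.
const BOOTSTRAP_BUILD = 29406;

// Fake decoder: tags the result with the protocol build that decoded it.
function decodeHeader(rawHeader, protocolBuild) {
  return { decodedWith: protocolBuild, build: rawHeader.build };
}

function createArchive(rawHeader) {
  return { rawHeader: rawHeader, build: null, cache: {} };
}

// Buggy pattern: the bootstrap read is stored under the same cache key
// that later header requests use, so the stale result is served forever.
function getHeaderBuggy(archive) {
  if (archive.cache.header) return archive.cache.header;
  archive.cache.header = decodeHeader(archive.rawHeader, BOOTSTRAP_BUILD);
  archive.build = archive.cache.header.build;
  return archive.cache.header;
}

// One possible fix: use the bootstrap read only to learn the build,
// then decode and cache the header with the matching protocol.
function getHeaderFixed(archive) {
  if (archive.cache.header) return archive.cache.header;
  const bootstrap = decodeHeader(archive.rawHeader, BOOTSTRAP_BUILD);
  archive.build = bootstrap.build;
  archive.cache.header = decodeHeader(archive.rawHeader, archive.build);
  return archive.cache.header;
}

const buggy = getHeaderBuggy(createArchive({ build: 44256 }));
const fixed = getHeaderFixed(createArchive({ build: 44256 }));
console.log(buggy.decodedWith, fixed.decodedWith);
```

In the buggy variant the replay's build is learned correctly, but the header handed back to callers is still the one produced by the bootstrap protocol.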

WhatIsACore commented 7 years ago

As far as I can tell, it was able to read over 100 replays without issues. However, I don't have any replay data more than a year old.

jnovack commented 7 years ago

Unfortunately, I don't think there is a seek function. Even if there were, you would be searching around 30MB of text data once it's uncompressed from binary. No matter what you pull, you HAVE to read every byte and transform it to text before you can filter it. I'm not against it. Sounds good.
At this point, I'm ok with that. If you are parsing data over a year old, go back to version 0.3.3.

I used to have a repository with initial replays from 29406 to about 40000s, but then I said, "who the fuck is going to use it"? Boy am I kicking myself now.

Doesn't matter.

I have a backup from 2016-09 which I'm looking through to see if I can't find my entire HoTS history, which should include a replay from (nearly) every version.

jnovack commented 7 years ago

As a side note, https://github.com/nydus/storm-replay is a C++ binding library for NodeJS which extracts about 60% faster than JavaScript, if you are interested in speed.

WhatIsACore commented 7 years ago

Changes implemented.

The oldest replay I tested this on was logged on 7/7/2016, build 44256.

nydus / heroprotocol

Improved caching, added event whitelist filters, and bugfix #5