smogon / pokemon-showdown

Pokémon battle simulator.
https://pokemonshowdown.com
MIT License

Ideas on decreasing the learnset file size #8069

Closed: lighthouse64 closed this issue 1 year ago

lighthouse64 commented 3 years ago

I noticed that move names, as well as certain requirement strings, are repeated many times in the file. Moves like frustration and hidden power appear almost 900 times. On the requirement side, strings like "3M" and "4M" appear thousands of times. So rather than repeating these strings so often, each one could be stored once in a temporary array, and the entries in BattleLearnsets could then refer to it by its position in that array.
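To see which strings repeat most before choosing what to put in a table, something like the following could tally move IDs and requirement (source) strings. It is a minimal sketch that assumes the data is shaped as species ID → { learnset: { move ID → array of source strings } }; `countRepeats` and the `examplemon*` entries are hypothetical, made-up sample data, not real learnsets.

```javascript
// Tally how often each move ID and each source string occurs.
// Assumed shape (hypothetical): { speciesid: { learnset: { moveid: [sources] } } }
function countRepeats(learnsets) {
  const moveCounts = new Map();
  const sourceCounts = new Map();
  for (const species of Object.values(learnsets)) {
    for (const [moveid, sources] of Object.entries(species.learnset)) {
      moveCounts.set(moveid, (moveCounts.get(moveid) || 0) + 1);
      for (const source of sources) {
        sourceCounts.set(source, (sourceCounts.get(source) || 0) + 1);
      }
    }
  }
  return {moveCounts, sourceCounts};
}

// Made-up sample data, for illustration only.
const sample = {
  examplemon1: {learnset: {
    frustration: ["7M", "7V", "6M", "5M", "4M", "3M"],
    doubleteam: ["7M", "7V", "6M", "5M", "4M", "3M"],
    hiddenpower: ["7M", "7V", "6M", "5M", "4M", "3M"],
  }},
  examplemon2: {learnset: {
    frustration: ["7M", "7V", "6M", "5M", "4M", "3M"],
    hiddenpower: ["7M", "7V", "6M", "5M", "4M", "3M"],
  }},
};

const {moveCounts, sourceCounts} = countRepeats(sample);
// Most frequent move IDs first (sort is stable, so ties keep insertion order).
const byFrequency = [...moveCounts].sort((x, y) => y[1] - x[1]);
console.log(byFrequency);
console.log(sourceCounts.get("3M"));
```

Ranking by count multiplied by name length would probably be a better guide than count alone, since the characters saved per occurrence grow with the length of the string being replaced.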

For instance, I tried the following on 3 moves: Frustration, Double Team, and Hidden Power.

Initialize an array like this:

let a=["frustration","doubleteam","hiddenpower"]

Then replace every instance of frustration with [a[0]], doubleteam with [a[1]], and hiddenpower with [a[2]]. For example, frustration:["7M","7V","6M","5M","4M","3M"] would become [a[0]]:["7M","7V","6M","5M","4M","3M"].
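A minimal sketch of the first method, using the three moves from the example. Computed property names such as `[a[0]]` are evaluated when the object literal is built, so code that reads the learnset still sees ordinary string keys. The helper `charsSavedPerUse` is hypothetical and only counts characters in the source text; it ignores the one-time cost of declaring the array.

```javascript
// Shared table of move IDs, declared once at the top of the file.
const a = ["frustration", "doubleteam", "hiddenpower"];

// Each key is a computed property name that resolves to a plain string.
const learnset = {
  [a[0]]: ["7M", "7V", "6M", "5M", "4M", "3M"],
  [a[1]]: ["7M", "7V", "6M", "5M", "4M", "3M"],
  [a[2]]: ["7M", "7V", "6M", "5M", "4M", "3M"],
};

// Keys are the original move IDs: ["frustration", "doubleteam", "hiddenpower"]
console.log(Object.keys(learnset));

// Characters saved each time `name:` is written as `[a[i]]:` instead.
function charsSavedPerUse(name, index) {
  return name.length - `[a[${index}]]`.length;
}

console.log(charsSavedPerUse("frustration", 0)); // 5
console.log(charsSavedPerUse("pound", 120)); // -3: short names get longer
```

The negative result for a short name at a three-digit index suggests the substitution is only worth applying to names longer than the reference that replaces them.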

Alternatively, if some extra code looked up a move's name from its index in the array, each move could be referred to by a number alone, so frustration:["7M","7V","6M","5M","4M","3M"] could become 0:["7M","7V","6M","5M","4M","3M"].
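The second method needs the lookup code the comment mentions. A minimal sketch, assuming the move table is shipped alongside the data; `MOVES`, `encodeLearnset`, and `decodeLearnset` are hypothetical. JavaScript stores a numeric key such as `0` as the string `"0"` and iterates integer-like keys in ascending numeric order, so decoding may not preserve the original move order, which should not matter for lookups.

```javascript
// Move table shared by every species (hypothetical; would be emitted once).
const MOVES = ["frustration", "doubleteam", "hiddenpower"];

// Replace each move ID key with its index in `table`.
function encodeLearnset(learnset, table) {
  const indexOf = new Map(table.map((id, i) => [id, i]));
  const encoded = {};
  for (const [moveid, sources] of Object.entries(learnset)) {
    if (!indexOf.has(moveid)) throw new Error(`move not in table: ${moveid}`);
    encoded[indexOf.get(moveid)] = sources;
  }
  return encoded;
}

// Turn numeric keys back into move IDs before the data is used.
function decodeLearnset(encoded, table) {
  const learnset = {};
  for (const [key, sources] of Object.entries(encoded)) {
    learnset[table[Number(key)]] = sources;
  }
  return learnset;
}

const original = {
  hiddenpower: ["7M", "7V", "6M", "5M", "4M", "3M"],
  frustration: ["7M", "7V", "6M", "5M", "4M", "3M"],
};
const encoded = encodeLearnset(original, MOVES);
console.log(Object.keys(encoded)); // keys "0" and "2"
const decoded = decodeLearnset(encoded, MOVES);
console.log(Object.keys(decoded)); // frustration first: numeric order, not original order
```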

Using the first method (which would be easier to implement), I reduced the file size by 0.02 MB by replacing just those three moves. That method wouldn't help much with shortening the requirement strings, but the second method could, if the requirement strings were split into multiple arrays (e.g., one array for level-up moves and one for egg moves).
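Applied to the requirement strings, the same idea means giving each distinct source string (such as "3M") an index and storing arrays of numbers instead. This sketch uses a single table built in first-seen order rather than the per-category arrays suggested above; `buildSourceTable`, `encodeSources`, `decodeSources`, and the sample data (including the "7L1" string) are hypothetical.

```javascript
// Collect each distinct source string once, in first-seen order.
function buildSourceTable(learnsets) {
  const table = [];
  const indexOf = new Map();
  for (const species of Object.values(learnsets)) {
    for (const sources of Object.values(species.learnset)) {
      for (const source of sources) {
        if (!indexOf.has(source)) {
          indexOf.set(source, table.length);
          table.push(source);
        }
      }
    }
  }
  return {table, indexOf};
}

// Replace every source string with its index in the table.
function encodeSources(learnsets, indexOf) {
  const out = {};
  for (const [speciesid, species] of Object.entries(learnsets)) {
    const learnset = {};
    for (const [moveid, sources] of Object.entries(species.learnset)) {
      learnset[moveid] = sources.map((s) => indexOf.get(s));
    }
    out[speciesid] = {learnset};
  }
  return out;
}

// Inverse of encodeSources: indices back to source strings.
function decodeSources(encoded, table) {
  const out = {};
  for (const [speciesid, species] of Object.entries(encoded)) {
    const learnset = {};
    for (const [moveid, indices] of Object.entries(species.learnset)) {
      learnset[moveid] = indices.map((i) => table[i]);
    }
    out[speciesid] = {learnset};
  }
  return out;
}

// Made-up sample data, for illustration only.
const sample = {
  examplemon1: {learnset: {frustration: ["7M", "7V", "6M", "5M", "4M", "3M"]}},
  examplemon2: {learnset: {hiddenpower: ["7M", "6M"], tackle: ["7L1"]}},
};

const {table, indexOf} = buildSourceTable(sample);
const encoded = encodeSources(sample, indexOf);
// table: ["7M", "7V", "6M", "5M", "4M", "3M", "7L1"]
console.log(table);
// frustration becomes [0, 1, 2, 3, 4, 5]
console.log(encoded.examplemon1.learnset.frustration);
```

An index such as 5 takes one character where "3M" takes four including quotes, but whether that saving survives gzip would need to be measured.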

Are there any major problems that would prevent this from working?

Zarel commented 3 years ago

While it's true that it would work, there's kind of no point to it.

Do you just want a smaller file? Basically all compression algorithms like Zip will do exactly that, but with much better compression ratios (something like a 90% decrease rather than the 1% you've eked out here).

The reason the file here isn't compressed is to make the file easy for humans to read and edit. It's sent to users compressed when you open PS in a browser.

lighthouse64 commented 3 years ago

Sorry for not being clearer. When I referred to modifications, I meant modifications to this file: http://play.pokemonshowdown.com/data/learnsets.js, which is output by the builder. I believe this is the file you mean when you say PS sends it to users. Also, the 1% decrease came from only 3 moves, so the difference would be much larger if the change were applied to every move.

About compression, I understand what you mean. Because clients are sent gzipped files, the size does indeed drop from about 2.32 MB to 307 KB. In light of your response, I gzipped both my modified version and the default one from Pokémon Showdown. The modified version is still smaller, though only by a few hundred bytes. Again, if this were applied to many more moves, it could probably reduce the file size by a few kilobytes.
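The gzip comparison can be reproduced on synthetic data with Node's built-in `zlib` module. This sketch generates two versions of a deliberately repetitive, made-up file (plain move IDs vs. the `[a[i]]` keys from the first method) and reports raw and gzipped byte counts. `makeSyntheticFile` and the file shape are hypothetical, and the printed numbers will not match the real learnsets.js.

```javascript
const zlib = require("zlib");

// Build a synthetic, repetitive learnsets-style file body (made-up data).
function makeSyntheticFile(speciesCount, useTable) {
  const moves = ["frustration", "doubleteam", "hiddenpower"];
  const sources = '["7M","7V","6M","5M","4M","3M"]';
  const lines = [];
  if (useTable) lines.push(`let a=${JSON.stringify(moves)};`);
  lines.push("const BattleLearnsets = {");
  for (let i = 0; i < speciesCount; i++) {
    // Key is either the plain move ID or a computed reference into `a`.
    const entries = moves.map((m, j) => `${useTable ? `[a[${j}]]` : m}:${sources}`);
    lines.push(`examplemon${i}:{learnset:{${entries.join(",")}}},`);
  }
  lines.push("};");
  return lines.join("\n");
}

// Raw UTF-8 size and gzipped size (default compression level), in bytes.
function sizes(text) {
  const raw = Buffer.byteLength(text, "utf8");
  const gzipped = zlib.gzipSync(Buffer.from(text, "utf8")).length;
  return {raw, gzipped};
}

const plain = makeSyntheticFile(900, false);
const tabled = makeSyntheticFile(900, true);
console.log("plain ", sizes(plain));
console.log("tabled", sizes(tabled));
```

On text this repetitive, gzip's back-references already capture much of the redundancy the table targets, which is consistent with the small gzipped difference described above.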

As for implementation, I'm perfectly fine with coding this myself. I understand it is kind of petty, since it won't save more than a few kilobytes, but because the file is served so many times, the savings could add up to a meaningful amount of bandwidth over time.

Thanks for taking the time to read this. I wanted to open an issue to discuss this before attempting a PR, so that I wouldn't waste time if the change caused issues with PS's structure.

Zarel commented 3 years ago

Oh, I understand now.

I think there are gains you could get from a more optimized storage format for learnsets. I haven't looked into them because I don't think you can squeeze a particularly useful amount of savings out of this.

I think the specific approach you've come to is a bit too ad-hoc. If you wanted to optimize for density, you would probably want to encode all the data in optimized binary and convert it to base64. Maybe BSON, and then replace every species and move ID with an index number?
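A rough sketch of the binary-plus-base64 idea, with species, move, and source strings replaced by table indices. It swaps in a hand-rolled format (big-endian 16-bit counts and indices) instead of BSON, which Node does not ship; `packLearnsets`, `unpackLearnsets`, the byte layout, and the sample data are all hypothetical.

```javascript
// Pack learnsets into base64 plus three string tables.
// Layout per species (hypothetical): u16 moveCount, then per move:
// u16 moveIndex, u16 sourceCount, sourceCount x u16 sourceIndex.
function packLearnsets(learnsets) {
  const speciesTable = [];
  const moveTable = [];
  const sourceTable = [];
  const moveIndex = new Map();
  const sourceIndex = new Map();
  const bytes = [];

  // Return the index of `value`, adding it to the table on first sight.
  const intern = (table, index, value) => {
    if (!index.has(value)) {
      index.set(value, table.length);
      table.push(value);
    }
    return index.get(value);
  };
  const writeU16 = (n) => {
    if (n > 0xffff) throw new RangeError(`value too large for u16: ${n}`);
    bytes.push(n >> 8, n & 0xff); // big-endian
  };

  for (const [speciesid, species] of Object.entries(learnsets)) {
    speciesTable.push(speciesid);
    const entries = Object.entries(species.learnset);
    writeU16(entries.length);
    for (const [moveid, sources] of entries) {
      writeU16(intern(moveTable, moveIndex, moveid));
      writeU16(sources.length);
      for (const source of sources) writeU16(intern(sourceTable, sourceIndex, source));
    }
  }
  return {speciesTable, moveTable, sourceTable, data: Buffer.from(bytes).toString("base64")};
}

// Inverse of packLearnsets: walk the bytes in the same order they were written.
function unpackLearnsets(packed) {
  const buf = Buffer.from(packed.data, "base64");
  let pos = 0;
  const readU16 = () => {
    const n = buf.readUInt16BE(pos);
    pos += 2;
    return n;
  };
  const out = {};
  for (const speciesid of packed.speciesTable) {
    const learnset = {};
    const moveCount = readU16();
    for (let i = 0; i < moveCount; i++) {
      const moveid = packed.moveTable[readU16()];
      const sourceCount = readU16();
      const sources = [];
      for (let j = 0; j < sourceCount; j++) sources.push(packed.sourceTable[readU16()]);
      learnset[moveid] = sources;
    }
    out[speciesid] = {learnset};
  }
  return out;
}

// Made-up sample data, for illustration only.
const sample = {
  examplemon1: {learnset: {frustration: ["7M", "7V", "6M", "5M", "4M", "3M"], doubleteam: ["7M", "6M"]}},
  examplemon2: {learnset: {frustration: ["7M", "6M"]}},
};

const packed = packLearnsets(sample);
console.log(packed.moveTable); // ["frustration", "doubleteam"]
console.log(packed.data);
```

Base64 makes the binary about a third larger, and base64 text may not gzip as well as the original JS, so whether this wins after compression is something to measure rather than assume.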

monsanto commented 1 year ago

Seems reasonable, but no follow-up.