triplea-game / triplea

TripleA is a turn based strategy game and board game engine, similar to Axis & Allies or Risk.
https://triplea-game.org/
GNU General Public License v3.0

Game file sizes are too large #37

Closed djensen47 closed 8 years ago

djensen47 commented 8 years ago

On one individual's computer, a large file is not an issue, but when dealing with these files en masse it is burdensome.

Benefits of a smaller file format:

Options:

(oops, didn't mean to close and re-open)

veqryn commented 8 years ago

You are really only concerned about savegame size, correct? Savegames aren't stored as xml, so EXI does not apply. Right now, savegames are just gzipped Java serialized objects. With the exception of a couple of games, they are mostly quite small. Games that have big savegames are games that:

  1. Have a large or complex game data object (i.e. a large or complicated map, and/or rules)
  2. Go on a long time (because of history and undoing moves, everything is stored)
  3. Have large numbers of units, territories, etc.
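To make "gzipped Java serialized objects" concrete, here is a minimal sketch of that storage scheme. It is not TripleA's actual save code: `GzipSaveSketch` and `ToyGameData` are hypothetical names invented for illustration, and only standard `java.io` and `java.util.zip` classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSaveSketch {
  /** Hypothetical stand-in for a game data object; not a TripleA class. */
  static class ToyGameData implements Serializable {
    private static final long serialVersionUID = 1L;
    final String mapName;
    final List<String> history = new ArrayList<>();

    ToyGameData(String mapName) {
      this.mapName = mapName;
    }
  }

  /** Serializes the object graph and gzips the resulting byte stream. */
  static byte[] save(Serializable data) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
      out.writeObject(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reverses save(): gunzips the bytes and deserializes the object graph. */
  static Object load(byte[] saved) {
    try (ObjectInputStream in =
        new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(saved)))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    ToyGameData game = new ToyGameData("revised");
    // The history grows every turn, which is one reason long games produce larger saves.
    for (int move = 1; move <= 1000; move++) {
      game.history.add("move " + move);
    }
    byte[] saved = save(game);
    ToyGameData loaded = (ToyGameData) load(saved);
    System.out.println("saved bytes: " + saved.length);
    System.out.println("history entries after reload: " + loaded.history.size());
  }
}
```

Because the whole object graph is written, everything reachable from the game data object (history included) ends up in the file, which matches the three causes of large saves listed above.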

djensen47 commented 8 years ago

Ah, for some reason I thought it was XML. My apologies. I still wonder if one of these "standard" formats would help?

veqryn commented 8 years ago

They wouldn't affect the savegame, or its size. The game xml is really only used in the map zip, and when it is read it is immediately turned into a serializable game data object. The xml is never sent over the network. The only change that switching to something like MessagePack would make is that the game/map's xml would become impossible for a human to read.

djensen47 commented 8 years ago

Out of curiosity, why did you close this issue? Maybe I should have been more explicit about what is too large, the save game files.

I think there is lots of room for improvement:

                        <b>World War II v5 1942 Second Edition</b>
                    <br>
                    <br>An update of the popular "Spring 1942" game, which itself was an update of "Revised".
                    <br>Changes:
                    <br>1. Several territories/SZs added or modified.
                    <br>2. More initial starting units (including new Factories on Karelia and India).
                    <br>3. Armor cost increased to 6 PUs.
                    <br>4. AA Guns now cost 5 PUs and each can fire at a maximum of 3 planes, each plane being only fired upon once, and they can be combat casualties.
                    <br>5. Factories now have their own AA to defend during strategic bombing.
                    <br>6. Honolulu is now a Victory City.
                    <br>
                    <br>1st Optional Rule: SZ16 (Black Sea) can't be accessed by ships.
                    <br>2nd Optional Rule: Select the game option "Raids May Be Preceeded By Air Battles" to turn on Escorts and Interceptors for Strategic Bombing Raids. 
                        All Escorts, Interceptors, and Bombers all @ 1 for a single round before the bombing raid (same rules as Global 1940 Second Edition).
                    <br>
                    <br>Also, Technology is turned off by default (if turned on, will use ww2v3 (AA 50th Anniversary Edition) technology).
                    <br>
                    <br>
                    <br><b>Victory Condition:</b>
                    <br>Quick Victory: Axis must control 8 Victory Cities.  Allies must control 9 Victory Cities.
                    <br>Normal Victory: Axis must control 9 Victory Cities.  Allies must control 10 Victory Cities.
                    <br>Full Victory: Axis must control 13 Victory Cities.  Allies must control 13 Victory Cities.
                    <br>
                    <br>
                    <br><b>Generic How-To-Play:</b>
                    <br>The game is made up of rounds, during a round each player gets to do a number of steps/phases.
                    <br>The phases are, in order: Research Technology, Repair Factories and Purchase Units, Combat Movement, Resolve Battles, Non-Combat Movement, Place Units.
                    <br>At the beginning of your turn, you purchase units. At the end of the turn, you get to place those units in territories you own that have a factory.
                    <br>During Combat Movment, you move any units to attack enemy units and territories. During Non-Combat Movement, you may move any units that have movement left (attacking an enemy remove any movement of land and sea units, but not air units).
                    <br>Battles happen by use of dice. A unit has a certain attack power, and you roll a dice for each unit. If your dice is equal or less than the attack power of the unit, you have scored a hit.
                        So for example, a tank attacks on 3. For it, you will roll a single die, and if you score 1-3 on the die you have hit the enemy, while if you score 4-6 you have missed the enemy. An infantry defends on 2, so a roll of 1-2 is a hit, while 3-6 are misses.
                    <br>After the attacker has rolled dice for each of his units, the defender tallies the total number of 'hits' and then selects which of his defending units will die later. After this, the defender rolls for his units and the attacker selects which of his units will die.
                        When both have finished rolling, the units selected to die are removed from the game. If there are no more attackers left, then the defender has won, and if there are no more defenders left, then the attacker has won and he moves his remaining attacking units into that territory.
                        If both players have units left still, the attack may choose to play another round of battle, or retreat all his remaining forces to a territory where at least one of his forces came from.
                    <br>Players must work with their allies to destroy the enemies, with the game ending when one side surrenders or certain conditions are met (like having captured a clear majority of the major cities).
                    <br>
                    <br>
                    <br><b>Optional Technologies:</b>
                    <br>*** Air/Naval Tech ***
                    <br>SUPER SUBS- submarine units get +1 attack
                    <br>JET POWER- fighters get +1 attack
                    <br>IMPROVED SHIPYARDS- naval units are cheaper
                    <br>AA RADAR- AA hit on 2 or less
                    <br>LONG RANGE AIRCRAFT- aircraft range increased by 1
                    <br>HEAVY BOMBER- roll 2 dice for each bomber, and selects the best one
                    <br>
                    <br>*** Land/Production Tech ***
                    <br>IMPROVED ARTILLERY SUPPORT- artillery support 2 infantry
                    <br>ROCKETS- AA conduct rocket attacks for 1d6 damage to production (each factory may only be targetted once per turn by one rocket, and only 1 rocket in each territory may fire)
                    <br>PARATROOPERS- each bomber may carry 1 infantry into combat (must stop in first enemy territory it reaches)
                    <br>INCREASED FACTORY PRODUCTION- factories produce 2 additional units (if territory value is 3 or greater), repairs 1/2 price
                    <br>WAR BONDS- collect an 1d6 extra PUs each turn
                    <br>MECHANIZED INFANTRY- tanks may carry 1 infantry each for 2 spaces
                    <br>
                    <br>
                    <br>
                    <br>Credits:
                    <br>Hobbes for basetiles and initial xml.  Veqryn for relief tiles, decorations, and corrections.
                    <br>Also thanks to Jason/TripleElk for creating ww2v2 Revised and ww2v3 50th Anniversary artwork and tiles, which were also ported over to this version.
                    <br>
djensen47 commented 8 years ago

So it looks like TripleA is using plain ol' Java object serialization. This can be improved. If the entry point to serialization is an interface, it might not be too difficult to test this out.

I would like to see this issue re-opened, please.

DanVanAtta commented 8 years ago

This is getting similar to the reflection and serialization discussion we had in #20. In that discussion it was agreed that getting rid of serialization (and RMI) would be a good thing, and that it would be good to start with save games. It was recognized that this would also be a pretty ambitious project. I could be convinced to create a new placeholder issue for it, but I think at this point we agree that we should create an API to save games. We could then compress it further, in addition to other benefits (engine upgrades, robustness and protection from bugs; e.g. with serialization, if objects get into a doomed state, you'll save the doomed state and never get out of the loop. Saving from an API should give a clean load).

DanVanAtta commented 8 years ago

@djensen47 , re-open? this appears still closed

djensen47 commented 8 years ago

I don't have permission to re-open. Should I re-open or open a more specific issue instead?

veqryn commented 8 years ago

The game files, meaning the installed files? In that case, 99% of the bulk is from images and sounds. Almost all xml and text is zipped or compressed in some form, and I prefer compressed readable text to uncompressed undecipherable text.

If we want to attack size and get the most bang for our buck, then we should look into using mp3 instead of wav files.
After that, I wouldn't bother compressing anything any more, because it simply isn't worth it. Instead, I'd look into better ways of updating/upgrading, so that the entire package doesn't need to be downloaded each time.

djensen47 commented 8 years ago

Sorry, the save game files are "too big" (in aggregate). The size of TripleA itself, isn't that bad in the grand scheme of things. It's the files that get transmitted frequently, e.g. save games.

Again, on one person's computer it's fine but on a server, like the forums, it becomes a problem.

veqryn commented 8 years ago

Ok, so like I said, the save game files have nothing to do with xml. They are compressed serialized Java objects. Yes, there is some string text, including a limited amount of html, inside the game data object. But it is compressed twice (first as a serialized Java object, where there is never more than one instance of a string, and secondly gzipped), and in any case is quite limited. The only html in there is the game notes and sometimes notes on units. If you were to load a large savegame as a game data object, with nothing else going on, take a Java heap dump and then examine what was on the heap, the xml/html text would represent less than a fraction of 1% of the total size.
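The "never more than one instance of a string" point can be checked with a small experiment. The sketch below is illustrative only (`SharedReferenceSketch` is a hypothetical name, not TripleA code). It relies on the standard behaviour of `ObjectOutputStream`, which writes an object in full the first time it is seen and a short back-reference for every later occurrence of the same instance. Equal strings that are separate instances are each written in full.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;

class SharedReferenceSketch {
  /** Returns the number of bytes Java serialization produces for the given object. */
  static int serializedSize(Object value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  /** A list holding the same String instance many times. */
  static ArrayList<String> sameInstance(String note, int count) {
    ArrayList<String> list = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      list.add(note);
    }
    return list;
  }

  /** A list holding equal but distinct String instances. */
  static ArrayList<String> distinctCopies(String note, int count) {
    ArrayList<String> list = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      list.add(new String(note)); // deliberately a new instance each time
    }
    return list;
  }

  public static void main(String[] args) {
    String note = "Armor cost increased to 6 PUs. AA Guns now cost 5 PUs.";
    System.out.println("same instance x1000:   " + serializedSize(sameInstance(note, 1000)));
    System.out.println("distinct copies x1000: " + serializedSize(distinctCopies(note, 1000)));
  }
}
```

The first figure should come out far smaller than the second. Note the caveat this exposes: the sharing is by object identity, so duplicated text only collapses if the game data really holds one shared instance. Gzip would then have to absorb any remaining repetition.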

I'm going to close this now.

djensen47 commented 8 years ago

> I wouldn't bother compressing anything any more, because it simply isn't worth it

I still strongly disagree on this matter. You're not managing and paying for servers where users are trying to upload dozens of these very large files at the same time. At the very least, research should be done on this matter. Java binary plus gzip is not the be-all and end-all of compression. In fact, this is exactly the kind of case where there are other, better binary formats out there.

djensen47 commented 8 years ago

I'm super close to shutting off TripleA uploads to axisandallies.org. I don't have time to babysit nor debug a server that is bogged down by numerous large uploads.

DanVanAtta commented 8 years ago

I would recommend creating a new ticket so we can stay focused on what the problem is. Updating how saves are done is a pretty deep fix at the moment and, to make matters worse, there is a lot else to do in the meantime.

In general I'd like save games to not be done with serialization, which couples object implementation details with your save game, not a good thing, and forces us to carry around the old game engine jars. So there is a lot of benefit for certain in updating how the save games are done.

djensen47 commented 8 years ago

@DanVanAtta Since you seem to know what it takes to do this and other reasons why it is needed (besides compression), would you mind opening the issue?

veqryn commented 8 years ago

So, there are really a couple different issues here, even if they are all related or interconnected.

  1. What format do we want to persist data in? This is separate from how we want to compress it, though it will affect compression and size. The current format is a serialized object: we use Java's default serialization for most things, and implement Externalizable for a few classes. But this could be any form of serialization, whether we went with Java's, Kryo's, Google's Protocol Buffers, Thrift, etc. The choices come down to something like:
     a. Some kind of serialization
     b. Some kind of text or byte based format (JSON, XML, etc.)
     For example, we may choose to go with JSON, which might actually be larger in size than a serialized object, due to choices around having a stable format unaffected by changes to the source code.
  2. How do we want to persist the data? GSON vs Jackson? JAXB vs XOM? Externalizable vs Kryo vs Protocol Buffers vs Thrift? Which formats take advantage of pointers? We have MANY instances of the same object in our saves, so any format that doesn't include a way to point to an already existing string instance, or unit or territory instance, will not be efficient.
  3. How will this affect sending data over the network?
    Do we change that to match, or keep saving the game to disk separate from the network traffic (serialization again) for now? Do we also change from RPC to something else, and if so what? How to break these parts down into smaller tasks?
  4. How do we want to compress the data? GZIP? Or maybe something that achieves a much higher ratio of compression, possibly using recursive compression?
  5. How do we reduce the load on AxisAndAllies.org's website from uploaded savegames?

Do we do this by reducing the size of the savegame? If so, it is not necessary that we do 1-4, when there are other options, such as not including so much information in persisted data. We could for example:

  a. Eliminate any data redundancies, where we could pull the data from the map files or xml again (unit notes, game notes, history of attachments, etc).
  b. Add a way to delete game history.
  c. Eliminate any persisted data that isn't used by that particular rule set (example: no need to keep a record of battle outcomes, if the map doesn't use triggers based on battle outcome data).

If we do decide that it is necessary to change our persistence format to something else, I would want us to do some trials with long running gamesaves (ex: round 10, ww2 1940) to see just how much savings there were, and whether it was worth the effort.

Or do we attack the problem even more fundamentally: how and what A&A.org saves. A&A.org is keeping a record of every single turn someone plays, from start to end. Perhaps a much better system would be to only store the last 2-5 saves in any Thread. A game like 1940 might be saved to the forum 10 or more times per round (once per player turn, plus once for every battle casualty selection or scrambling choice, etc). A game that takes 20 rounds to finish, would then have 200 uploaded saves. If each was 500KB, we are talking 100MB just for that game thread. Reduce it to the last 4 saves, and you have gone from 100MB to 2MB. That is better compression than any tinkering with formats will get us.
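The retention arithmetic above can be written out as a short sketch. The figures (10 saves per round, 20 rounds, 500KB per save, keep the last 4) are the ones from the comment. The class and method names are hypothetical, and the pruning method only illustrates the "keep the last N saves" idea, not how the forum software actually stores attachments.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SaveRetentionSketch {
  /** Total storage for a thread, in KB, given how many saves it keeps. */
  static long totalKb(int saveCount, long saveSizeKb) {
    return saveCount * saveSizeKb;
  }

  /** Adds the newest save to a thread, then drops the oldest until at most maxKept remain. */
  static void addAndPrune(Deque<String> thread, String newSave, int maxKept) {
    thread.addLast(newSave);
    while (thread.size() > maxKept) {
      thread.removeFirst();
    }
  }

  public static void main(String[] args) {
    int savesPerRound = 10;
    int rounds = 20;
    long saveSizeKb = 500;
    int totalSaves = savesPerRound * rounds; // 200 uploads over the whole game

    Deque<String> thread = new ArrayDeque<>();
    for (int i = 1; i <= totalSaves; i++) {
      addAndPrune(thread, "save-" + i, 4);
    }

    System.out.println("keep everything: " + totalKb(totalSaves, saveSizeKb) + " KB");
    System.out.println("keep last 4:     " + totalKb(thread.size(), saveSizeKb) + " KB");
  }
}
```

With those inputs the totals are 100,000 KB against 2,000 KB, a 50x reduction that does not depend on the save format at all.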

On top of this, I am willing to bet a lot that only certain games are causing issues for you in terms of space. A game like Revised, or v3, is going to have an incredibly small savegame, even after many rounds. But games like 1940 will be much larger. Perhaps we need some data or simple charts on what is actually causing the problem: a mapping of % of space used against which threads the data is in (a wildcard match against "1940" vs "revised" vs other, etc). This would help us focus on what in particular is causing problems, for example if it turns out that it is only one bloated object in the game data, which we could fix by making it Externalizable.
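As a rough idea of what such a breakdown could look like, here is a sketch that groups upload sizes by a substring match on the thread title and reports each group's share of the total. Everything in it is an assumption for illustration: the class name, the category rules, and the sample titles and sizes in `main` are made up, and real data would have to come from the forum's own storage.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class UploadBreakdownSketch {
  /** Crude wildcard-style match on the thread title (hypothetical rules). */
  static String categorize(String threadTitle) {
    String title = threadTitle.toLowerCase(Locale.ROOT);
    if (title.contains("1940")) {
      return "1940";
    }
    if (title.contains("revised")) {
      return "revised";
    }
    return "other";
  }

  /** Maps each category to its percentage of the total uploaded bytes. */
  static Map<String, Double> percentByCategory(Map<String, Long> bytesByThread) {
    Map<String, Long> totals = new LinkedHashMap<>();
    long grandTotal = 0;
    for (Map.Entry<String, Long> entry : bytesByThread.entrySet()) {
      totals.merge(categorize(entry.getKey()), entry.getValue(), Long::sum);
      grandTotal += entry.getValue();
    }
    Map<String, Double> percent = new LinkedHashMap<>();
    for (Map.Entry<String, Long> entry : totals.entrySet()) {
      percent.put(entry.getKey(), grandTotal == 0 ? 0.0 : 100.0 * entry.getValue() / grandTotal);
    }
    return percent;
  }

  public static void main(String[] args) {
    // Made-up sample data: thread title -> total bytes of uploaded saves.
    Map<String, Long> sample = new LinkedHashMap<>();
    sample.put("Global 1940 league game", 800_000L);
    sample.put("Revised low luck", 100_000L);
    sample.put("Some other map", 100_000L);
    System.out.println(percentByCategory(sample));
  }
}
```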

djensen47 commented 8 years ago

I might be able to extract that data.

DanVanAtta commented 8 years ago

Yes, we need to work on the save game aspect.

First, a DAO type of architectural layer should be put in place so the exact persistence technology can be replaced without touching all the code.

Second, it'll be great to get away from the super-coupling caused by Java serialization. For example, I was just playing a game, we ran into a fatal error, saved and reloaded. The fatal error is saved with the game, so there is no way to load it. With 3 hours of game play lost, dice rolled, and strategies revealed, what was a 12 hour running game is likely ruined. This is a pretty devastating experience for a TripleA player. So concretely, for any save feature we have, if the save is successfully done, then we should always be able to load that file again and not save errors with it. The save should also be robust enough to always be able to save the current game state, and not choke because of some movement error, or an error transferring data to a network player.
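A minimal sketch of the DAO idea from the first point, assuming a plain snapshot object that is decoupled from the engine's internal classes. All names here (`SaveGameDao`, `GameSnapshot`, the two implementations) are hypothetical and do not exist in TripleA. The point is only that calling code depends on the interface, so the storage technology can be swapped without touching it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class SaveGameDaoSketch {
  /** Hypothetical plain snapshot of game state, independent of engine classes. */
  static final class GameSnapshot {
    final String mapName;
    final int round;

    GameSnapshot(String mapName, int round) {
      this.mapName = mapName;
      this.round = round;
    }
  }

  /** The only type the rest of the code would need to know about. */
  interface SaveGameDao {
    void save(String saveId, GameSnapshot snapshot);

    Optional<GameSnapshot> load(String saveId);
  }

  /** Keeps snapshots as objects in memory. */
  static final class InMemoryDao implements SaveGameDao {
    private final Map<String, GameSnapshot> saves = new HashMap<>();

    public void save(String saveId, GameSnapshot snapshot) {
      saves.put(saveId, snapshot);
    }

    public Optional<GameSnapshot> load(String saveId) {
      return Optional.ofNullable(saves.get(saveId));
    }
  }

  /** Stores snapshots as "round|mapName" text; a stand-in for any stable text format. */
  static final class TextDao implements SaveGameDao {
    private final Map<String, String> files = new HashMap<>();

    public void save(String saveId, GameSnapshot snapshot) {
      files.put(saveId, snapshot.round + "|" + snapshot.mapName);
    }

    public Optional<GameSnapshot> load(String saveId) {
      String text = files.get(saveId);
      if (text == null) {
        return Optional.empty();
      }
      int separator = text.indexOf('|'); // round comes first, so map names may contain '|'
      int round = Integer.parseInt(text.substring(0, separator));
      return Optional.of(new GameSnapshot(text.substring(separator + 1), round));
    }
  }

  /** Calling code is written against the interface only. */
  static GameSnapshot saveAndReload(SaveGameDao dao, GameSnapshot snapshot) {
    dao.save("slot-1", snapshot);
    return dao.load("slot-1").orElseThrow(() -> new IllegalStateException("save not found"));
  }

  public static void main(String[] args) {
    GameSnapshot snapshot = new GameSnapshot("ww2v5_1942", 7);
    for (SaveGameDao dao : new SaveGameDao[] {new InMemoryDao(), new TextDao()}) {
      GameSnapshot reloaded = saveAndReload(dao, snapshot);
      System.out.println(dao.getClass().getSimpleName() + ": " + reloaded.mapName
          + " round " + reloaded.round);
    }
  }
}
```

Because a snapshot like this contains only game state and no transient engine objects, it also fits the second point: a save written through such an API would not carry a broken runtime state along with it.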

DanVanAtta commented 8 years ago

@djensen47 , any updates on your side?

DanVanAtta commented 8 years ago

I cut a ticket summarizing this problem: https://github.com/triplea-game/triplea/issues/250 "axisandallies.org has trouble supporting PBF"

Closing this ticket in favor of the above.