vsariola / pakettic

TIC-80 cartridge packer
MIT License
19 stars 4 forks source link

feat: convert numbers between hex and decimal #11

Closed dancek closed 1 year ago

dancek commented 1 year ago

I noticed a 0x3fc0 didn't get converted to the shorter 16320. Here's a unit test and implementation for converting in both directions (though dec->hex is rarely useful).

dancek commented 1 year ago

Oh btw I'm called hannu in the sizecoding discord. In case you have comments.

vsariola commented 1 year ago

The hex conversions are already done by the global "hints" that tell whether all hexadecimals are printed as hexadecimals, or decimals. The reason was that there is potentially a lot of hexadecimal constants, and it usually packs best either to convert all of them to decimals, or keep all of them as hexadecimals, not mutate one by one. We want to keep the number of mutations as small as possible, because it increases the search space.

Now, why exactly the hexadecimals did not get converted I don't know :) The existing hexadecimal conversion mechanism might be broken... I'll first investigate the existing mechanism, and then consider adding mutating individual constants.

vsariola commented 1 year ago

Ok, so I investigated this. Even without this pull request, the first test works, if you add ast.Hint around the root node:

    def test_minify(self):
        cases = {
            'poke(0x3fc0,0)': 'poke(16320,0)',
            'print(1000000000001)': 'print(0xe8d4a51001)',
        }
        for a, expected in cases.items():
            with self.subTest(parsed=a):
                root = ast.Hint(parser.parse_string(a))
                opt_root = optimize.dlas(root, steps=100, list_length=5, init_margin=0, seed=0, cost_func=lambda root, _: len(printer.format(root)))
                printed = printer.format(opt_root)
                self.assertEqual(printed, expected)

Now, the second test fails, because I have only one hint "try printing all (integer) hexadecimals as decimals", but I don't have the opposite hint "try printing all decimals as hexadecimals", as it would in practice be always a very bad idea (imagine every 1 becoming 0x1!). Also, after compression, I have found that even when the decimals are about the same length or one char longer than the hexadecimals, the decimals practically always win after compression, because they have less entropy: (integer) hexadecimals use 17 different characters [x0-9a-f] while the decimals only use [0-9].

The second test case is a bit artificial: most big constants are usually written with scientific notation (1e10). For hexadecimal constants, mostly values within 64k or so are needed, as they are usually memory addresses, and the TIC-80 memory map is no that well aligned that there would be that often a lot of zeros in the hex constants. And there even the uncompressed length is always shorter in decimals than in hexadecimals: 65535 is five characters, while 0xFFFF is six.

As to why exactly a hex constant in your original cart did not get replaced (it should), can you send me the offending cart? It might be that a) pakettic just got unlucky in its randomish search and different seed/longer optimization time would've done the trick; b) it really packs better, for whatever reason.

dancek commented 1 year ago

Closing as obsolete.