Parsing may be slow for some use cases

deavid commented 8 months ago

I noticed that in maps that are 128x128 tiles with several tile layer (around 7) and several tilesets with 50+ sprites, a map load can take 100ms in a fast computer. I suspect that a 256x256 map can easily take around 500ms but I have not tested.

Consider the use case where one would like to pre-load all maps in a folder to validate them and retrieve some basic info such as map properties, in order to build a database in memory for a game to show what maps are available. If the game has community map-packs, it easily can surpass 100 maps. In this case the game could take several seconds to build this initial state and data.

Of course, the solution is to load the data in another way or to cache it somewhere. But it is an inconvenience.

From the looks of what a TMX file has, I don't think that the XML parser itself is the problem (talking about xml-rs and https://github.com/mapeditor/rs-tiled/issues/137 ), but the work that this library does while parsing, creating all the references for the proper sprites for each tileset, etc.

Profiling the library and benchmarking different parts of the code could give you an idea on where the time is lost. But most likely is the lookup of sprites and associating each sprite to each tile.

Consider if some of this work could be deferred to access time. Or for example, initially build a more crude representation of the file in memory that we could use as-is, and another step that creates the final one more database-like that is more intensive to create but faster to use.

At least in my case, I tend to load what I need into my own structures and discard the parser straight away. The problem might be that here we have 2 things in one, both a loader, and a in-memory map database; and we cannot avoid the costs of the latter.

aleokdev commented 8 months ago

Hmm. It is true that there's not a quick way to access surface-level aspects of a map right now. I completely understand your use case. However, creating a progressively-loaded version of the library would probably require some interface rewriting.

I'd really like to have benchmarks at some point, although my intuition tells me that most time is spent in filesystem operations and XML parsing. If this is true, one quick and dirty solution that can be done on user-side for now could be to implement a reader that ignores all files save for the ones you are interested in, but of course that approach only affects entire files.

deavid commented 8 months ago

I really doubt that FS operations and XML parsing is the problem. Unless you open and parse the same file several times, which then that would be the problem.

To prove my point, I wrote a very simple Python parse to showcase parsing speed:

https://github.com/deavid/unhaunter/blob/next/assets/maps/parse_test.py

Running that Python program from that folder is in, outputs the following:

Parsed file: character1.tsx in 0.000075 seconds
Parsed file: unhaunter_custom_tileset.tsx in 0.000044 seconds
Parsed file: unhaunter_spritesheet2.tsx in 0.000036 seconds
Parsed file: unhaunter_spritesheetA_6x6x10.tsx in 0.000437 seconds
Parsed file: unhaunter_spritesheetA_3x3x3.tsx in 0.000533 seconds
Parsed file: map_house1.tmx in 0.000155 seconds
Parsed file: map_school1.tmx in 0.001008 seconds
Parsed file: map_house2.tmx in 0.000884 seconds
['character1.tsx', 'image', 'unhaunter_custom_tileset.tsx', 'tileoffset', 'grid', 'tile', 'unhaunter_spritesheet2.tsx', 'tileoffset', 'grid', 'properties', 'image', 'tile', 'unhaunter_spritesheetA_6x6x10.tsx', 'tileoffset', 'grid', 'properties', 'image', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'unhaunter_spritesheetA_3x3x3.tsx', 'tileoffset', 'grid', 'transformations', 'properties', 'image', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'tile', 'wangsets', 'map_house1.tmx', 'properties', 'tileset', 'tileset', 'tileset', 'group', 'map_school1.tmx', 'properties', 'tileset', 'tileset', 'tileset', 'group', 'map_house2.tmx', 'properties', 'tileset', 'tileset', 'tileset', 'group']

Total time spent parsing files: 0.003172 seconds

Total: 3 milliseconds. Tiled takes around 225ms to complete reading the *.tmx files in there in the same machine.

Please check your assumptions. XML parsing is slow, but it's not that slow.

Here is my use case: https://github.com/deavid/unhaunter/blob/49832097a541311be5c946ff4ac71908fbe9b6c4/src/root.rs#L377

And I had to implement a vary naive loader to make this usable: https://github.com/deavid/unhaunter/blob/49832097a541311be5c946ff4ac71908fbe9b6c4/src/tiledmap.rs#L403

All Unhaunter code, maps and assets are under Apache 2 license (except fonts), so feel free to use them for benchmarks if you want.

But you don't need even benchmarks to see the problem. I bet that if we store a 1024x1024 tile map with a 1024 sprite tileset it will take very long to parse, and just doing a profiling session it would show the problem. (or just adding manually some timing information manually by printing to the console)

Deukhoofd commented 8 months ago

I did a profile with the given test files, most of the overhead consisted of small read syscalls in the xml parser. I created a PR to use a BufReader as underlying file reader, which should remove most of these syscalls. This should greatly improve performance.

deavid commented 8 months ago

I'll take a look to your PR and check performance. If it reduces the timer required by 5 fold, I would expect to see a timing of 44ms; and while the improvement is huge (thanks a lot!), it is still a far cry from "just XML parsing in Python".

I'd suggest that after the PR is reviewed and merged we still leave this issue open to have a better look on what is happening.

Deukhoofd commented 8 months ago

It's Easter, and I was slightly bored, so here's a proof of concept you could use that changes the underlying XML parser, and does some other performance optimizations: https://github.com/Deukhoofd/rs-tiled/tree/xml_performance

Using your test files, I got down to 2564488 ns/iter (+/- 53017) (2.5 ms) when building the crate in release mode (loading the map_house2.tmx map). Most of the overhead is now allocations, storing data, etc.

I probably won't create a PR for it unless asked for, as it fundamentally changes how the library works, and touches almost every file. I wouldn't want to get PRs like that as a maintainer myself ;).

Edit: for comparison, without the changes the benchmarks are 13814870 ns/iter (+/- 634266) (~14ms)

Another Edit: I was curious what the actual proper benchmarks were before I made the earlier BufReader changes, as I guesstimated the 400% performance gain from my debug run times. Looks like that change has a far greater effect when built in release mode. Before the BufReader changes, it benched at 279556497 ns/iter (+/- 9662169) (280 ms). As it benches at 14 ms after those changes, that one was a 1900% performance gain, not a 400% gain :).

deavid commented 8 months ago

That sounds more like it. You got a 6x speedup on top of the 4-5x speedup from the previous one. So roughly 25x performance I guess.

For testing, I would recommend the map_school1.tmx since it's the biggest and it's where the problems can be seen the most.

I see now your edit that we measure now from 280ms to 14ms, then from 14ms to 2.5ms; so ~20x first, and ~100x in total. I would be more than happy with those results.

@aleokdev I would really appreciate if you could consider such a contribution. The first PR that has been sent seems like a no-brainer to me.

aleokdev commented 8 months ago

I was suspecting XML loading was the issue, xml-rs isn't precisely fast. I'm really grateful for your contributions, @Deukhoofd, but one of the things I've wanted to do for a while now is to separate all the XML loading stuff into their own modules, finalizing the work in the current open PR. This'll clean up the codebase a lot and separate concerns. Maybe now's the best time to do it. Either way I can accept the changes without that if you'd like, create the PR and I'll review it when possible. Again, thanks a lot :)

mapeditor / rs-tiled

Parsing may be slow for some use cases #284