treee111 / wahooMapsCreator

Create maps for Wahoo device based on latest OSM maps
247 stars 25 forks source link

[FEAT] Add logging of timing for total time, each main section and for each tile generation step #225

Closed alfh closed 10 months ago

alfh commented 10 months ago

This PR…

Add logging of duration of various steps of the process, to be able to see where the focus for improvements should be, and to be able to detect any changes in performance from version to version, or from machine to machine, or from country to country.

Considerations and implementations

There are a couple of commits at the start of the history which are really general small fixups, I could have created a separate PR for them.

Ended up adding a Timings class for handling the actual duration and generating the text for logging, or doing the actual logging, based on PR comments from @treee111 .

I have just added simple logging, and have not looked for any general "metrics" python library.

Only wall time is logged. Originally I also included CPU time in most places, but I think the CPU time it less interesting, since it is mainly other processes that are being spawned. I have not added an CLI option to choose whether you want the logging or not, I assume it is quite interesting for most people.

I have added debug level logging for each tile, to get the duration taken for each tile.

Originally logged the time with 5 decimals, but reduced it to 3, even though that means some steps are shown as taking 0.000 seconds.

I have made one final restructuring of the commits, have kept the main parts of the commit from @treee111 for the PR review, a couple of minor suggestions there I splitted out, and combined with my commits, to have a cleaner commit history, with fewer "correction" commits.

How to test

  1. Run the CLI to generate a tile or a country, and observe the logging of duration. e.g. python -m wahoomc cli -co malta
  2. Run the CLI to generate a tile or a country, and observe the logging of duration. e.g. python -m wahoomc cli -co malta -v
  3. Run the unit tests with "python -m unittest -v" to see the logging.

Pull Request Checklist

alfh commented 10 months ago

I see that pylint now complains about too many local variables in merge_splitted_tiles_with_land_and_sea.

Unsure if I should just remove the logging there for now, or if the method should be splitted somehow, but that would be a general refactoring, and not as part of the adding of timings, I think.

alfh commented 10 months ago

I force pushed some updated, before I noticed that you had commented the PR, sorry about that.

Thanks for the feedback, I'll be looking into these wise comments now.

alfh commented 10 months ago

My considerations on the logging style

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+  took 1.80000 sec.

vs

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 1 of 2) Coordinates: 137,100 / took 1.80000

is that I wanted to be able to do a "grep" on the log output, and quickly find all tiles that were slow / fast / something else.

But it is also important to log the tile when starting to process it, in case there is some deadlock or something.

I was now considering a "log_tile_start" method, which does not log with a newline, and then a "log_tile_end" which then logs the "took .." section. But it seems Python has no nice way of achieving that with the log framework.

I also saw that there are decorator pattern, which could possible be used for both the logging of the main tile info, and the duration at the end.

alfh commented 10 months ago

Maybe the tile duration could be logged as debug instead of info log level? So only people really into the details would see it.

Not quite sure who such differences are handled in Python.

treee111 commented 10 months ago

thanks for thinking about outputting the per tile details and giving your thought behind the logging :-)

My considerations on the logging style

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+  took 1.80000 sec.

vs

INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 1 of 2) Coordinates: 137,100 / took 1.80000

is that I wanted to be able to do a "grep" on the log output, and quickly find all tiles that were slow / fast / something else.

But it is also important to log the tile when starting to process it, in case there is some deadlock or something.

I was now considering a "log_tile_start" method, which does not log with a newline, and then a "log_tile_end" which then logs the "took .." section. But it seems Python has no nice way of achieving that with the log framework.

Yesterday that was also my first thought. Adding "took .." or somehow replacing the whole line with the addition "took .." but that is not possible without hacks as you found out as well.

I also saw that there are decorator pattern, which could possible be used for both the logging of the main tile info, and the duration at the end.

I saw decorators also in another project and thought that they could be used for the timing stuff. Because I have never used them I'm not really sure how that would go and look...

Maybe the tile duration could be logged as debug instead of info log level? So only people really into the details would see it.

Not quite sure who such differences are handled in Python.

Very nice idea - i like! 🥇 That would mean, that all other log.debug messages would be outputted as well, but I think for your grep that would not matter. Outputting as debug is easy and done in the repo as well. output it with log.debug and call wahooMapsCreator with -v.

alfh commented 10 months ago

@treee111 Rebased onto develop, and adjusted based on your comments.

I want to clean up the commit history somewhat, but ran out of time today, but still wanted to possibly get some comments to the way I added the debug logging for the tile timings.

I ended up not using the debug vs info logging functionality in Timings class, so I might as well remove that.

Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions.

Getting the debug logging to display using the "cli" works with "-v", but when running the unit tests, that does not currently work is seem like.

treee111 commented 10 months ago

@treee111 Rebased onto develop, and adjusted based on your comments.

👍 looks very promising

I want to clean up the commit history somewhat, but ran out of time today, but still wanted to possibly get some comments to the way I added the debug logging for the tile timings.

Due to that we only squash to develop, all commits will get into one anyway so I would be OK as-is now. Having a clean commit history also in PR's is something I also like, so feel free to clean it up if you have some spare time.

I ended up not using the debug vs info logging functionality in Timings class, so I might as well remove that.

If you don't use it remote it, so you can also remove the value from the constructor where you hand it over but do not use it acutally. If one would use the debug vs. info functionality of Timings, I thought of this in the timings class

grafik

then giving the two return values to the log_tile function

grafik

but that does not work as we have a tuple now and not two separate parameters.

grafik

Could also be achieved but wouldn't work for situations where we do not need the True or False because INFO logging is fixed (could also be achieved but would make it even harder to maintain). Furthermore that would not be easy to read and maintain in my opinion.

Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions.

I'd rather remove them as written and already committed here: 4468c7e. If you don't agree feel free to remove that commit and we talk about that one more time. What I really like is your refactoring of log_tile method - nice!

Getting the debug logging to display using the "cli" works with "-v", but when running the unit tests, that does not currently work is seem like.

If I run unittests I normally have no output and do not need any. Do you want to check and compare the timing using unittests? For which unittest files do you want it to work? For test_generated_files.py you could easy add it to the cli call in my opinion (haven't tried yet): https://github.com/treee111/wahooMapsCreator/blob/199a13ee9cb9d5216f9bca6594d457fda9090b2a/tests/test_generated_files.py#L192-L193

alfh commented 10 months ago

This is how the logging looks like now

INFO:--------------------------------------------------------------------------------
INFO:--------------------------------------------------------------------------------
INFO:# check geofabrik.json file
INFO:# Input country: malta.
INFO:--------------------------------------------------------------------------------
INFO:+ Involved country: malta
INFO:--------------------------------------------------------------------------------
INFO:# check land_polygons.shp file
INFO:--------------------------------------------------------------------------------
INFO:# check countries .osm.pbf files
INFO:+ Checking for old maps and remove them
INFO:+ mapfile for malta: up-to-date.
INFO:--------------------------------------------------------------------------------
INFO:# Filter tags from country osm.pbf files
INFO:+ Filtering unwanted map objects out of map of malta
INFO:+ Filter tags from country osm.pbf files: OK, took 1.072 s
INFO:--------------------------------------------------------------------------------
INFO:# Generate land for each coordinate
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Generate land for each coordinate: OK, took 0.625 s
INFO:--------------------------------------------------------------------------------
INFO:# Generate sea for each coordinate
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Generate sea for each coordinate: OK, took 0.000 s
INFO:--------------------------------------------------------------------------------
INFO:# Split filtered country files to tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100 / malta
INFO:+ (tile 2 of 2) Coordinates: 138,100 / malta
INFO:+ Split filtered country files to tiles: OK, took 2.043 s
INFO:--------------------------------------------------------------------------------
INFO:# Merge splitted tiles with land, elevation, and sea
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Merge splitted tiles with land, elevation, and sea: OK, took 1.872 s
INFO:--------------------------------------------------------------------------------
INFO:# Creating .map files for tiles
INFO:+ (tile 1 of 2) Coordinates: 137,100
INFO:+ (tile 2 of 2) Coordinates: 138,100
INFO:+ Creating .map files for tiles: OK, took 2.846 s
INFO:--------------------------------------------------------------------------------
INFO:# Create: .map.lzma files
INFO:+ Country: malta
INFO:+ Copying .map.lzma tiles to output folders
INFO:+ Create .map.lzma files: OK, took 0.001 s
INFO:--------------------------------------------------------------------------------
INFO:# Create: .map files
INFO:+ Country: malta
INFO:+ Copying .map tiles to output folders
INFO:+ Create .map files: OK, took 0.001 s
INFO:--------------------------------------------------------------------------------
INFO:# Total time 8.476 s

When running with verbose, it looks like this (just an excerpt of the log)

INFO:# Generate sea for each coordinate
DEBUG:+ (tile 1 of 8) Coordinates: 122,69 / took 0.000 s
DEBUG:+ (tile 2 of 8) Coordinates: 122,70 / took 0.000 s
DEBUG:+ (tile 3 of 8) Coordinates: 122,71 / took 0.000 s
DEBUG:+ (tile 4 of 8) Coordinates: 122,72 / took 0.000 s
DEBUG:+ (tile 5 of 8) Coordinates: 123,69 / took 0.000 s
DEBUG:+ (tile 6 of 8) Coordinates: 123,70 / took 0.000 s
DEBUG:+ (tile 7 of 8) Coordinates: 123,71 / took 0.000 s
DEBUG:+ (tile 8 of 8) Coordinates: 123,72 / took 0.000 s
INFO:+ Generate sea for each coordinate: OK, took 0.000 s
INFO:--------------------------------------------------------------------------------
INFO:# Split filtered country files to tiles
INFO:+ (tile 1 of 8) Coordinates: 122,69 / faroe-islands
DEBUG:+ (tile 1 of 8) Coordinates: 122,69 / faroe-islands took 0.082 s
INFO:+ (tile 2 of 8) Coordinates: 122,70 / faroe-islands
DEBUG:+ (tile 2 of 8) Coordinates: 122,70 / faroe-islands took 0.555 s
INFO:+ (tile 3 of 8) Coordinates: 122,71 / faroe-islands
DEBUG:+ (tile 3 of 8) Coordinates: 122,71 / faroe-islands took 0.952 s
INFO:+ (tile 4 of 8) Coordinates: 122,72 / faroe-islands
DEBUG:+ (tile 4 of 8) Coordinates: 122,72 / faroe-islands took 0.097 s
INFO:+ (tile 5 of 8) Coordinates: 123,69 / faroe-islands
DEBUG:+ (tile 5 of 8) Coordinates: 123,69 / faroe-islands took 0.088 s
INFO:+ (tile 6 of 8) Coordinates: 123,70 / faroe-islands
DEBUG:+ (tile 6 of 8) Coordinates: 123,70 / faroe-islands took 0.796 s
INFO:+ (tile 7 of 8) Coordinates: 123,71 / faroe-islands
DEBUG:+ (tile 7 of 8) Coordinates: 123,71 / faroe-islands took 1.480 s
INFO:+ (tile 8 of 8) Coordinates: 123,72 / faroe-islands
DEBUG:+ (tile 8 of 8) Coordinates: 123,72 / faroe-islands took 0.508 s
INFO:+ Split filtered country files to tiles: OK, took 4.559 s
INFO:--------------------------------------------------------------------------------
INFO:# Merge splitted tiles with land, elevation, and sea
INFO:+ (tile 1 of 8) Coordinates: 122,69
DEBUG:--------------------------------------------------------------------------------
DEBUG:# Sorting land* osm files
DEBUG:+ Sorting land* osm files: OK
DEBUG:+ (tile 1 of 8) Coordinates: 122,69 / took 0.375 s
INFO:+ (tile 2 of 8) Coordinates: 122,70
DEBUG:--------------------------------------------------------------------------------
DEBUG:# Sorting land* osm files
DEBUG:+ Sorting land* osm files: OK
DEBUG:+ (tile 2 of 8) Coordinates: 122,70 / took 0.815 s
INFO:+ (tile 3 of 8) Coordinates: 122,71

Note that there is a bit excessive logging about the sorting, where three lines appear for each tile about sorting. That is "as-is", I will probably raise a PR to get rid of that, and only keep one line per tile about sorting

alfh commented 10 months ago

@treee111 I consider the PR ready for final review now, I currently do not know of anything outstanding.

treee111 commented 10 months ago

Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions.

I'd rather remove them as written and already committed here: 4468c7e. If you don't agree feel free to remove that commit and we talk about that one more time. What I really like is your refactoring of log_tile method - nice!

@alfh what do you think of my described change? You lost my commit in your current branch, restored it from my local repo here: https://github.com/treee111/wahooMapsCreator/commit/4468c7e1b0d89f74dd400b87cbcd7489691e92b1

other than that I am with you and will approve this PR

alfh commented 10 months ago

Yes, sorry I missed that commit, noticed it afterwards.

I want to have it quite visible when reading code if logging goes to info or debug.

So I prefer log_tile_info and log_tile_debug, and do not mind the "overhead" of two extra method. It also means pylint will not complain about too many parameters.

But it is probably mainly a code style taste. I am OK with both approaches.

Alf

⁣Få BlueMail for Android ​

  1. nov. 2023, 17:21 kl. 17:21 skrev "Benjamin K." @.***>:

    Unsure how you feel about the log_tile_info and log_tile_debug methods in osm_map_functions.

    I'd rather remove them as written and already committed here: 4468c7e. If you don't agree feel free to remove that commit and we talk about that one more time. What I really like is your refactoring of log_tile method - nice!

    @alfh what do you think of my described change? You lost my commit in your current branch, restored it from my local repo here: https://github.com/treee111/wahooMapsCreator/commit/4468c7e1b0d89f74dd400b87cbcd7489691e92b1

    other than that I am with you and will approve this PR

    -- Reply to this email directly or view it on GitHub: https://github.com/treee111/wahooMapsCreator/pull/225#issuecomment-1804143384 You are receiving this because you were mentioned.

    Message ID: @.***>