mie-lab / trackintel

trackintel is a framework for spatio-temporal analysis of movement trajectory and mobility data.
MIT License

Profiling of trackintel - continuous integration #183

Closed abcnishant007 closed 2 years ago

abcnishant007 commented 3 years ago

Based on our discussion, I think it is better to have a memory profiler for trackintel. It will help us ensure that new pull requests do not significantly worsen the memory requirements. Such a profiler will become more relevant as we introduce multithreaded options for trackintel in general.

abcnishant007 commented 3 years ago

A direct application of mprof to an end-to-end demo script for trackintel gives the plot shown below:

[Figure: mprof memory-usage plot of the end-to-end trackintel demo script]

Now, how do we incorporate this as a feature in trackintel? @hong2223 Should we expose this as an option in individual functions, e.g. `log_RAM_usage=True/False`?
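To make the idea concrete, here is a minimal sketch of such a per-function option, implemented as a decorator. It uses the standard-library `tracemalloc` instead of mprof, and the decorator name `log_ram_usage` and the example function are hypothetical, not part of the trackintel API:

```python
import tracemalloc
from functools import wraps

def log_ram_usage(func):
    """Hypothetical decorator: report peak memory allocated while func runs."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
        finally:
            # get_traced_memory() returns (current, peak) in bytes
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
        print(f"{func.__name__}: peak allocation {peak / 1024:.1f} KiB")
        return result
    return wrapper

@log_ram_usage
def build_list(n):
    return list(range(n))

data = build_list(100_000)
```

A boolean keyword argument could then simply toggle whether the decorated path is taken. Note that `tracemalloc` tracks Python-level allocations only, whereas mprof samples process RSS, so the numbers are not directly comparable.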

henrymartin1 commented 3 years ago

Some inspiration: https://pandas.pydata.org/docs/development/contributing.html#running-the-performance-test-suite
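For reference, asv (which pandas uses for the suite linked above) discovers benchmarks by method-name prefix: `time_*` for runtime, `mem_*` for the size of a returned object, and `peakmem_*` for peak process memory. A minimal sketch of a benchmark module, with a stand-in workload rather than a real trackintel call:

```python
# Sketch of an asv benchmark module (e.g. benchmarks/bench_example.py).
# asv collects classes and runs setup() before each benchmark method.
# The sorting workload below is a placeholder; a real suite would call
# trackintel functions instead.

class SortSuite:
    def setup(self):
        # Build the input data fresh for each benchmark repetition.
        self.data = list(range(10_000, 0, -1))

    def time_sorted(self):
        # asv times this method's execution across commits.
        sorted(self.data)

    def peakmem_sorted(self):
        # asv records peak memory while this method runs.
        sorted(self.data)
```

`asv run` would then execute these benchmarks for each commit in the configured range and store the results for `asv publish`.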

abcnishant007 commented 3 years ago

@hong2223 may I request that this issue be reopened?

hongyeehh commented 3 years ago

Sure! We closed this and added it to the roadmap because we thought we would not work on it in the near future. Would you like to start working on this?

abcnishant007 commented 3 years ago

Yes, I was exploring asv-based continuous integration last weekend and it seemed doable, hence the previous comment. You can reopen now, or we can wait until I have pushed more commits to make sure it works; then we can open the issue as well as the pull request :) Whatever you suggest.

abcnishant007 commented 2 years ago

This is what I have so far. It appears to be working fine apart from some issues, which I will summarise as bullet points in the next comment. Attached are some screenshots for reference. We can discuss them in one of our meetings.

Speed benchmark, listed per function:

[Screenshot 2021-10-01 at 17:25:43]

Speed benchmark with plots, per function (x axis: commits; y axis: performance):

[Screenshot 2021-10-01 at 17:25:33]

Zoomed view of the above, with a pull request selected that shows a significant reduction in runtime. This is where we changed the for loop to a groupby. (da94120)

[Screenshot 2021-10-01 at 17:15:17]
henrymartin1 commented 2 years ago

@abcnishant007 What is the status of this issue? Reading through it, it looks almost finished. @bifbof is working on some nice performance increases; it would be great if we could track them.

abcnishant007 commented 2 years ago

This commit contains some snippets that came in handy in making this work. For the status of this issue, please refer to this comment below instead.

  1. `git log | grep -B 1 "Merge" | grep "commit" | sed 's/commit //g' | cut -c1-7 | head -n 10 > commits.txt` — collects the last 10 merge-commit hashes. Better to run this on master, so that no unnecessary merges are picked up.
  2. `asv run HASHFILE:commits.txt`
  3. `asv publish`
  4. Using this gist, we need to force GitHub Pages to see this directory.
  5. Quoting the asv docs: "This will put a tree of files in the html directory. This website can not be viewed directly from the local filesystem, since web browsers do not support AJAX requests to the local filesystem. To share the website on the open internet, simply put these files on any webserver that can serve static content. Github Pages works quite well, for example. If using Github Pages, asv includes the convenience command asv gh-pages to automatically publish the results to the gh-pages branch."
abcnishant007 commented 2 years ago

@henrymartin1 / @hongy A working solution is now live at https://abcnishant007.github.io/trackintel/, covering the last 20 merges into master and their corresponding performance improvements, measured on around 10 MB of Geolife data. Excellent improvements noted!

I chose this function based on @bifbof's comment on the pull request that his improvements target the `generate_staypoints` function. Over time, I will add more memory and time benchmark classes (on demand) for different functions. Creating new benchmark classes is almost a no-brainer, as I can simply copy and adapt the unit tests for specific functions. The pytest fixtures serve as examples for the setup functions inside the asv benchmarks.
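Translating a unit test into a benchmark along these lines is mostly mechanical: the fixture body becomes the `setup` method and the test body becomes a `time_*`/`peakmem_*` method. A schematic sketch of the pattern, where the data loader and the workload are stand-ins (a real benchmark would load a Geolife sample and call the trackintel API):

```python
# pytest style: data construction lives in a fixture, e.g.
#
#     @pytest.fixture
#     def example_positionfixes():
#         return load_small_geolife_sample()   # hypothetical helper
#
#     def test_generate_staypoints(example_positionfixes):
#         example_positionfixes.as_positionfixes.generate_staypoints()
#
# asv style: the same construction moves into setup(), and the test
# body becomes the benchmark methods.

class GenerateStaypointsSuite:
    def setup(self):
        # Stand-in for loading a small positionfix sample:
        # (user_id, latitude, longitude) tuples.
        self.pfs = [(i, i * 0.001, i * 0.001) for i in range(1_000)]

    def time_generate_staypoints(self):
        # Stand-in workload; a real benchmark would call generate_staypoints.
        max(self.pfs, key=lambda p: p[1])

    def peakmem_generate_staypoints(self):
        max(self.pfs, key=lambda p: p[1])
```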

Pending items are: ~1. Finally moving this to mielab/trackintel.github.io ... instead of abcnishant007~

What does not seem feasible (at least based on my current findings):

abcnishant007 commented 2 years ago

When viewing the benchmarks panel, it is best to rescale the browser window so that the different groups of tests (mem, time, and peakmem) are arranged into columns, as shown below:

[Screenshot 2022-07-06 at 18:13:57: benchmark groups arranged into columns]
abcnishant007 commented 2 years ago

This branch, https://github.com/abcnishant007/trackintel/tree/benchmark-files, contains the files used for benchmarking as well as the HTML files for the hosted GitHub Pages site.