abcnishant007 closed this issue 2 years ago
A direct application of mprof on an end-to-end demo script for trackintel gives the plot as shown below:
Now, how do we incorporate this as a feature in trackintel? @hong2223 Should we have this option in one of the functions such as "log_RAM_usage=True/False"?
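One way such a `log_RAM_usage` option could behave is sketched below. This is a hypothetical illustration, not trackintel's API: the decorator name, the stand-in function, and the use of the standard-library `tracemalloc` (which tracks Python-level allocations, unlike `mprof`, which samples process RSS) are all my assumptions.

```python
import tracemalloc
from functools import wraps

def log_ram_usage(func):
    """Hypothetical sketch of the proposed ``log_RAM_usage`` option:
    report the peak Python-level memory allocated while ``func`` runs."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
        # Note: mprof reports process RSS; tracemalloc only sees Python
        # allocations, so the two numbers will differ.
        print(f"{func.__name__}: peak {peak / 1024:.1f} KiB")
        return result
    return wrapper

@log_ram_usage
def build_positionfix_list(n):  # stand-in for a trackintel function
    return [(i * 0.001, i * 0.002) for i in range(n)]

build_positionfix_list(100_000)
```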
@hong2223 may I request that this be reopened?
Sure! We closed this and added it to the roadmap, as we thought we would not work on it in the near future. Would you start working on this?
Yes, I was exploring the asv-based continuous integration last weekend and it seemed doable, hence the previous comment. You can reopen it now, or we can wait until I have pushed more commits to ensure that it works; then we can open the issue as well as the pull request :) Whatever you suggest.
This is what I have so far. It appears to be working fine apart from some issues which I will summarise as bullet points in the next comment. Attached are some screenshots for reference. We can discuss them in one of our meetings.
=================================== Speed benchmark list, per function:
==================================== Speed benchmark plots, per function (X axis: commits, Y axis: performance):
==================================== Zoomed view of the above, with a pull request selected that shows a significant runtime reduction. This is where we changed the for loop to a groupby. (da94120)
- `pytest` had to be pinned to less than 4.0; need to investigate how it can be run with a more recent version of `pytest`.
- Benchmark functions need a `time_` prefix, such as `time_test_trip_wo_geom`.
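For reference, a minimal asv-style benchmark class following that naming convention could look as follows. The file and method names are hypothetical; asv discovers methods by their `time_`/`mem_`/`peakmem_` prefix and calls `setup()` before each measurement, and plain Python stands in here for the real trackintel workload.

```python
# benchmarks/bench_trips.py (hypothetical file name)
import random

class TimeTripGeneration:
    """Minimal asv-style benchmark class; asv collects methods by their
    prefix (time_*, mem_*, peakmem_*) and runs setup() before each one."""

    def setup(self):
        # Stand-in for loading the geolife fixture data.
        random.seed(0)
        self.values = [random.random() for _ in range(10_000)]

    def time_sort_values(self):
        # asv reports the wall-clock time of this body only;
        # setup() time is excluded.
        sorted(self.values)
```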
@abcnishant007 What was the status of this issue? Reading through it, it looks like it was almost finished. @bifbof is working on some nice performance increases; it would be great if we could track them.
This commit contains some snippets that came in handy in making this work. For the status of this issue, please refer to the comment below instead.
```
git log | grep -B 1 "Merge" | grep "commit" | sed 's/commit //g' | cut -c1-7 | head -n 10 > commits.txt
```

It is better to run this on master, so that we don't pick up unnecessary merges. Then:

```
asv run HASHFILE:commits.txt
asv publish
asv gh-pages
```

to automatically publish the results to the gh-pages branch.

@henrymartin1 @hongy Working solution is live now at https://abcnishant007.github.io/trackintel/ showing the last 20 merges into master and their corresponding performance, using around 10 MB of geolife data. Excellent improvements noted!
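The merge-extraction pipeline can be sanity-checked on a synthetic `git log` excerpt (the hashes below are made up): only the abbreviated hash of the merge commit should come out.

```shell
# Feed a fake two-commit log through the same pipeline; grep -B 1 keeps the
# "commit ..." line immediately preceding each "Merge:" line.
printf 'commit 1234567deadbeef\nMerge: abc def\nAuthor: a\ncommit 89abcde0011223\nAuthor: b\n' \
  | grep -B 1 "Merge" | grep "commit" | sed 's/commit //g' | cut -c1-7
# prints: 1234567
```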
I chose this function based on @bifbof's comment on the pull request that his improvements are targeted at the `generate_staypoints` function. Over time, I will add more `mem_` and `time_` classes (on demand) for different functions. Creating new benchmark classes is straightforward, as I can simply copy and adapt the unit tests for the specific functions. The `pytest.fixture`s serve as examples for the `setup` functions inside the asv benchmarks.
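The fixture-to-setup translation is mechanical; here is a hypothetical side-by-side sketch (the data values and names are invented, and plain tuples stand in for real trackintel fixtures):

```python
# In the unit tests this would be a @pytest.fixture returning test data:
def sample_points():
    return [(8.5, 47.4), (8.6, 47.5)]

# In an asv benchmark, the same data is built in setup() and stored on self:
class TimePoints:
    def setup(self):
        self.points = sample_points()

    def time_count(self):
        # asv times only this body; setup() time is excluded.
        len(self.points)
```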
Pending items are: ~~1. Finally moving this to mielab/trackintel.github.io ... instead of abcnishant007~~
What does not seem feasible (at least based on my current findings): doing this through `asv`; it is not complete. On the plus side, the `asv` environment takes care not to re-run `(commit, benchmark)`
combinations which have already been benchmarked. While viewing the benchmarks panel, it is better to rescale the browser window so that the different groups of tests (`mem_`, `time_`, and `peakmem_`) are arranged into columns as shown below:
This branch https://github.com/abcnishant007/trackintel/tree/benchmark-files contains the files used for benchmarking as well as the HTML files for the hosted GitHub Pages site.
Based on our discussion, I think it is better to have a memory profiler for trackintel. It will help us ensure that new pull requests do not significantly worsen the memory requirements. Such a profiler will become more relevant as we introduce multithreaded options for trackintel in general.
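Such a guard can reuse the `peakmem_` naming convention already mentioned above; asv reports the peak memory while a `peakmem_*` method runs, so a pull request that inflates memory use would show up as a jump in the published charts. Below is a hypothetical sketch: the class name, the toy grouping workload, and the synthetic rows are all invented and do not reflect trackintel's actual API.

```python
class PeakMemStaypoints:
    """Hypothetical asv benchmark: asv measures peak memory while a
    peakmem_* method runs, flagging memory regressions between commits."""

    def setup(self):
        # Stand-in for reading ~10 MB of geolife positionfixes:
        # (user id, x, y) tuples across 10 synthetic users.
        self.rows = [(i % 10, i * 0.001, i * 0.002) for i in range(50_000)]

    def peakmem_group_rows(self):
        # Toy grouping workload, similar in spirit to a groupby
        # over positionfixes per user.
        groups = {}
        for uid, x, y in self.rows:
            groups.setdefault(uid, []).append((x, y))
        return groups
```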