Performance measurement suite

kodonnell commented 7 years ago

As discussed in #67 and #68. Code isn't ready for a full merge - I'm looking for some initial feedback on the general approach. Key points:

created matching-tools module (like graphhopper-tools)
copied Measurement.java from GH, and added two tests: one for location lookups (i.e. findNClosest) and one for testing a bunch of randomly created GPX entries (randomly pick start/end, then find route, then down-sample points).
learned (some of) Maven, the hard way

Todo:

general tidy up
tweak random creation of GPX entries (above) to be near the path, not on it.
do I need to warm up the JVM?
...?

Example usage:

$ ./map-matching.sh action=measurement map-data/leipzig_germany.osm.pbf 3
...
$ cat measurement.20161005_145346 
commits (in order tested):
3b305dc8a76123a12375da91a69d8ecf47466ed0 merged most recent
cacb7a3e0d2033042ca11d7280af43501a2a06dc ok, seems to be working ...
59a01fcc3307a05e9b8e74119ec1948f8a9de7c9 getting multiple commits working ...

measurements:
-------------

location_index_match.max      5.978906            6.654648            6.393774            
location_index_match.mean     0.1895065804        0.194572196         0.1902064926        
location_index_match.min      0.001334            0.00126             0.001377            
location_index_match.sum      947.532902          972.86098           951.032463          
map_match.max                 8.049864            7.153459            7.29994             
map_match.mean                0.7924976776        0.7868942498        0.7843470472        
map_match.min                 0.094859            0.128563            0.091647            
map_match.sum                 3962.488388         3934.471249         3921.735236         
measurement.count             5000                5000                5000                
measurement.seed              123                 123                 123                 
measurement.time              7010                7007                7027                
measurement.totalMB           655                 655                 655                 
measurement.usedMB            19                  19                  21

karussell commented 7 years ago

Really nice - thanks!

learned (some of) Maven, the hard way

oh, what was the problem here? I do like gradle a bit more (see also an issue on GH core) but the work would be too big to make sense now.

do I need to warm up the JVM?

That should be done already from the MiniPerfTest

tweak random creation of GPX entries (above) to be near the path, not on it.

Yes, this sounds good. I think I've done this in the tests already somewhere.

kodonnell commented 7 years ago

oh, what was the problem here

I've just never had experience with Maven (or Java) before (I'm more python/JS/etc.) - so figuring out how to make it all work (including in Eclipse) took a bit of head-scratching. Thankfully, I could just copy from e.g. graphhopper-tools.

I think I've done this in the tests already somewhere.

Ah, didn't realise that. From what I see, there's no randomness added - both in which points are chosen (it seems one GPXEntry is created per instruction point) and the GPXEntry aligns exactly with the route. As an aside, I was thinking about adding in some new tests which create a random route and create some GPX entries, and then check that the map-matched route is the same - what do you think?

I will tidy this PR up now. For anyone else reading, some next steps (which I'll try to add as TODO's in the code):

tidy up map-matching.sh ... following the graphhopper equivalent, which is more robust.
add some more perf. tests to Measurement.java
figure out git a bit more nicely when measuring the last commits: currently you can only do it after you've just committed (as you can't check out another commit without commiting any existing changes), and it doesn't come back to where you started (I seem to end up in a detached head state).
similarly, when comparing the last three commits, it assumes you've got the same measurement properties - it won't merge things nicely if e.g. you add a new test (which isn't present in old commits).

karussell commented 7 years ago

including in Eclipse

I recommend to use NetBeans or IntelliJ instead (especially when you use maven). Eclipse is a no go for me personally ;)

figure out git a bit more nicely when measuring the last commits

For perf comparison you could make the parameters you want to change configurable instead. But for me it was often a lot easier to create a separate branch and test some config change here something there and commit these to the branch. The it doesn't matter where I ended up I could always type either git checkout master or git checkout some_branch afterwards.

From what I see, there's no randomness added

Uups - indeed. That is something we can/should do. I probably didn't added the randomness as the old algorithm was not good enough to figure out all situations.

I was thinking about adding in some new tests which create a random route and create some GPX entries, and then check that the map-matched route is the same - what do you think?

Yes, sounds good. I would distort the routing output a bit before the map matching and compare the original route length R (not the length G calculated from gpx entries) with the map matched length M and allow only <1% difference or something. If one would compare G and M the difference could be bigger.

kodonnell commented 7 years ago

parameters you want to change configurable

I was more thinking of changes to the code base (which you want to test before you commit). I actually just slipped up after running the test as it left me in a detached head state, which I then committed to.

Yes, sounds good

Submitted a new issue to remind us.

kodonnell commented 7 years ago

Is it OK to do this on my fork, to update my user details for this PR, given it hasn't been merged. I'm not sure if that'll mess with anything else (including the PR you just merged ...)

karussell commented 7 years ago

Yes, you should be able to change your user details here and push (force) without making a problem on my side here.

kodonnell commented 7 years ago

Excellent, seems to have worked. Want me to merge in the new API, or leave that to you?

karussell commented 7 years ago

Sorry, will have a look to your latest changes - thanks!

kodonnell commented 7 years ago

@karussell, is there anything I can do to move this along? I ask because I've anecdotally found odd results. E.g. one would expect deduping on node ID (see #81) to improve performance (by reducing the number of candidates), but I actually found it decreased it (in a specific case, and including when I'm caching the paths and using DIJKSTRA_ONE_TO_MANY). Similary, it'd be good to know how #83 affects performance, etc.

karussell commented 7 years ago

Sorry, will have again a look at this this week. Just too many things on the table.

using DIJKSTRA_ONE_TO_MANY

BTW & again: this is not recommended, please use a customized Dijkstra with many 'to' nodes instead :) !

kodonnell commented 7 years ago

BTW & again

I know = ) It seems to work, and I just haven't got round to replacing it ... but I will before production.

kodonnell commented 7 years ago

Updated to address some of the above concerns, and add in a new feature. Specifically, running it for e.g. the last 5 commits will create five separate files (named measurement<time at start of test>.properties as per the usual GH behaviour) and one simple merged file measurement_<first_commit>_<last_commit>.propertieswhich allows for easy comparison between commits, including graphs (!). That is:

measurement2016-12-04_19_06_37.properties
measurement2016-12-04_19_07_01.properties
measurement2016-12-04_19_07_25.properties
measurement2016-12-04_19_07_49.properties
measurement2016-12-04_19_08_14.properties
measurement_f83558d_758529c.properties

The (start) for the corresponding output (which is the same as the merged file):

commits:
--------------------------

f83558d [2016-12-04] start with oldest commit first
173bed8 [2016-12-04] playing with graphs
5441639 [2016-12-04] show charts one by one
2a990e1 [2016-12-04] show charts one by one
758529c [2016-12-04] Merge branch 'issue_68' of github.com:kodonnell/map-matching into issue_68

measurements:
-------------

                              f83558d             173bed8             5441639             2a990e1             758529c             
                              -------             -------             -------             -------             -------             
location_index_match.max      6.218712            6.221624            6.680946            6.379524            6.493001            
location_index_match.mean     0.190912689         0.1928028062        0.19456209100000002 0.19487444040000002 0.1991708052        
location_index_match.min      0.001219            0.001199            0.001186            0.001306            0.001221            
location_index_match.sum      954.563445          964.014031          972.810455          974.372202          995.854026          
map_match.max                 1518.55109          1655.122742         1527.293507         1656.642537         1640.558294         
map_match.mean                120.16302784999999  129.7930923         124.57563984000001  126.85586235000001  126.61892128        
map_match.min                 3.718201            3.810781            3.795458            3.840196            3.79461             
map_match.sum                 12016.302785        12979.30923         12457.563984        12685.586235        12661.892128        
measurement.count             5000                5000                5000                5000                5000                
measurement.seed              123                 123                 123                 123                 123                 
measurement.time              17178               18165               17749               17939               17943               
measurement.totalMB           1449                1528                1343                1302                1469                
measurement.usedMB            35                  37                  33                  33                  35                  

                          location_index_match.max                             

                                   ********                                    
      ********      *********      *      *      *********      ********       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *      *      *       *      *      *      *       *      *      *       
      *  +   *      *   +   *      *   +  *      *   +   *      *   +  *       
  +---********------*********------********------*********------********---+   
      f83558d        173bed8        5441639       2a990e1        758529c       

Press any key to view the next chart, or CTRL + C to exit ...

In future (this PR?), we could/should

[ ] add an error quantification (e.g. std) to the measurement class (inc. in GH), and add it to the outputs (inc. plots)
[ ] cache some of the builds/tests to speed things up if the developer's looking for quick feedback

Would you mind to make this working with this UI?

I guess it should work if you copy the files into the right place. (I'm assuming you don't want me to add a duplicate UI to the map-matching module? As and aside, that's another good reason for merging).

As discussed here ... maybe we have something like this for the individual developer use case, and leave the UI as a separate thing for the full history browsing use case?

kodonnell commented 7 years ago

PS - we can remove all the merging stuff and graphs, if you want. It's kind of a proof-of-concept, but it might be pointless, depending what happens with the GUI.

karussell commented 7 years ago

which allows for easy comparison between commits, including graphs (!).

Hmmh, you rely on gnuplot. I really like that you try for simple solutions, but this will definitely make problems on other OSes (and I say this although I love gnuplot and I have it always installed :) )

(I'm assuming you don't want me to add a duplicate UI to the map-matching module?

No. Duplication would be ugly. Maybe I'll keep it out of GH then completely

As discussed here ... maybe we have something like this for the individual developer use case

If we really want a separate thing here to get a fast overview directly after the run I would highly prefer a pure Java solution for the graphing/pictures and e.g. use jfreechart (which can be used to export images too). See e.g. the usage in the pt branch here

karussell commented 7 years ago

Totally forgot: thanks for creating the separate measurement properties, also the merged measurement file is useful. Now we need to decide regarding the plotting, if optional, with pure Java or not at all in this for now.

kodonnell commented 7 years ago

Hmmh, you rely on gnuplot

Kind of - it works without, and you get notified if you don't have it. If a developer wants graphs (they may not need then) and if they're like me (unix based and install things willy nilly) then it shouldn't be an issue. (It's also worth noting that the UI here relies on NPM etc.) However, this is more of a POC and I think (?) I could pretty easily write a bash-based solution, or we could use a different tool. (I've actually never used gnuplot before - I just chose it as it gives you ASCII charts.)

I would highly prefer a pure Java solution for the graphing/pictures

I agree. However, the only way I can see this working (which I think you're alluding to as well) is

we create Java runner to take care of e.g. switching commits, rebuilding, saving stats in memory, displaying, etc. I don't know enough Java to comment on how easy that'd be, but I think this is probably the 'best' solution.
or we leave the current method for generating the measurement* files, but use something else (Java or html/js) for viewing the graphs. I don't like this approach as it complicates the workflow etc.
stick with status quo, and wait and see if others want more than what we currently have.

Totally forgot: thanks

Thanks = ) I enjoy weird little puzzles like this.

karussell commented 7 years ago

It's also worth noting that the UI here relies on NPM etc.

NPM is something we already require for the web UI, also as you said we can load measurement files from disc directly or even reject the entire PR as not really helpful. What do you mean with 'etc'?

Kind of - it works without, and you get notified if you don't have it.

Would you still mind to remove it? Maybe you can create a separate script for your own usage?

Keep in mind: everything thing we add needs to be maintained (somehow from someone) and for that reason I would try to reduce complexity - at least for now. (step by step appraoch :))

stick with status quo, and wait and see if others want more than what we currently have.

Probably that is the best. Still we need to figure out how to improve step by step, with a clear vision of the final goal (historic overview vs. developer trial and also regarding Java plotter).

kodonnell commented 7 years ago

What do you mean with 'etc'?

Just my shorthand way for expanding the way you did above.

Would you still mind to remove it?

No worries at all - it was more a test that anything else. All removed.

everything thing we add needs to be maintained

Good point - I don't think I've fully appreciated that.

Probably that is the best

Agreed.

karussell commented 7 years ago

Would you squash into as few commits as necessary for you (in your opinion)?

kodonnell commented 7 years ago

btw: afterwards I'll start to merge map matching repo with the core repo

Great - hope it isn't too painful!

Would you squash into as few commits as necessary for you (in your opinion)?

I haven't done this before sorry, and on the merge button options it says that 'squash and merge' is not enabled for this repository? Any tips?

karussell commented 7 years ago

I haven't done this before sorry, and on the merge button options it says that 'squash and merge' is not enabled for this repository? Any tips?

You can do it either with your IDE or with the command line: git reset --soft HEAD~X where X is the number of commits or use git rebase -i HEAD~X if you want to e.g. concatenate the existing commit messages.

I've explicitly disabled squashing here for merging to avoid problems like authorship lost etc

karussell commented 7 years ago

Ah and afterwards you will have e.g. just one commit different on your branch issue_68, then do a git push -f which does a force push to the issue_68 branch. Note: you should never ever do a forced push if you share the branch with someone (will cause them trouble especially if they have unpushed commits)! But doing it here is good and afterwards we can merge it easily into master.

If that is too much for the beginning I can do it :)

kodonnell commented 7 years ago

If that is too much for the beginning I can do it :)

Yes please! It looked simple enough, except I've also got merges in there, and for now I'd appreciate if you dealt with those, if it makes sense. (If it were a private repo, I might play, but I don't want to muck this up for others.) I'm happy for you to squash it all into a single commit (if possible). Thanks!

karussell commented 7 years ago

Ah, yes. Then I do it via a workaround which I found somewhere:

# update master to latest origin/master
git checkout master
git pull
# now merge the branch into the master but reset afterwards
git merge somebranch
git reset origin/master

Now the merge is reverted but all changes are still in the working tree and we can commit all changes on master or if on the other branch then do:

# create a fresh branch with the same name
git branch -D somebranch
gco -b somebranch
# now commit the changes with the original author but your 'hash'
git commit -m "one commit" --author "Old Author Name <email@address.com>"
git push -f

karussell commented 7 years ago

Ok, decided to go the simple way to enable squashed merges temporary ... I hope this was okay?

kodonnell commented 7 years ago

Doesn't worry me - looks like I'm still down as contributing. Thanks @karussell.

graphhopper / map-matching

Performance measurement suite #73