Closed kodonnell closed 6 years ago
is this sensible behaviour for the user?
Yes, this makes sense.
should we return a single MapMatch result (with a single time/distance) - as it implemented in this first commit - or multiple MapMatch results (each within their own time/distance). I prefer the latter (with a reason why each break occurred).
I think that MatchResult
should contain the matched GPS position for each original GPS position, as discussed above. Then each such entry could also contain a field indicating if/why an HMM break occurred. I think this is easier to handle by the user than multiple map match results. The user might not even be interested in HMM breaks and getting a single MapMatch result is then easier to handle.
I think that MatchResult should contain the matched GPS position for each original GPS position, as discussed above. Then each such entry could also contain a field indicating if/why an HMM break occurred.
Ah, that's a good idea. So, to clarify, we basically want one MapMatch result which includes the final route, as well as all the original GPX entries. Each entry can be inspected to know:
I'd suggest we also add a more general status flag to the MapMatch result e.g. 'contains sequence breaks', so users can easily check for this. And maybe we can warn the user too. (Otherwise if it's used blindly, we may end up with lots of issues submitted asking why there is a 'break' in the matched route!)
Anyway, with this, we should be able to describe all situations. However, as you say, most users will probably only be interested in the final route.
@stefanholder - does that sound good to you?
@stefanholder - does that sound good to you?
Yes, sounds good.
I would put it like this: MatchResult
should contains the final route as well as a list of match entries.
Each match entry contains:
I would put it like this:
Great, sounds good. I'll look over your other PR first, and then have a play with all of the comments above.
You can ignore the code for now (it's still unfinished), but I'd like some feedback on progress thus far. Specifically:
MatchSequence
does most of what @stefanholder suggested above (though maybe not tidily yet). I provide a MatchResult.originalGPXMapping
which allows the user to map (by index) a given original GPX entry to the matched one - and this contains information like the matched edge, how far along, the snapped point, the to/from time, etc. Alternatively one can traverse the matches directly.
The main changes remaining are
an alternative option is to leave it as is ('ignored' GPX entries are ignored from the result), but allow the user to change the filtering to prevent e.g. any filtering at all. This is simpler, but will slow down the matching.
I would prefer this alternative because it's simpler and I think the user should have the option to disable the filtering.
a tidier (API and code) approach would be to add properties directly to the original GPXEntry
I think it's better to have a separate class (GPXMapping
) for the matched GPX entry.
when we have sequence breaks, should I fill the break with an 'empty' sequence break, to keep the times contiguous?
If there are no candidates for a GPX entry then GPXMapping.matchEntry
should be null (not sure if this is what you mean).
Thanks @stefanholder - I will follow your first two preferences.
If there are no candidates for a GPX entry then GPXMapping.matchEntry should be null (not sure if this is what you mean).
More if e.g. we have a sequence A->C then an HMM break so start a new sequence D->F. In this case, there's no (explicit) information about what happened between C and D - we could leave it to the user to interpret, or I could add an 'unknown'/'empty' sequence C->D which really only contains the information about the start/end points (including time) but e.g. the route between them will be 'unknown'. Does that make sense?
To keep PR from getting even bigger, I created #92 for allowing customisable filtering. Otherwise, this PR is stabilising somewhat. I've renamed a lot of the entities - the names were misleading, and were confusing me (let alone new developers). E.g. GPXExtension -> Candidate, TimeStep -> ViterbiMatchEntry. I've updated the list above with some more tasks remaining too.
should I fill the break with an 'empty' sequence break
For now I'll leave this out - it's non-essential, and isn't too hard to add later if it's desired.
More if e.g. we have a sequence A->C then an HMM break so start a new sequence D->F.
If an HMM break happens because there is no transition between C and D then we can set the HMM break status for the GPXMapping
of D to 'no transitions to this entry', which is quite simple to do.
If an HMM break happens because there is no transition between C and D then we can set the HMM break status for the GPXMapping of D to 'no transitions to this entry', which is quite simple to do.
That's already done. To rephrase another way - say time at C is 10am and time at D is 11am. If the user currently goes to find 'what was the state at 10:30am', they could get confused, as they won't find anything (due to the sequence break). If we added a 'between break' sequence C->D, then they'd find that at 10:30am it was in an between-break state. In other words, should we explicitly or implicitly identify the presence of a sequence break? Put another way, each sequence should be contiguous in time, but the sequence of sequences is not, and maybe it should be.
OK, finally got back to this. @karussell / @stefanholder, I think the map-matching logic/API is reasonably stable, and the only remaining work is the web stuff + doco + tests - which I'll get to if you both approve of the logic/API. Quick summary of some of the major changes (which should all be in code doco):
Removed the logging code as I didn't want to spend time refactoring that while I was developing. Can add it back in if desired.
I think that the logging code is desired, at least until we have ways to visualize candidates and matched paths. I strongly relied on the logging in PR #81 and PR #91 and I think that it will be useful for other map matching improvements. Maybe it's a good idea to put the logging code into separate methods to reduce the noise in the core code.
Seems that you need to merge master into your branch because PR #91 has been pushed into master in the meantime.
logging code
I'm with you - it can make debugging easier, but it can also make the core harder to develop (at least with bigger changes like this). As a side note, useful logging is often specific to the feature/bug worked on at the time, and isn't generic. I don't know what the best approach is - I'm happy to add it in, or leave it out, or refactor it into other methods (though I'm not sure how to do that).
need to merge master
Will do. I'd prefer to do the merge once I know this PR is likely to be included.
I'm happy to add it in, or leave it out, or refactor it into other methods (though I'm not sure how to do that).
I would then suggest that you add the logging in again and I will refactor this into separate methods in another PR. This should also make future refactoring of the map matching code easier.
Cool - logging added back in.
Sorry, I was waiting on you both but had forgotten to merge master. How does it look?
Is this still a WIP? I've not checked it closely yet, maybe @stefanholder did? The performance implications and also the renaming should be discussed (necessary? better names?)
I've also created this here and can fix if this is required https://github.com/graphhopper/graphhopper/issues/971
Is this still a WIP?
Yes. The actual map-matching logic is (nearly) ready for review, but there are still other tasks as per here. I'm happy to polish off the rest if there's some indication this PR is 'good'.
The performance implications ...
key | current master branch | current sequences branch |
---|---|---|
location_index_match.max | 6.660339 | 6.278257 |
location_index_match.mean | 0.2038874168 | 0.217992223999999 |
location_index_match.min | 0.001316 | 0.001381 |
location_index_match.sum | 1019.437084 | 1089.96112 |
map_match.max | 1554.185032 | 1651.807253 |
map_match.mean | 242.08705704 | 243.51960916 |
map_match.min | 3.57688 | 3.188165 |
map_match.sum | 24208.705704 | 24351.960916 |
measurement.count | 5000 | 5000 |
measurement.seed | 123 | 123 |
measurement.time | 32932 | 33115 |
measurement.totalMB | 1524 | 1460 |
measurement.usedMB | 42 | 35 |
Those are from the measurement class, but still not completely robust (e.g. might be off due to competing resource, so really should be run repeatedly etc.) but give a reasonable indication that nothing's really changed (which is as I'd expect).
the renaming should be discussed (necessary? better names?)
Very open to feedback on that. I've tried to use names that represent their purpose, to make it easier for others to understand the code base.
I've not checked it closely yet, maybe @stefanholder did?
No, I didn't look at it closely yet either.
What about the remaining non-checked items?
They're yet to be completed. Are you happy with the sequence-specific stuff of this PR (logic and API)? If so, I'll work on those additional items.
Are you happy with the sequence-specific stuff of this PR (logic and API)?
I'm happy with it. But since the API has changed a lot, I think it's important that @karussell also has a look.
Great. I'll review your other comments, and work on the remaining items. When that's done - yes, I think it'd be good to have @karussell review.
Wow, getting the GUI to handle multiple paths was much more painful than I thought. Anyway:
That's using this gpx. I also had to tweak locationIndexTree.findNClosest to make it not always return a result (within tiles but outside radius) - otherwise I couldn't break it into sequences. (See here.)
Yes!!
As mentioned, the GUI thus far is hacky (and not fully-functioning) - I almost think it'd be better to rewrite from scratch than to continue hacking around it. However, then I'd risk making breaking changes (i.e. features I don't understand) ... so I'm not sure.
The unit tests are probably more important - I discovered some bugs even just testing this.
We're starting to build our house, so this is becoming less of a priority for me. I might still do it, but my I shouldn't make promises I can't keep ...
Cool, thanks a lot :) !
We're starting to build our house, so this is becoming less of a priority for me
Sure :) !
Did these changes make into the latest code base? We're finding very few matches when inspecting GPXExensions of match result. We have hundreds of input GPX entries but GPXExtension contains in many cases only a single entry.
@INRIX-Trang-Nguyen not that I know of. I'm closing this as I have no intention of actively working on it.
Objective
If the viterbi sequence breaks (e.g. no candidates found for a given GPX entry, or no transitions between candidates) then instead of throwing an exception, split into a new sequence. As far as I'm aware there's currently no easy way for the user to do this manually, let alone easily.
Questions
Cc @karussell @stefanholder