tkrajina / gpxpy

gpx-py is a python GPX parser. GPX (GPS eXchange Format) is an XML based file format for GPS tracks.
Apache License 2.0
1.02k stars 223 forks source link

Max speed is wrong due to removing arbitrarily 5% of quickest speeds #137

Closed cedeber closed 2 years ago

cedeber commented 6 years ago

Hello.

As @demlak mentioned in issue #126 the max_speed result is falsy. Not that the math is wrong but you are removing 5% of the quickest speeds for no real reasons. https://github.com/tkrajina/gpxpy/blob/1bfc131a34d6e0c67192c97a88252c319dac67b5/gpxpy/geo.py#L124

The fact is if it works while hiking; it doesn't work for quickest activities like mountain biking. In deep forest we may drive as quick as 50km/h and more but only during a few meters, which means the gps device will only take 2 to 3 shots, nothing more. And they are not wrong.

As you can see here https://www.komoot.com/tour/24114003 (I am not from komoot) at the end of the trail the speed graph can't be flatten.

What I could understand is removing 5% of the quickest speed because of accuracy, so max_speed = max_speed * 0.95, but not considering that only the 5% of the quickest takes are wrong.

What I suggest here is to remove your 5% removal and let developers do what they want with the values. I don't think it's the job of a parser to take accuracy into account. Because it depends on tons of things like the environment, the device, the activity, etc...

Thanks

tkrajina commented 6 years ago

Well, the idea in #126 was that "Max Speed in gpxinfo is not maximum speed, but average speed" and that's not the case.

So, let's return to the idea that max speed is not correct. It is not, I agree. Probably not as falsy as trying to figure out the max speed from a graph published somewhere on the internet. The problem, of course, is that there is no single best answer. The 5% algorithm is implemented after many tests from car/mtb/hiking tracks, but there is no math proof behind it.

What makes you think that one arbitrary method (removing the 5% speeds) is better than another (not removing the 5% speeds)? Not removing them is definitely simpler bot it has the same problem.

If "it's the job of a parser to take accuracy into account" then I should completely remove detection of max speed. But it's already here, and I don't plan to change the API.

To recapitulate: Removing the 5% won't happen. What can happen is to introduce an option in:

def calculate_max_speed(speeds_and_distances, raw=False):

and

    def get_moving_data(self, stopped_speed_threshold=None, raw_speed=False):

If the caller calls these functions with raw=True/raw_speed=True he will get the max speed, but this time without any manipulation. Be aware that the result will still be falsy.

cedeber commented 6 years ago

What makes you think that one arbitrary method (removing the 5% speeds) is better than another (not removing the 5% speeds)

Because by removing the 5% (the way you implement it) is considering that only the quickest speeds are wrong, which is, well, wrong. The accuracy is random and maybe sometimes the quickest takes are the more accurate, but it's not testable.

If "it's not the job of a parser to take accuracy into account" then I should completely remove detection of max speed

Of course not, but a parser should not remove or add any data. The max speed is the calculation between 2 adjacent takes with the longest distance, the shorter time.

What can happen is to introduce an option

It's a perfect and completely fine option for me 👍

Thanks again.

laodzu commented 4 years ago

So I stumbled over this problem also just the other day. All of my GPX data is manually cleaned to only contain correct data from my bike excursions. Looking for a solution to evaluate the tracks from the command line I stumbled over gpxpy (thanks!) and tested it through gpxinfo. From total distance, ascent and max speed only the total distance coincided with the summary from QMapShack[1], my primary GIS tools, for my tracks. Ascent varied significantly and max speed was completly off.

To my complete surprise I found this dropping 5% of perfectly legal data. To be honest the whole idea of dropping arbitrary data from the calculation gave me this "wait a second, what?" moment. Without any command line parameter to do any data cleaning I would consider dropping any data to be off limit for a summary tool. If I do have problems in my data I want to know about them rather than let the tool make assumptions on how to ignore them.

So if an option is required at all, I would vote for making "raw = True" to be the default, i.e. not drop data and only do cleaning when wanted by the caller ("heuristic-clean = True"). Again, just as @cedeber I think there is nothing in the "contract" of a GPX parser to do arbitrary "cleaning" on the data if not explicitly specified.

For completeness - the variance for the ascent results from different heuristics in the QMapShack implementation[2] - changes in elevation are only registered if they exceed a threshold of 5 meters. Obviously this leads to different results than the smoothing approach used in gpxpy.

[1] https://github.com/Maproom/qmapshack/wiki [2] https://github.com/Maproom/qmapshack/blob/dev/src/qmapshack/gis/trk/CGisItemTrk.cpp#L1110

tkrajina commented 4 years ago

To be honest the whole idea of dropping arbitrary data from the calculation gave me this "wait a second, what?" moment.

Yeah, I got that same "wait a second, what?" when I read that...

in (...) QMapShack (...) changes in elevation are only registered if they exceed a threshold of 5 meters

Obviously heuristics isn't something unexpected in various GPX tools.

Anyway, I'm perfectly OK with raw_speed=True. I'll probably do it myself when I find the time (which unfortunately means: not soon). If it's important to you -- feel free to send a pull request.

laodzu commented 4 years ago

I am sorry if you interpreted my comment in a negative way as I certainly did not intend it that way. I just wanted to emphasize that I did not expect such a thing to happen here. This may well be because of my incorrect assumptions. I did read and appreciate your explanation of why it is there but from my own experience with GPX data I can still not completly follow it.

Of course heuristics are to be expected in "tools", but I did not expect it in a "parser".

I also pondered about the heuristics in QMapShack but I think for that codebase the parser is tightly integrated into the tool and as such can assimilate functionality that is wanted in the tool.

However gpxpy advertises itself as an "GPX parser" and in this constellation my gut feeling would be to keep heuristics out of this area as it may well be used from tools with different heuristics on top.

Again, sorry if I could not make myself clear on the first attempt.

tkrajina commented 4 years ago

It's so strange that this issue never came to an end I hoped for: somebody to fix it, and send a pull request. It never materialised. Even after we agreed on a solution. Maybe it wasn't so crucial for somebody else to waste a couple of hours?

Anyway, I wasted my 3 hours of this Sunday afternoon, and here's the result: https://github.com/tkrajina/gpxpy/tree/raw-speeds

Test it, feedback welcome.

fpetry commented 3 years ago

Hi. Thank you @tkrajina for the project and also your efforts regarding the speed.

I'm not totally sure if it's exactly the original posters issue, but I also stumbled upon a max_speed problem:

I have a track with no speed information. gpxinfo gives: "Max speed: 31.25m/s = 112.50km/h" The raw_speeds branch gives the following: "Max speed: 31.25m/s = 112.50km/h (raw: 33.61m/s = 120.98km/h)"

Viking says: 143.26 km/h Nextcloud Maps says: 142.99 km/h Random online GPX viewer: 143 km/h

The file has only one trk and one trkseg.
As it was recorded on an Autobahn section where this is totally feasible and allowed I think the 143 km/h are correct. So the given max_speed from both gpxinfo versions is wrong, but I cannot understand why.

tkrajina commented 3 years ago

@fpetry if you can attach that GPX here (or send it to my email)?

PS. And thanks for testing this branch!

tkrajina commented 3 years ago

If somebody wants to test, there is another change pushed in https://github.com/tkrajina/gpxpy/tree/raw-speeds

tkrajina commented 2 years ago

Implemented in https://github.com/tkrajina/gpxpy/releases/tag/v1.5.0

laodzu commented 2 years ago

Hi Tomo,

thanks for adding the possibility to retrieve the raw speed data. Somewhat late to the game, I just checked and now I do see comparable output with this option to what my other tools report so I will start using it for my GPX collection.