mmd-osm closed this issue 4 years ago.
Also, this user managed to block all gpx processing for 3 hours by uploading a 250MB trace as a zip file: https://www.openstreetmap.org/user/Arne%20Schwarck/traces/3402427
I sent a PN to the user asking them to stop further uploads until this has been resolved.
Your claim might be true if there was only one machine processing trace uploads but that hasn't been true for some time.
I also don't see what any of that has to do with this bug.
In fact this issue was resolved in 6c159b96734f81efc24f2c1410cd979b5c272819.
It was never actually a problem, by the way, as trace_name is entirely controlled by us.
How many workers are currently processing jobs in parallel? There seems to be a pretty large backlog right now.
(Yes, this should be in another issue, really)
Three, but there were a bunch of large jobs uploaded by that user. That's the point of the queue, though: to be able to handle spikes. Occasional backlogs mean it is working as designed.
My points should be finished within the next 12 hours. The server was usually a bit faster in the past, and my files a bit smaller; it does take time to process 3 million points. Next time I will try to limit the uploads to 1 million points. I do see the server running and confirming my successful uploads, so everything should be back to normal within the next 24 hours. Until next month, when another 20 million points are expected.
Maybe it isn't ideal that one user can essentially take over the GPX import feature, so that everyone else has to wait 12+ hours.
Can we lower the job priority for large imports, or if a user already has x pending jobs?
I don't need the files to be available immediately, so that does sound like a good idea. As long as the 20 million points are done processing within a month, so that my queue is finished when I upload the next set of files, I am fine with that.
Equally, are these traces really useful? I just looked at one and it has data spanning several weeks, although not continuously, but when it is recording it's doing about 8 points/second, which is overkill unless you're going ridiculously fast.
If you have some code that could help me reduce the points, that would really help. The 50 cm-accurate sensor is polling 10 times a second, so for high-curvature roads and high speeds it is useful. All uploaded points have better than 2 m accuracy, so if you are only drawing straight lines between points anyway, straight segments could be reduced to their start and end points. This data is coming from the autonomous driving fleet of ArnePilot.
GPSBabel + Douglas-Peucker come to mind. Maybe ask on the OSM forum; there are some existing threads on the topic: https://forum.openstreetmap.org/viewtopic.php?id=7801
Yes, specifically the simplify filter.
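For anyone wanting to do this in their own scripts, here is a minimal Ramer-Douglas-Peucker sketch in Python. It works on planar coordinates, so a real GPX pipeline would project lat/lon first, and `epsilon` is in the same units as the coordinates; both are assumptions of this sketch, not a description of GPSBabel's implementation:

```python
import math

def _perp_dist(pt, a, b):
    """Perpendicular distance from pt to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = pt, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def simplify(points, epsilon):
    """Ramer-Douglas-Peucker: drop points closer than epsilon to the chord."""
    if len(points) < 3:
        return points
    # Find the point farthest from the chord between the two endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    # Everything is within tolerance: keep only the endpoints.
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and recurse on both halves.
    left = simplify(points[:index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves
```

With a 10 Hz trace this collapses near-collinear runs to their endpoints while keeping every point that deviates by more than `epsilon` from the simplified line, which is the same idea as GPSBabel's `simplify,error=...` filter.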
Thanks, I will add the Python implementation to my processing scripts. I was also wondering whether this data couldn't be used to auto-fit the ways? At the moment I am moving the points by hand, but if the data is of such high quality, why not have the ways snapped to the averaged points?
Because that is not what OSM is, but in any case how do we determine which traces are high enough quality, and which ways they correspond to?
It's nothing to do with this ticket anyway.
It took about seven hours, but running gpsbabel with the simplify filter and an error limit of 1 m reduced the 230 MB file referenced in this ticket to 3 MB, a reduction from 1.6 million points to 22 thousand points. The command line was:
gpsbabel -r -i gpx -f gps-data.44284.gpx -x simplify,error=0.001k -o gpx -F out.gpx
Would this not be a good optimisation option to run on the server? I will see how long the python script takes on the same file and report the results.
That would mean blocking the import queue for 7 hours instead of 3 hours per GPX archive, which doesn't exactly help improve import times. I think you should really run this yourself on your local machine.
Might it not be useful to do this on the whole dataset? If you are just drawing straight lines between points anyway, a 76x reduction in required space is quite substantial. But maybe your database has good compression and it wouldn't bring much. Have you visually compared the two files to check that there is no loss of resolution?
The actual issues in this ticket have long since been addressed in full.
Follow-up in #2131: GPX upload uses external scripts to decompress zip/bzip/gzip files. To be on the safe side, some more input sanitization is required here.
We also need to improve zip file handling in general here, so people can't kill the server by uploading funny zip bombs. https://github.com/openstreetmap/openstreetmap-website/blob/268a8cb06e0a4734b9cb226ecebcc8445be4a9de/app/models/trace.rb#L256-L268