hiten4github / tamt

Automatically exported from code.google.com/p/tamt

Tagging tolerance #41

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am concerned that the tagging tolerance will be too tight to function well.
There could be three ways of solving this:
1) allow the user to change the tolerance (needs an input box). This will define the
value to be used for all roads and tags in the project measurement area.
2) more complex - allow the user to run a GPS trace on the "most difficult"
section (widest road or the one with the most GPS "obstructions") and have the software
self-adjust the tolerance so that 99% of the points lie on the tagged section
- and then use this for the whole project measurement area.
3) ask the user to define the shortest block length in each area (residential,
commercial, industrial) and use half of this value for all tagged roads in that
area.

Option (3) would be very clean, but I'd put up with (1)!

Original issue reported on code.google.com by jarogers...@gmail.com on 13 Oct 2010 at 4:26

GoogleCodeExporter commented 9 years ago
Possible solution: include tolerance per tag. Two implementation points:

1) UI - add an attribute for tolerance on tag
2) In the stored procedure, include custom tolerance

Original comment by stuartmo...@gmail.com on 13 Oct 2010 at 6:43

GoogleCodeExporter commented 9 years ago

Original comment by stuartmo...@gmail.com on 18 Jun 2011 at 1:49

GoogleCodeExporter commented 9 years ago

Original comment by stuartmo...@gmail.com on 18 Jun 2011 at 1:49

GoogleCodeExporter commented 9 years ago

Original comment by stuartmo...@gmail.com on 18 Jun 2011 at 1:51

GoogleCodeExporter commented 9 years ago
Add an (m) unit label for block lengths.

Add a new row on the study region:
- GPS tagging tolerance (with an uneditable textbox)
- if the user changes the block length value for the default zone type, then divide it by
2 and update the tolerance UNLESS the result is less than our best-guess tolerance in
the system (I forget what it is; it's in a stored procedure) -- see the sketch below
- change our system default to 50.0 m
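
A minimal sketch of that rule (plain SQL with illustrative values only, not the actual
TAMT schema):

    -- tolerance follows half the default-zone block length, floored at 50.0 m
    SELECT GREATEST(block_length / 2.0, 50.0) AS gps_tagging_tolerance
    FROM (VALUES (80.0), (200.0)) AS t(block_length);
    -- an 80 m block keeps the tolerance at the 50.0 m floor;
    -- a 200 m block raises it to 100.0 m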

Original comment by stuartmo...@gmail.com on 20 Jun 2011 at 3:25

GoogleCodeExporter commented 9 years ago

Original comment by stuartmo...@gmail.com on 20 Jun 2011 at 5:01

GoogleCodeExporter commented 9 years ago
Copying a recently sent email since it has to do with this defect:

John,

It appears that the GPS logger data you sent me recently from an Accra dataset 
does NOT have the bearing (direction) of the point accurately recorded.

Here is a line from the original dataset that I used to develop the tagging 
algorithm:

$GPRMC,064606.454,A,0534.3602,N,00012.2575,W,0.04,63.41,140410,,*21

Here is a line from the most recent dataset that I am using to test the changes 
to tagging algorithm:

$GPRMC,105426.000,A,3857.746220,N,07705.963280,W,0.176,0.00,180611,,,A*48

Recall the schema for GPRMC:

      1   Time Stamp
      2   validity - A-ok, V-invalid
      3   current Latitude
      4   North/South
      5   current Longitude
      6   East/West
      7   Speed in knots
      8   True course
      9   Date Stamp
      10  Variation
      11  East/West
      12  checksum

Based on this, the first line gives us a bearing of 63.41 whereas the second
line gives us a bearing of 0.00.
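
For illustration only (this is not the TAMT import code), the true-course field can be
pulled out of a raw GPRMC sentence with plain PostgreSQL string functions; it is the 9th
comma-separated token because the sentence name itself is token 1:

    SELECT split_part('$GPRMC,064606.454,A,0534.3602,N,00012.2575,W,0.04,63.41,140410,,*21',
                      ',', 9)::numeric AS true_course;   -- 63.41
    SELECT split_part('$GPRMC,105426.000,A,3857.746220,N,07705.963280,W,0.176,0.00,180611,,,A*48',
                      ',', 9)::numeric AS true_course;   -- 0.00, no usable bearing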

When bearing is 0, the "assignPoints" routine falls through to matching on the 
default zone only, because it can't find "the nearest road with the same 
bearing within the distance tolerance". This is because the bearing tolerance 
(at a default of 45 degrees) filters out all points that have a bearing of zero.

Could it be that a GPS logger is set to IGNORE bearing? If so, that is a little 
disastrous for us.

Original comment by stuartmo...@gmail.com on 21 Jun 2011 at 6:07

GoogleCodeExporter commented 9 years ago
Because of the spatial nature of the data, it is a little difficult to validate 
any changes to the tagging algorithm. The mini-reports included here 
demonstrate that indeed the tagging tolerance is having an effect. 

A description of the abbreviations in the mini-reports:

1) dz=ind -- default zone is #IND
2) <zonetype>/<meters> -- block length for zone type (e.g. com/200 means 
commercial block length is 200m)
3) tagtol/<meters> -- tagging tolerance in meters
4) W;X;Y;Z -- points tagged #IND, points not tagged #IND, number of tagged 
points, checksum
5) weird: X processed, Y tagged -- what is reported to the user based on the
update status.

Mini-reports:

-- dz=ind, com/400, ind/400, res/400, tagtol/200
-- 3450;4430;7880;7880
-- weird: 7880 processed; 9789 tagged

-- dz=ind, com/200, ind/200, res/200, tagtol/100
-- 7773;107;7880;7880
-- weird: 7880 processed; 13393 tagged

-- dz=ind, com/10, ind/10, res/10, tagtol/50
-- 7773;107;7880;7880
-- weird: 7880 processed; 13393 tagged

Notes:

1) As the tagtol drops from 200m to 100m, the UI reports an increase in matched
points vs processed points. However, the database shows a significant decrease
in the number of points matched in the default zone. This means more points are
matched to roads when the tolerance is high -- which is good.

2) As the tagtol drops from 100m to 50m, the UI reports the same ratio of 
matched points vs processed points, AND the database shows the same ratio of 
points tagged to roads as points tagged to zones. 

Conclusions:

This simple test on a limited set of data shows:

1) the tagging tolerance is having an effect on the number of points matched to 
road tags or zone types
2) we have an error in the UI where it is reporting the number of matches vs
processed points

Original comment by stuartmo...@gmail.com on 21 Jun 2011 at 6:29

GoogleCodeExporter commented 9 years ago
Stuart
In the assign process: are you picking a GPS coordinate and looking
for roads that are within the tolerance, or are you running along
roads looking for GPS coordinates?

Original comment by jarogers...@gmail.com on 21 Jun 2011 at 6:43

GoogleCodeExporter commented 9 years ago
Copying an email here, since it answers your question in Comment #9 and speaks in
general to the complexity of this piece of code:

John,

1) TAMT is not the kind of software that you can get away with using without a 
manual. It's just too complicated, and as much as we have tried to make it
intuitive, there are areas of it that are obtuse unless you know the process 
from reading the manual.

2) The code for spatially filtering based on distance and bearing is very 
tightly coupled. For each point in a GPS trace, we are doing this:

        -- p is the current GPS point record; distance_tolerance, bearing_tolerance
        -- and studyRegionId are variables bound by the enclosing stored procedure
        SELECT
            r.tag_id,
            r.id,
            -- distance from the point to the road, in meters (geography comparison)
            ST_Distance(
                ST_GeographyFromText(AsText(p.geometry)), 
                ST_GeographyFromText(AsText(r.geometry))) as distanceMeters,
            -- TRUE when the point's bearing is within bearing_tolerance of the road's
            TAMT_compareBearing(p.geometry, p.bearing, r.geometry, bearing_tolerance) as similarbearing
        FROM roaddetails as r
        WHERE
            -- only consider roads within the distance tolerance of the point
            (ST_DWithin(
                ST_GeographyFromText(AsText(p.geometry)), 
                ST_GeographyFromText(AsText(r.geometry)), 
                distance_tolerance))
        AND 
            r.region = studyRegionId
        -- nearest road with the most similar bearing wins
        ORDER BY similarbearing DESC, distanceMeters ASC LIMIT 1

First: Notice that the road proximity test is in the WHERE clause. This is a
straightforward GIS comparison, but it may return multiple roads, which is why
the bearing is so critical: we only want to match the nearest road that has
the most similar bearing.

Second: Notice that the TAMT_compareBearing algorithm is in the SELECT 
statement (so we can feed in the point and road geometry and attributes, as 
well as the bearing tolerance) so that we get back a TRUE or FALSE value. 

Third: This is where the magic happens... In "Magic, Part 1" the ORDER BY 
clause will sort the matching road(s) by ascending distance (nearest first), but that sort
order is trumped by the similarbearing value being sorted in descending fashion 
(this always puts a near road with TRUE similar bearing above a near road with 
FALSE similar bearing; especially in the case where the roads intersect and the 
point sits right on top of both of them). In "Magic, Part 2" we LIMIT the 
results to only 1 row, thereby always picking the "nearest road with the most 
similar bearing within the tagging tolerance". (There are a couple of other
routines we do in the case that no roads are near enough or no bearings are 
matched, but the real glue is what you see here.)

Having said that:

I don't feel confident in changing how this stored procedure works and getting 
those changes done in the time allotted. This one procedure took more than a few
days to get to this stage. Redesigning it now is very risky. Adding the extra 
logic you suggest would easily fill up a day or two of design plus several days 
of coding and basic developer testing.

Secondly, because of the number of points we are acting on (sidebar: in answer 
to your comment in the Issue tracker; yes, we process each point and look for 
near roads; no, we do not travel along a road and look for points), there would 
a performance hit to break up this query into two or more in order to 
accomplish the logic you are suggesting. 

Like most of the issues this week, it is a tradeoff between a) proper 
documentation / data gathering preparation and b) new code (with new designs, 
and new tests). 

It comes down to this: If you want to solve this issue with a change in the 
design of the stored procedure, then we risk not fixing a number of other 
issues. There is also a moderate chance of getting bitten by some unknowns in
trying to make this change, and not finishing the fix at all in the allotted
time.

Original comment by stuartmo...@gmail.com on 21 Jun 2011 at 7:08

GoogleCodeExporter commented 9 years ago
Stuart
OK sounds good and I agree.
1) When you calculate similar bearing are you including the reverse bearing too?
ie if the bearing is 90 degrees then the acceptable range will be
45-135 and 225-315 since we are not distinguishing between N-S traffic
flow and S-N traffic flow.
2) When you are importing the GPS datalog we are reading the GPS
points sequentially. If the logger file does not include heading,
would it be possible to write it in here as being the heading from the
previous point?

Original comment by jarogers...@gmail.com on 21 Jun 2011 at 7:32

GoogleCodeExporter commented 9 years ago
John,

1) Yes, we already account for reverse bearing in the TAMT_compareBearing
stored procedure (see the sketch after these two points).

2) At first blush, a "last good heading" fix sounds like a viable choice.
However, there are several problems with this. A) The dataset I am testing has
0s for every point, including the first. So, for loggers that do not capture
any headings at all, each heading would be the same as the one before, and
therefore they would all be 0. B) Relying on the last good heading gets into
all sorts of spatial complexities -- there are a number of potential "corner
cases" that may occur in areas with twisty streets and poor loggers that
intermittently capture the heading. While (B) may happen, (A) is more likely,
especially if the heading is not set to be captured at all.
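
To make point 1 concrete, here is an illustrative version of the forward/reverse check
(this is not the body of TAMT_compareBearing, just the rule it encodes, shown with a
45-degree tolerance against a road bearing of 90):

    SELECT p.bearing,
           (   LEAST(abs(p.bearing - r.bearing),
                     360 - abs(p.bearing - r.bearing)) <= 45.0
            OR LEAST(abs(p.bearing - ((r.bearing + 180) % 360)),
                     360 - abs(p.bearing - ((r.bearing + 180) % 360))) <= 45.0
           ) AS similarbearing
    FROM (VALUES (100.0), (250.0), (10.0)) AS p(bearing),
         (VALUES (90.0)) AS r(bearing);
    -- 100 matches (within 45-135), 250 matches (within 225-315),
    -- 10 matches neither range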

As for your statement that you agree:

Are you agreeing that we solve this issue with documentation, and NOT with a 
code change?

Original comment by stuartmo...@gmail.com on 21 Jun 2011 at 7:57

GoogleCodeExporter commented 9 years ago
Stuart
I was agreeing that changing your magic routines could cause delay. I
had already agreed to putting something in the manual.
Re previous GPS point defining direction: I was not thinking of using
the heading contained in the previous NMEA sentence. I was thinking
along the lines of using the GIS capabilities of Postgres to define the
heading from the coordinates of the previous second to the coordinates
of the current second -- and only if the heading field is empty or
zero.

Original comment by jarogers...@gmail.com on 21 Jun 2011 at 8:07

GoogleCodeExporter commented 9 years ago
OK, we are agreed on documentation, but we are still discussing a possible code 
fix here, which is fine. However, we are still not out of the woods.

Let's look at two corner cases: 

1) GPS logger does not collect any bearings. All bearings are recorded as 0.00. 
That means for every point we look "backwards" to get the coordinates for the 
last point, then ask the ST_Azimuth function in PostGIS to give us the bearing.
Repeat for every point. Moderate to heavy performance hit, depending on the number
of points and the likelihood of the logger not recording the bearing.

2) GPS logger records a bearing of 0.00 because the device really is pointing true north.
Much less likely than (1), but may happen often depending on locale (like Salt 
Lake City, with a north-south grid system). Mild to low performance hit, 
depending on spatial layout of roads.

I think I can try to fit your solution in: I am walking through all the points
anyway, so it won't be hard to remember the last point. And we already use the
azimuth function to compare bearings, so I don't think the additional call will
slow things down too much. We'll see a slight increase in time, most
noticeably with large GPS logs.
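
A minimal sketch of that fallback, with made-up coordinates (ST_Azimuth on plain lon/lat
geometry is a planar approximation, adequate over one-second point spacing; it returns
radians clockwise from north, hence the degrees() conversion):

    SELECT degrees(
             ST_Azimuth(
               ST_MakePoint(-0.2043, 5.5727),   -- previous GPS point (lon, lat)
               ST_MakePoint(-0.2041, 5.5731)    -- current GPS point (lon, lat)
             )
           ) AS derived_bearing;                -- roughly 27 degrees (north-north-east)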

Original comment by stuartmo...@gmail.com on 21 Jun 2011 at 8:33

GoogleCodeExporter commented 9 years ago
Thanks Stuart
This is going to be sooooooooooooooo user friendly :-)

Original comment by jarogers...@gmail.com on 21 Jun 2011 at 8:46

GoogleCodeExporter commented 9 years ago
John,

I have spent too much time already trying to accommodate a user not
having bearings turned on during data collection. I am getting mixed and 
hard-to-understand results (original bearings and derived bearings are 
sometimes within reason, sometimes not, and almost never equal) that have 
contributed to being bogged down on this one.

As it stands, the original feature request has been implemented. By this I mean
that the tagging tolerance is now user-configurable in the user interface, and is
used instead of a hardwired value in the tagging algorithm. This is as
described in the original report (with the additional constraint that the
tolerance can never be less than 50 m).

This is just the kind of time-sink that I was afraid of and tried to articulate
at the beginning of this session. I am moving on with the last outstanding
critical issue (Issue #67) and hope to be able to return to this.

I suggest that you create a new defect with the details for a "no-bearings" 
data collection scenario, and we mark this issue (i.e., the tagging tolerance
feature) as fixed.

Stuart

Original comment by stuartmo...@gmail.com on 22 Jun 2011 at 6:17

GoogleCodeExporter commented 9 years ago
Fixed in Revision 0e78bb92647f

Original comment by stuartmo...@gmail.com on 22 Jun 2011 at 7:55

GoogleCodeExporter commented 9 years ago

Original comment by stuartmo...@gmail.com on 22 Jun 2011 at 7:56