Solve bugs distance receivers calculation

damianooldoni commented 3 years ago

This PR solves several bugs found in draft code of Stijn when trying to calculate distance among receivers in other rivers than the Schelde. The code now works both for rivers as polygons (e.g. Schelde) and rivers as lines. Tested on the next study areas:

Schelde/Nete/sea (BE), 2000x4000 (standard) and 4000x8000 resolution
Gudena (DK)
Mondego (PT)
Stour

I document here below the bugs and how I solved them:

Bugs

Boundary conditions

If some receivers are out of the river, i.e. multiple clumps (or patches) are detected, the binary mask (0, 1 values) has to be extended. This is done in help function extend_patches() by extending the patch with the receiver until the patch is connected to the main patch. While doing this we have to take care of the boundary condition: we cannot change pixel values with row/col index < 1 or row/col index > #rows/#cols.

Bug reported in #62 and solved in 2d4dc1ed128219a615ab1f46da45fdbcceb60a85

Max() calculation with NAs

When making a raster out of the river spatial object, the NAs are not converted to 0 although this is done correctly while making the raster with receivers. This can be risky as we need to "merge" the binary mask raster by using max() function. max with NAs returns NA. An other alternative could be to use argument na.rm = TRUE in max(), but this is less elegant and is not symmetric.

Solved in dbba687a6d63b6559b464b2a8e0037167c7ebe72

Use receivers to assess raster extent

If the raster is only based on the river spatial object, it can happen that some receivers fall out of it. When receivers are added to raster in function adapt.binarymask() by:

# add locations itself to raster as well:
    locs2ras <- rasterize(receivers, binary.mask, 1.)

no information is returned about receivers falling out of the raster extent. This happens for example with river gudena using standard resolution nrow = 2000 and ncol = 4000. See map screenshot here below.

Solved in f3e7f5e521bfbe665213baa02416ce720637fc90

Avoid transformation of one-row matrix to vector

This behavior of R is quite tricky. When a matrix has one row and you try to do some operation on it, the output is transformed silently to a vector. The use of nrow() would generate an error as nrow() cannot be applied to vectors.

This can be solved by forcing R to not dropping to basic class, by using drop = FALSE. Example

a <- matrix(data = c(1,2,3,4), nrow = 1)
> a
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
> a[sort.list(a[,1]),] # sorting on values of first column returns a vector
[1] 1 2 3 4
> a[sort.list(a[,1]), , drop = FALSE] # using drop = FALSE output is still a matrix
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4

Solved in de2ddde0db0184835cfa20a51b6856d825470278

Adapt condition for stopping iteration

The extension of the binary mask to include all receivers should stop when all pixels with value 1 are conencted, i.e. there is only one clump, also called patch.

Solved in afb87d5010d81791ee97f1328a3692f6d437c92a

Improvements

These are not bugs, but improvements of the code.

Improved notifications

While extending the binary mask to include receivers, a message is returned while iterating the code to let the user know how many patches are still present. The extension stops when one patch only is detected. Example here below:

> study.area.binary.extended <- adapt.binarymask(study.area.binary, locations.receivers)
Initial number of patches: 5
Number of patches: 4
Number of patches: 4
Number of patches: 3
Number of patches: 1
Done: all receivers included

Solved in 3ae43b40aa7b2e1ef26a70ab7fa99b712ed63c3a

Use message instead of print

In general use of message() is way better than print() to communicate information to users.

Solved in same commit as before

Returns receiver distances as tibble data.frame

I changed the class of the output receiver distances: instead of a matrix, now a (tibble) data.frame is returned. In this way it is much more readable when printed on screen.

Solved in bf1c74dee00367f14b8eb76d4afc00b2bb4a01be

Show station names as receivers labels in maps

It's handy to have station names as labels of the receivers in the leaflet map instead of progressive numbers as the station names are used as row and column names of the distance data.frame.

Solved in 07ee9c0bf5b6e8308b320c8c57ddccab44901989

PieterjanVerhelst commented 3 years ago

I tried a thorough review by calculating distances between stations for various projects and checking the distances with those obtained manually in QGIS. The code works excellent in this first step 👍 . Here a few questions and suggestions:

the data frames/distance matrices only have station names in the column headers. Station names in the 'row headers' (= first column) are also useful.
In various distance calculations, the distance is sometimes quite off. For instance, in 'Gudena' the distance between UDBY1 and KATNORD is 4500 km, while I get 3500 km when calculating the distance manually in QGIS. Another example for the Frome: distance between F7 and F8 is 2567 km, while I get 3250 km manually. In the project EMMN various distances between the 2nd and 3rd receiver array are 2 km off compared to manual calculation. I noted that working on a higher resolution (e.g. 4000 x 8000) helps, but the distances are still a bit off. However, if it is less than 300 m (= receiver measurement error), it is not a problem.

For the project 2015 PHD VERHELST EEL I tried to combine two shapefiles, one are lines (Vhag.shp) and the other is a polygon (ws_bpns.shp). However, when running gUnion() this gives an error (line 33 - 45):

> study.area <- gUnion(vhag, ws_bpns)
ws_bpns is invalid
Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td, unaryUnion_if_byid_false,  : 
TopologyException: Input geom 1 is invalid: Self-intersection at or near point 567066.46833233652 5694796.7879214734 at 567066.46833233652 5694796.7879214734
In addition: Warning messages:
1: In RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid") :
Self-intersection at or near point 567066.46833233652 5694796.7879214734
2: In gUnion(vhag, ws_bpns) :
Invalid objects found; consider using set_RGEOS_CheckValidity(2L)

Note that this is to calculate distances between stations from the Belgian Lifewatch network. I now use other shape files than before, because the 'new' ones have a better fit to the actual rivers and contain more tributaries.

damianooldoni commented 3 years ago

Thanks for your feedback @PieterjanVerhelst, I will give a look.

damianooldoni commented 3 years ago

I found a major problem and a minor one.

Major issue

The extension to include receivers was written thinking that all water bodies were connected. This is not always the case, e.g. river Frome has two distinct not connected water bodies. So, the extension function, adapt.binarymask(), made the river wider and wider until the river bodies touched themselves.In this way the shortest distance between two receivers was very underestimated (= you could cut the curves).

I could find this by looking at the saved binary mask tiff image: the river was way too wide, independent of the initial set resolution.

Minor issue

I found always not really precise to set number of rows and columns of the raster instead of setting the pixel resolution. By setting the pixel resolution, the precision of the calculation is not anymore dependent on the size of the water body. Also, there is no distortion anymore because the number of rows and cols depends on resolution and the extension of the river body (xmin xmax, ymin ymax). I set the pixel size (resolution) equal to 20 meter, you can change it if you want.

@PieterjanVerhelst: can you give a look again and let me know? Thanks.

PieterjanVerhelst commented 3 years ago

Great thanks 👍 ! And indeed, much better approach to use pixel sizes instead of column and row numbers. I will continue testing this week and provide you feedback on Monday the latest.

damianooldoni commented 3 years ago

Thanks. I tested again with Frome and the distance between F7 and F8 is now ~3300m as expected.

PieterjanVerhelst commented 3 years ago

@damianooldoni a quick question: I am testing the distances for the River Gudena. Most of them are correct, but the distance between the most downstream stations UDBY1 and KATNORD are quite off. In QGIS I get 3800 m, while the distance matrix gives 4800 m. However, I calculated the distance in QGIS along the river center line. When I include the distance from the station to the river center line, I get 4500 m. So could it be that the calculation includes the station-distance to the river center line? Obviously, this is mostly negligible in rivers because they are not that wide and the distance of a station to the river center line is very small. However, in estuaries and fjords like with the Gudena, this can be a few hundred meters or even kilometers.

Also, the distances between the stations in the EMMN project are about 1000 m off. These distances are calculated within a polygon shapefile. Not sure why they are 1000 m off, but could be related to how the script handels polygons?

damianooldoni commented 3 years ago

Yes, indeed. The function measures exactly the distance between receivers. For this reason, the function adapt.binarymask() is needed: to be sure we have them included in the binary raster (1s against 0s background) to the river while calculating the shortest distance. Again, you have to see the binary mask saved as tiff to understand this. Here below a screenshot related to Gudena river and receiver KATNORD (upper right). Do you see the white squares? This white squares are added to connect the stations to the nearest point belonging to the waterbody.

Question

So, the question is: do you want the possibility to calculate the distance among the receiver projections on the line/polygon of the waterbody instead of the real receiver distance as done now? In other words, based on drawing below: do you want to have the possibility to measure the distance between the white points instead of the red ones?

proejction_receivers

PieterjanVerhelst commented 3 years ago

So in brief: I don't need the distance between the receiver and the blue line (= the projection). The real distance as calculated now is what I want 😉 . However, I still don't get why there is a difference of 1000 m between the calculations in R and those in QGIS? Why is the square of KATNORD bigger than the others in the binary.tiff?

damianooldoni commented 3 years ago

So in brief: I don't need the distance between the receiver and the blue line (= the projection).

To be 100% sure: should I "remove the white line" while measuring the distances? Or saying it in another way, should I thinki he receivers as they would be ON the riverbody line?

However, I still don't get why there is a difference of 1000 m between the calculations in R and those in QGIS?

Based on your https://github.com/inbo/fish-tracking/pull/69#issuecomment-824772882 the difference as I calculate the stuff now is not 1000m but "just" 300m.

Why is the square of KATNORD bigger than the others in the binary.tiff?

The square size depends on the distance of the receiver from the blue line. KATNORD is further than UDBY2, just to speak about the screenshots I used. Further the receiver is, bigger the square has to be.

PieterjanVerhelst commented 3 years ago

Aha, that makes more sense 👍 . I got confused with what you mean with the real receiver distances. I would need the distance between the white dots in your print screen (= distances between the receiver projections). And btw, the resolution based on pixels is much better 👍 !

damianooldoni commented 3 years ago

Ok, I need then some more time to implement it. Unfortunately next weeks will be very busy. I hope to find a little time first half of May. I will still allow users to choose between the two kind of distances.

PieterjanVerhelst commented 3 years ago

No problem and thanks 👍 !

damianooldoni commented 3 years ago

@PieterjanVerhelst : I think I succeeded to use the projections of receivers on the line/polygons of the river body in the distance calculation. The projections are called projections.locations.receivers. The projection is almost in all cases very precise. This means that extending the binary mask is often not needed anymore as the receiver projections are really ON the river body. The projection points are found using function dist2Line from package geosphere. I checked with some rivers, a.o. the Gudena river and I am happy to state that the distance among UDBY2 and KATNORD receiver stations is now 3784.5 m using 50m pixel resolution. Could you please check this again?

Below a zoomed screenshot from the new mapView with both receivers (red) and their projections (white) on the river body shapefile

PieterjanVerhelst commented 3 years ago

Almost there :smile:. The functionality works perfect for line-shapefiles and runs also much faster than previous version 👍 . However, with polygons the receiver locations are projected somewhere on the edge of the polygon. One example is the EMMN project in Norway. Apart from solving this discrepancy between lines and polygons, in some cases there is also a combination between both, such as for the Belgian and Dutch waterways (see #70 ).

damianooldoni commented 3 years ago

Well, this is a "discrepancy" we can live with, don't we? :smile: Actually I don't see it as a problem to be solved by the functionality, but it's a uniformity problem among the river body (input) data. Solving #70 is something I will check after this PR is merged and closed.

PieterjanVerhelst commented 3 years ago

I'm confused :sweat_smile:. So you will make a discrepancy in input data (lines vs polygons) in this PR?

damianooldoni commented 3 years ago

ahah :rofl: No, I just mean that there is difference (discrepancy) between the set of inputs (some of them are lines shpes, some are polygons) you use. The function I use for finding the white points (projections) calculates the nearest point of the river body geometry to the red points (receivers). So, it's normal that in case of lines it's on the line itself, while in polygons it's on the boundary of the polygon. I would not pay too much attention to this "discrepancy", as it doesn't make really a difference.

The only thing I think could improve my code a little more is adding the following condition:

IF receiver is OUT of the river body, then use projection (white point)
IF receiver is IN the river body, then use the exact receiver location

PieterjanVerhelst commented 3 years ago

Aah of course :grin:! That makes perfect sense! Just to be sure I understand your suggestion: when using line shapefiles and the receiver is not on the line, the projection will be used (which is now the case). When using polygon shapefiles and the receiver is in the polygon, the actual receiver position will be used.

Here just a print screen from the EMMN project in a fjord in Norway, where you see that the receiver positions of the arrays are projected on the edge of the polygon:

damianooldoni commented 3 years ago

Indeed, this is the only thing I would like to improve still in this PR. Use the projections ONLY if the receiver is out of the river body shapefile :+1:

damianooldoni commented 3 years ago

Ok, I think the projection is now working fine. the points projections.locations.receivers will be equal to locations.receivers if the latter ones are IN the water body. See screenshot below from emmn situation. the points are pink as the projections (white) are superimposed on the receivers (red) resulting in pink.

The function find.projections.receivers() will also return for each receiver whether projection is needed or not. Example from emmn:

> projections.locations.receivers <- find.projections.receivers(
    shape.study.area = study.area,
    receivers = locations.receivers,
    projection = coordinate.string
)
Receiver station 59 is in the water body. No projection needed
Receiver station 45 is in the water body. No projection needed
Receiver station 44 is in the water body. No projection needed
Receiver station 43 is in the water body. No projection needed
Receiver station 28 is in the water body. No projection needed
Receiver station 29 is in the water body. No projection needed
Receiver station 26 is in the water body. No projection needed
Receiver station 30 is in the water body. No projection needed
Receiver station 21 is in the water body. No projection needed
Receiver station 42 is in the water body. No projection needed
Receiver station 31 is in the water body. No projection needed
Receiver station 9 is in the water body. No projection needed
Receiver station 8 is in the water body. No projection needed
Receiver station 10 is in the water body. No projection needed
Receiver station 38 is in the water body. No projection needed
Receiver station 36 is in the water body. No projection needed
Receiver station 11 is in the water body. No projection needed
Receiver station 12 is in the water body. No projection needed
Receiver station 40 is in the water body. No projection needed
Receiver station 7 is in the water body. No projection needed
Receiver station 41 is in the water body. No projection needed
Receiver station 13 is in the water body. No projection needed
Receiver station 39 is in the water body. No projection needed
Receiver station 5 is in the water body. No projection needed
Receiver station 27 is in the water body. No projection needed
Receiver station 25 is in the water body. No projection needed
Receiver station 24 is in the water body. No projection needed
Receiver station 3 is in the water body. No projection needed
Receiver station 14 is in the water body. No projection needed
Receiver station 15 is in the water body. No projection needed
Receiver station 35 is in the water body. No projection needed
Receiver station 23 is in the water body. No projection needed
Receiver station 17 is in the water body. No projection needed
Receiver station 18 is in the water body. No projection needed
Receiver station 20 is in the water body. No projection needed
Receiver station rel_emmn1 is in the water body. No projection needed
Receiver station rel_emmn2 is in the water body. No projection needed

From mondego river, the situation will be the opposite: all receivers are out of the river body which is a line.

Receiver station R1 is not in the water body and will be projected on it.
Receiver station R2 is not in the water body and will be projected on it.
Receiver station R3 is not in the water body and will be projected on it.
Receiver station R7 is not in the water body and will be projected on it.
Receiver station R8 is not in the water body and will be projected on it.
Receiver station R4 is not in the water body and will be projected on it.
Receiver station R5 is not in the water body and will be projected on it.
Receiver station R6 is not in the water body and will be projected on it.
Receiver station rel_mondego is not in the water body and will be projected on it.

If there is no station_name, the message on screen will visualize the row number. Some checks have been also introduced to make the function more resilient.

Please have a look again and I really hope everything works as we wish.

PieterjanVerhelst commented 3 years ago

Very elegant! That works perfectly! I just checked the code on line shapefiles and one projection in the River Stour is a bit off (stour_19). I don't understand why the receiver is not projected on the line.

damianooldoni commented 3 years ago

Yes, I don't know why even. I found it some days ago as well, but it seems happening to only one (maybe two?) points. Check more cases and record such anomalies. If they remain very limited as I think, we can merge this PR and move further. Fingers crossed.

PieterjanVerhelst commented 3 years ago

I found a few more in the River Frome:

PieterjanVerhelst commented 3 years ago

However, the detection range of the receivers (= measurement error) is larger than this calculation bias. The wrong projections are on a very small scale (tens of meters).

PieterjanVerhelst commented 3 years ago

Just checked some more: I get the impression this mainly happens when there are multiple receivers close to each other. As already mentioned, the distance 'error' due to the projection is much smaller than the detection range of the stations. Therefore, imo we can merge the branch. Consider it a job very well done 👍 😄 !

damianooldoni commented 3 years ago

Thanks @PieterjanVerhelst for your comments and reviews. I will merge this PR and close it. We made a sensible improvement since we started this months ago.

inbo / fish-tracking