Closed damianooldoni closed 3 years ago
I tried a thorough review by calculating distances between stations for various projects and checking the distances with those obtained manually in QGIS. The code works excellent in this first step π . Here a few questions and suggestions:
UDBY1
and KATNORD
is 4500 km, while I get 3500 km when calculating the distance manually in QGIS. Another example for the Frome
: distance between F7
and F8
is 2567 km, while I get 3250 km manually. In the project EMMN
various distances between the 2nd and 3rd receiver array are 2 km off compared to manual calculation.
I noted that working on a higher resolution (e.g. 4000 x 8000) helps, but the distances are still a bit off. However, if it is less than 300 m (= receiver measurement error), it is not a problem. 2015 PHD VERHELST EEL
I tried to combine two shapefiles, one are lines (Vhag.shp
) and the other is a polygon (ws_bpns.shp
). However, when running gUnion()
this gives an error (line 33 - 45):
> study.area <- gUnion(vhag, ws_bpns)
ws_bpns is invalid
Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td, unaryUnion_if_byid_false, :
TopologyException: Input geom 1 is invalid: Self-intersection at or near point 567066.46833233652 5694796.7879214734 at 567066.46833233652 5694796.7879214734
In addition: Warning messages:
1: In RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid") :
Self-intersection at or near point 567066.46833233652 5694796.7879214734
2: In gUnion(vhag, ws_bpns) :
Invalid objects found; consider using set_RGEOS_CheckValidity(2L)
Note that this is to calculate distances between stations from the Belgian Lifewatch network. I now use other shape files than before, because the 'new' ones have a better fit to the actual rivers and contain more tributaries.
Thanks for your feedback @PieterjanVerhelst, I will give a look.
I found a major problem and a minor one.
The extension to include receivers was written thinking that all water bodies were connected. This is not always the case, e.g. river Frome has two distinct not connected water bodies. So, the extension function, adapt.binarymask()
, made the river wider and wider until the river bodies touched themselves.In this way the shortest distance between two receivers was very underestimated (= you could cut the curves).
I could find this by looking at the saved binary mask tiff image: the river was way too wide, independent of the initial set resolution.
I found always not really precise to set number of rows and columns of the raster instead of setting the pixel resolution. By setting the pixel resolution, the precision of the calculation is not anymore dependent on the size of the water body. Also, there is no distortion anymore because the number of rows and cols depends on resolution and the extension of the river body (xmin xmax, ymin ymax). I set the pixel size (resolution) equal to 20 meter, you can change it if you want.
@PieterjanVerhelst: can you give a look again and let me know? Thanks.
Great thanks π ! And indeed, much better approach to use pixel sizes instead of column and row numbers. I will continue testing this week and provide you feedback on Monday the latest.
Thanks. I tested again with Frome and the distance between F7 and F8 is now ~3300m as expected.
@damianooldoni a quick question: I am testing the distances for the River Gudena. Most of them are correct, but the distance between the most downstream stations UDBY1
and KATNORD
are quite off. In QGIS I get 3800 m, while the distance matrix gives 4800 m. However, I calculated the distance in QGIS along the river center line. When I include the distance from the station to the river center line, I get 4500 m. So could it be that the calculation includes the station-distance to the river center line? Obviously, this is mostly negligible in rivers because they are not that wide and the distance of a station to the river center line is very small. However, in estuaries and fjords like with the Gudena, this can be a few hundred meters or even kilometers.
Also, the distances between the stations in the EMMN
project are about 1000 m off. These distances are calculated within a polygon shapefile. Not sure why they are 1000 m off, but could be related to how the script handels polygons?
Yes, indeed. The function measures exactly the distance between receivers. For this reason, the function adapt.binarymask()
is needed: to be sure we have them included in the binary raster (1s against 0s background) to the river while calculating the shortest distance. Again, you have to see the binary mask saved as tiff to understand this. Here below a screenshot related to Gudena river and receiver KATNORD
(upper right). Do you see the white squares? This white squares are added to connect the stations to the nearest point belonging to the waterbody.
So, the question is: do you want the possibility to calculate the distance among the receiver projections on the line/polygon of the waterbody instead of the real receiver distance as done now? In other words, based on drawing below: do you want to have the possibility to measure the distance between the white points instead of the red ones?
So in brief: I don't need the distance between the receiver and the blue line (= the projection). The real distance as calculated now is what I want π . However, I still don't get why there is a difference of 1000 m between the calculations in R and those in QGIS? Why is the square of KATNORD
bigger than the others in the binary.tiff?
So in brief: I don't need the distance between the receiver and the blue line (= the projection).
To be 100% sure: should I "remove the white line" while measuring the distances? Or saying it in another way, should I thinki he receivers as they would be ON the riverbody line?
However, I still don't get why there is a difference of 1000 m between the calculations in R and those in QGIS?
Based on your https://github.com/inbo/fish-tracking/pull/69#issuecomment-824772882 the difference as I calculate the stuff now is not 1000m but "just" 300m.
Why is the square of KATNORD bigger than the others in the binary.tiff?
The square size depends on the distance of the receiver from the blue line. KATNORD is further than UDBY2, just to speak about the screenshots I used. Further the receiver is, bigger the square has to be.
Aha, that makes more sense π . I got confused with what you mean with the real receiver distances. I would need the distance between the white dots in your print screen (= distances between the receiver projections). And btw, the resolution based on pixels is much better π !
Ok, I need then some more time to implement it. Unfortunately next weeks will be very busy. I hope to find a little time first half of May. I will still allow users to choose between the two kind of distances.
No problem and thanks π !
@PieterjanVerhelst : I think I succeeded to use the projections of receivers on the line/polygons of the river body in the distance calculation. The projections are called projections.locations.receivers
. The projection is almost in all cases very precise. This means that extending the binary mask is often not needed anymore as the receiver projections are really ON the river body. The projection points are found using function dist2Line
from package geosphere
.
I checked with some rivers, a.o. the Gudena river and I am happy to state that the distance among UDBY2 and KATNORD receiver stations is now 3784.5 m using 50m pixel resolution.
Could you please check this again?
Below a zoomed screenshot from the new mapView with both receivers (red) and their projections (white) on the river body shapefile
Almost there :smile:. The functionality works perfect for line-shapefiles and runs also much faster than previous version π . However, with polygons the receiver locations are projected somewhere on the edge of the polygon. One example is the EMMN
project in Norway. Apart from solving this discrepancy between lines and polygons, in some cases there is also a combination between both, such as for the Belgian and Dutch waterways (see #70 ).
Well, this is a "discrepancy" we can live with, don't we? :smile: Actually I don't see it as a problem to be solved by the functionality, but it's a uniformity problem among the river body (input) data. Solving #70 is something I will check after this PR is merged and closed.
I'm confused :sweat_smile:. So you will make a discrepancy in input data (lines vs polygons) in this PR?
ahah :rofl: No, I just mean that there is difference (discrepancy) between the set of inputs (some of them are lines shpes, some are polygons) you use. The function I use for finding the white points (projections) calculates the nearest point of the river body geometry to the red points (receivers). So, it's normal that in case of lines it's on the line itself, while in polygons it's on the boundary of the polygon. I would not pay too much attention to this "discrepancy", as it doesn't make really a difference.
The only thing I think could improve my code a little more is adding the following condition:
Aah of course :grin:! That makes perfect sense! Just to be sure I understand your suggestion: when using line shapefiles and the receiver is not on the line, the projection will be used (which is now the case). When using polygon shapefiles and the receiver is in the polygon, the actual receiver position will be used.
Here just a print screen from the EMMN project in a fjord in Norway, where you see that the receiver positions of the arrays are projected on the edge of the polygon:
Indeed, this is the only thing I would like to improve still in this PR. Use the projections ONLY if the receiver is out of the river body shapefile :+1:
Ok, I think the projection is now working fine. the points projections.locations.receivers
will be equal to locations.receivers
if the latter ones are IN the water body. See screenshot below from emmn
situation. the points are pink as the projections (white) are superimposed on the receivers (red) resulting in pink.
The function find.projections.receivers()
will also return for each receiver whether projection is needed or not. Example from emmn
:
> projections.locations.receivers <- find.projections.receivers(
shape.study.area = study.area,
receivers = locations.receivers,
projection = coordinate.string
)
Receiver station 59 is in the water body. No projection needed
Receiver station 45 is in the water body. No projection needed
Receiver station 44 is in the water body. No projection needed
Receiver station 43 is in the water body. No projection needed
Receiver station 28 is in the water body. No projection needed
Receiver station 29 is in the water body. No projection needed
Receiver station 26 is in the water body. No projection needed
Receiver station 30 is in the water body. No projection needed
Receiver station 21 is in the water body. No projection needed
Receiver station 42 is in the water body. No projection needed
Receiver station 31 is in the water body. No projection needed
Receiver station 9 is in the water body. No projection needed
Receiver station 8 is in the water body. No projection needed
Receiver station 10 is in the water body. No projection needed
Receiver station 38 is in the water body. No projection needed
Receiver station 36 is in the water body. No projection needed
Receiver station 11 is in the water body. No projection needed
Receiver station 12 is in the water body. No projection needed
Receiver station 40 is in the water body. No projection needed
Receiver station 7 is in the water body. No projection needed
Receiver station 41 is in the water body. No projection needed
Receiver station 13 is in the water body. No projection needed
Receiver station 39 is in the water body. No projection needed
Receiver station 5 is in the water body. No projection needed
Receiver station 27 is in the water body. No projection needed
Receiver station 25 is in the water body. No projection needed
Receiver station 24 is in the water body. No projection needed
Receiver station 3 is in the water body. No projection needed
Receiver station 14 is in the water body. No projection needed
Receiver station 15 is in the water body. No projection needed
Receiver station 35 is in the water body. No projection needed
Receiver station 23 is in the water body. No projection needed
Receiver station 17 is in the water body. No projection needed
Receiver station 18 is in the water body. No projection needed
Receiver station 20 is in the water body. No projection needed
Receiver station rel_emmn1 is in the water body. No projection needed
Receiver station rel_emmn2 is in the water body. No projection needed
From mondego
river, the situation will be the opposite: all receivers are out of the river body which is a line.
Receiver station R1 is not in the water body and will be projected on it.
Receiver station R2 is not in the water body and will be projected on it.
Receiver station R3 is not in the water body and will be projected on it.
Receiver station R7 is not in the water body and will be projected on it.
Receiver station R8 is not in the water body and will be projected on it.
Receiver station R4 is not in the water body and will be projected on it.
Receiver station R5 is not in the water body and will be projected on it.
Receiver station R6 is not in the water body and will be projected on it.
Receiver station rel_mondego is not in the water body and will be projected on it.
If there is no station_name
, the message on screen will visualize the row number. Some checks have been also introduced to make the function more resilient.
Please have a look again and I really hope everything works as we wish.
Very elegant! That works perfectly!
I just checked the code on line shapefiles and one projection in the River Stour is a bit off (stour_19
). I don't understand why the receiver is not projected on the line.
Yes, I don't know why even. I found it some days ago as well, but it seems happening to only one (maybe two?) points. Check more cases and record such anomalies. If they remain very limited as I think, we can merge this PR and move further. Fingers crossed.
I found a few more in the River Frome:
However, the detection range of the receivers (= measurement error) is larger than this calculation bias. The wrong projections are on a very small scale (tens of meters).
Just checked some more: I get the impression this mainly happens when there are multiple receivers close to each other. As already mentioned, the distance 'error' due to the projection is much smaller than the detection range of the stations. Therefore, imo we can merge the branch. Consider it a job very well done π π !
Thanks @PieterjanVerhelst for your comments and reviews. I will merge this PR and close it. We made a sensible improvement since we started this months ago.
This PR solves several bugs found in draft code of Stijn when trying to calculate distance among receivers in other rivers than the Schelde. The code now works both for rivers as polygons (e.g. Schelde) and rivers as lines. Tested on the next study areas:
I document here below the bugs and how I solved them:
Bugs
Boundary conditions
If some receivers are out of the river, i.e. multiple clumps (or patches) are detected, the binary mask (0, 1 values) has to be extended. This is done in help function
extend_patches()
by extending the patch with the receiver until the patch is connected to the main patch. While doing this we have to take care of the boundary condition: we cannot change pixel values with row/col index < 1 or row/col index > #rows/#cols.Bug reported in #62 and solved in 2d4dc1ed128219a615ab1f46da45fdbcceb60a85
Max() calculation with NAs
When making a raster out of the river spatial object, the NAs are not converted to 0 although this is done correctly while making the raster with receivers. This can be risky as we need to "merge" the binary mask raster by using
max()
function. max with NAs returns NA. An other alternative could be to use argumentna.rm = TRUE
inmax()
, but this is less elegant and is not symmetric.Solved in dbba687a6d63b6559b464b2a8e0037167c7ebe72
Use receivers to assess raster extent
If the raster is only based on the river spatial object, it can happen that some receivers fall out of it. When receivers are added to raster in function
adapt.binarymask()
by:no information is returned about receivers falling out of the raster extent. This happens for example with river gudena using standard resolution nrow = 2000 and ncol = 4000. See map screenshot here below.
Solved in f3e7f5e521bfbe665213baa02416ce720637fc90
Avoid transformation of one-row matrix to vector
This behavior of R is quite tricky. When a matrix has one row and you try to do some operation on it, the output is transformed silently to a vector. The use of
nrow()
would generate an error asnrow()
cannot be applied to vectors.This can be solved by forcing R to not dropping to basic class, by using
drop = FALSE
. ExampleSolved in de2ddde0db0184835cfa20a51b6856d825470278
Adapt condition for stopping iteration
The extension of the binary mask to include all receivers should stop when all pixels with value 1 are conencted, i.e. there is only one clump, also called patch.
Solved in afb87d5010d81791ee97f1328a3692f6d437c92a
Improvements
These are not bugs, but improvements of the code.
Improved notifications
While extending the binary mask to include receivers, a message is returned while iterating the code to let the user know how many patches are still present. The extension stops when one patch only is detected. Example here below:
Solved in 3ae43b40aa7b2e1ef26a70ab7fa99b712ed63c3a
Use message instead of print
In general use of
message()
is way better thanprint()
to communicate information to users.Solved in same commit as before
Returns receiver distances as tibble data.frame
I changed the class of the output receiver distances: instead of a matrix, now a (tibble) data.frame is returned. In this way it is much more readable when printed on screen.
Solved in bf1c74dee00367f14b8eb76d4afc00b2bb4a01be
Show station names as receivers labels in maps
It's handy to have station names as labels of the receivers in the leaflet map instead of progressive numbers as the station names are used as row and column names of the distance data.frame.
Solved in 07ee9c0bf5b6e8308b320c8c57ddccab44901989