inbo / fish-tracking

🐟 Collection of scripts for processing and analysing fish tracking data
3 stars 0 forks source link

Subscript out of bounds #62

Closed PieterjanVerhelst closed 3 years ago

PieterjanVerhelst commented 3 years ago

When trying to calculate the distances between detection stations in the River Frome (branch distance-frome), I get an error at line 107 in the script prepare_dataset:

> study.area.binary.extended <- adapt.binarymask(study.area.binary, locations.receivers)
[1] 2
Error in `[<-`(`*tmp*`, (crdnts[1, i] - 1):(crdnts[1, i] + 1), (crdnts[2,  : 
  subscript out of bounds

I played around with the number of columns and rows to fit the file in the grid, but it didn't work. Any thoughts on this one?

damianooldoni commented 3 years ago

This error raises when running this commando at line 95:

study.area.binary.extended <- adapt.binarymask(study.area.binary, locations.receivers)

Debugging this, I found that the problem is in the for loop in help function extend_patches() (receiver_distance_fun.R#L146-L148):

for (i in 1:ncol(crdnts) ) {
    inputmat[(crdnts[1, i] - 1):(crdnts[1, i] + 1), (crdnts[2, i] - 1):(crdnts[2, i] + 1)] <- 1
}

Within this function, I find:

Browse[2]> nrow(inputmat)
[1] 2000
Browse[2]> ncol(inputmat)
[1] 4000
Browse[2]> max(crdnts)
[1] 4000
Browse[2]> crdnts[2,7184]
[1] 4000

This means that in the for loop, crdnts[2, i] + 1 becomes 40001 and this is indeed out of bound as number of columns of inputmat is 4000.

I didn't sovle the bug, but I found the problem, it's 50% of the work! :factory_worker: I will investigate further. @PieterjanVerhelst : the links are to master, but I am working on branch distance-frome of course and anyway, the help functions are the same as in master.

PieterjanVerhelst commented 3 years ago

Great, thanks!

PieterjanVerhelst commented 3 years ago

@damianooldoni I have been trying some stuff out lately, but couldn't make much improvement. However, I noticed that the River Frome shapefiles are SpatialLinesDataFrame while the files on which the code was developed were SpatialPolygonsDataFrame. Could that be part of the problem as well, specifically that SpatialLinesDataFrame standard have a larger resolution? And if so, can the code be made flexible to allow and even combine both dataframe types? I assume this may take some time, let me know how you want to proceed with this. The calculation of the station distances is the first step in a European-wide meta-analysis I am conducting on eel migration. Hence, as soon as this issue is solved, the sooner I can start digging into the data.

damianooldoni commented 3 years ago

Thanks @PieterjanVerhelst for the hint. I didn't write this code and I probably have to spend some more time to understand better the details of it. Stijn documented the functions but not every single step of course. It can be I contact you by email to get some more background if needed. I hope I can find a easy but complete solutions for this.

damianooldoni commented 3 years ago

I think I solved the bug. The problem was due to the fact that one of the receivers was at one the edges of the raster. Can you check whether the distances between receivers makes sense? See commit mentioned here above for more details. I have also cleaned a little bit the code (typos, some variable names, ...) and used sf objects where possible and converting them on-the-flow to sp only for functions where sf objects are not supported. This actually happens only once in costDistance(), a function of gdistance package.

If using sf objects will arise errors in other parts of the workflow, no worries. Just let me know via a separate issue and in any case, we can always convert them to sp points/polyongs by using the as() function from sf package:

sp_obejct <- as(sf_object, "Spatial")

But, first I would like to see this issue closed of course :smile:

PieterjanVerhelst commented 3 years ago

Hmm, I still get the error. Did you work in the 'distance-frome' branch? Can you tell which station was on the edge of the raster? I checked in GIS and they seem fairly close to the river or in the river. And I indeed get an error for other shapefiles that sf objects are invalid:

> study.area <- gUnion(rivers, nete)
Error in RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid") : 
  rgeos_convert_R2geos: invalid R class sf, unable to convert

But indeed, let's first solve the issue with the subscript out of bounds :satisfied:.

damianooldoni commented 3 years ago

Yes, I forgot while cleaning everything to remove the line code with the bug. Pull and try again. Yes, I am working in distance-frome branch. By the way, I didn't commit the outputs of prepare_datasets.R, i.e. the tiff file and the csv with the distance among receivers. I leave it to you.

PieterjanVerhelst commented 3 years ago

Alright, the code calculates the matrix for the River Frome 👍 ! I will try to run it for networks in other rivers the coming week and if there are no more errors apart from object classes, I will close the issue.

Just one small question: in the output, the distance between stations F2 and F3 is 23 m, while a rough measurement in QGIS gives 15 m (see figure below). Note that this is the distance along the straight line, which is the shapefile. The background itself is an online map. image

When I increase the resolution from nrows <- 2000 and ncols <- 4000 to nrows <- 4000 and ncols <- 8000, the distance even increases to 25 m.

Any idea why there is quite a difference between the QGIS calculation and the R calculation? And why increasing the resolution leads to a larger distance? This difference is not problematic, since the detection error of the receivers is much larger (ca. 300 m). It's just to be sure there is no bug or something in the code/calculation.

PieterjanVerhelst commented 3 years ago

I tried to run the code for other receiver networks in other rivers, but encountered a problem. However, this problem may have nothing to do with this issue, so when the problem is more cleared out, we may close this one and open a new issue. I tried 2 networks (code has been added to the prepare_dataset.R script; I will sent shapefiles via mail because they are too large to push): the network in the River Gudena (Denmark) and River Mondego (Portugal). For the River Gudena the code only calculates distances between some stations, while for the River Mondego I get an error:

> cst.dst.frame <- get.distance.matrix(study.area.binary.extended,
                                      locations.receivers)
Error in costDist[] <- shortestPaths[index, index] : 
  replacement has length zero
In addition: Warning message:
In .cd2(x, fromCoords) :
 Show Traceback

 Rerun with Debug
 Error in costDist[] <- shortestPaths[index, index] : 
  replacement has length zero 

However, for both rivers, I had to create the line-shapefile myself. For the River Gudena, I added the last part of the river (the original shapefile did not run to the sea) and for the River Mondego I didn't have a proper shapefile, so created it myself. These generated shapefiles only have an 'ID' number. @damianooldoni any thoughts on how to proceed with this? The line-shapefiles from other rivers also need manual additions in GIS.

damianooldoni commented 3 years ago

For better traceability I noticed these two anomalies in two different issues, #64 and #65. Getting back to the isue about distance for From river, indeed, distance should not diverge while increasing the resolution. I have an idea about this bug. Could you please check whether the distance diverge also for receivers of networks you already used in the past? Thanks. Meanwhile I try my solution on river Frome.

PieterjanVerhelst commented 3 years ago

@damianooldoni currently I can only calculate distances between stations from the River Frome because of the above mentioned issues in the River Gudena and Mondego. When trying to calculate the distance between stations for the Belgian network, I also get an error, but will put that in another issue.

damianooldoni commented 3 years ago

I have just now watched carefully your screenshot. You measured the distance along the shapefile, BUT as the shapefile is a line, in order to contain the receivers the raster is extended (study.area.binary -> sutdy.area.binary.extended) making the river "broader". So, in this case the calculated distance is more similar to the length of the hypotenuse than the length of the line you measured.

See screenshot before extension: image

And after extension; image

As I said some days ago by phone, I was considering to change strategy when rivers are lines instead of polygons, but I thought it was better as first step to solve this issue which was not depending on the type of spatial objects the river is made of (polygons or lines). Moreover, as we will combine polygons and lines at a certain point, it makes the "new" method prone to bugs. So to play safe, I will focus on existent code although it is less accurate with rivers as lines.

PieterjanVerhelst commented 3 years ago

aha, this makes sense! So this issue can be closed, correct?

damianooldoni commented 3 years ago

I would wait still a little. I find still worth a check that the distance grows so much while increasing the resolution. First i would like to close the new one, #66 so that you can check what I asked for here. If should be solved easily. I am working on it just now.

PieterjanVerhelst commented 3 years ago

I ran some checks on distance calculations for the Belgian and Frome network. The variations are quite okay, especially for Belgium. Example of distances between stations in Belgian network:

@damianooldoni upon your approval, I will merge the branch and close this issue. Note that I will add documentation about the persisting error that can be ignored (https://stackoverflow.com/questions/61598340/why-does-rastertopoints-generate-an-error-on-first-call-but-not-second ). I will also put the receiver network files in a separate folder since I will start uploading various network files.

damianooldoni commented 3 years ago

Nice! Yes, do it. It seems also that calculated distance converges to GIS manual measurement by increasing the resolution. Yes, makes separate folders, file management is important. And adding documentation about that error is a good reminder for us and anybody else running the code.