inbo / fish-tracking

🐟 Collection of scripts for processing and analysing fish tracking data
3 stars 0 forks source link

Refactor to sf and make study area with lines and polygons possible #73

Closed damianooldoni closed 2 years ago

damianooldoni commented 3 years ago

This PR solves #72 by using rbind instead of gUnion() for combining polygons and lines in a study area. I also converted all code using sf package wherever possible. Some raster related functions want SpatialLines or SpatialPolygons as input, so conversion to sp objects is done only where needed and it is internal to some functions, so hidden to most of the users.

Combining polygons and lines is still not really recommendable. Even if rbind helps a lot, still, the resulting geometry is (sfc_GEOMETRY) is not supported by mapview for example.

I also generalized the way function load.shapefile() select a subset of shapefiles. This function selected the shapfiles based on column NAME which was hardcoded in the function. In this PR I made it flexible:

crs_system <- 32631
river.names <- c("Schelde", "Durme", "Rupel", "Netekanaal")
rivers <- load.shapefile(
  file = "./data/lowcountries_water/LowCountries_Water_2004.shp",
  layer = "LowCountries_Water_2004",
  crs_cystem,
  subset.names = river.names,
  name_col = "NAME")

I had still no time to run all up to the end of the workflow, but I think it's getting quite far now.

PieterjanVerhelst commented 3 years ago

@damianooldoni thanks! Indeed, almost there! Just to be sure, to plot a network in either lines or polygons, the code after mapView() is used and when using a mix, the leaflet() chunk should be applied? I just ran the code on the pbarn_freshwater and ws_bpns mix. I now get an error at this code-chunk:

study.area.binary <- shape.to.binarymask(
  shape.study.area = study.area,
  receivers = projections.locations.receivers,
  nrows = nrows,
  ncols = ncols
)

Note that this code generated a code anyway (bug in R), but this is a different error and the code stops running after the error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'rasterize': error in evaluating the argument 'x' in selecting a method for function 'addAttrToGeom': conversion from feature type sfc_GEOMETRY to sp is not supported
damianooldoni commented 3 years ago

I think your problem mentioned in #72 has been solved by the very last commit? I don't know, I got no errors from shape.to.binarymask()

PieterjanVerhelst commented 3 years ago

Strange, I still get the error However, no last commit came through.

damianooldoni commented 3 years ago

Well, in your error I can read this:

[...] function 'rasterize': error in evaluating the argument 'x' in selecting a method for function 'addAttrToGeom': conversion from feature type sfc_GEOMETRY to sp is not supported

the error seems to happen in receiver_distance_fun.R#L291.

Indeed, sfc_GEOMETRY is the class of the column geometry in the study area, which is a sf data.frame. As it cannot be set to sfc_MULTIPOLYGON/ stc_POLYGON or sfc_MULTILINE/sfc_LINE, R set the combined study area geometry to something generic called sfc_GEOMETRY.

But it seems strange because in that line I pass as_Spatial(shape.study.area) to rasterize so the conversion to a Spatial object happens using the standard as_Spatial() function from sf package.

I think you pulled my commit(s) but you forgot to source receiver_distance_fun.R again before running it. Can you try it again, please? If you still get the error, try this in your console:

test <- as_Spatial(study.area)

and let me know if it works flawlessly as on my laptop.

PieterjanVerhelst commented 3 years ago

Strange... I restarted R and sourced the function again. Since that didn't work, I updated R to the latest version (4.1.0) but the error still persists. Could you sync with Github to be sure I am not missing any changes? Also, when running your suggested test, I get the same error:

> test <- as_Spatial(study.area)
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'addAttrToGeom': conversion from feature type sfc_GEOMETRY to sp is not supported
damianooldoni commented 3 years ago

Thanks @PieterjanVerhelst. I checked my code better, and indeed, I found a bug. Now I get the same error as you. Please pull again. Indeed, the error is that we cannot make a Spatial Polygon from a simple feature data.frame where the geometry is of type sfc_GEOMETRY :disappointed:

PieterjanVerhelst commented 3 years ago

Hmm, still get the error... So on your pc the code runs perfectly?

damianooldoni commented 3 years ago

No, as I said, I get the same error as you now.

damianooldoni commented 3 years ago

My idea is to proceed by rasterizing the two sf data.frames apart and then merging them in the function, something similar I did in other functions today.

PieterjanVerhelst commented 3 years ago

ow wait, something is running!

PieterjanVerhelst commented 3 years ago

Aah no, false alarm. It just took a bit more time to return the error 😞 .

damianooldoni commented 3 years ago

@PieterjanVerhelst: it's impossible it works. As I wrote above I understood the error, it's something again related to the fact that merging polygons and lines is something spatial packages in R don't like it very much. I will work on this, I hope to get it solved soon.

PieterjanVerhelst commented 3 years ago

Okay, thanks πŸ‘ !

damianooldoni commented 3 years ago

Not so much time today to work further on this. Still, tried merge() but didn't work as this is not what we need! Now I found actually need a mosaic function. I will then follow this tutorial about making a mosaic of rasters. This has to be implemented in function shape.to.binarymask(). Stay tuned. :+1:

damianooldoni commented 3 years ago

I think I solved the problem. In prepare_dataset.R, when you have to transform the shapefile to a raster via shape.to.binarymask(), you have to choose:

I also removed two transformations to sp objects via sp_Spatial() where not needed in control.mask() and adapt.binarymask().

@PieterjanVerhelst: give a try, thanks.

PieterjanVerhelst commented 3 years ago

I checked the new code and it seems to provide a realistic output πŸ‘ ! I will test it more thoroughly the coming days on the various datasets/projects to be sure. Just a brief question, but as you state the value for res should be interpreted as a square, that means res = 10000 is 100 by 100 metres?

damianooldoni commented 3 years ago

Thanks @PieterjanVerhelst: leave this PR open then until you performed all checks. I tested with a classic uniform study area (e.g. frome and stour) and it returned no errors and I think the same distances as before.

About resolution: NO, the resolution of a raster is the side of the cell. So, if you give res = 10 you will have a square which is more or less 10x 10. I say more or less, as it depends on the extent of the raster and rounding effects. So, in case of study.area <- stour you will get:

> study.area.binary.extended
class      : RasterLayer 
dimensions : 513, 2333, 1196829  (nrow, ncol, ncell)
resolution : 10.00025, 9.990898  (x, y)
extent     : 360408.7, 383739.3, 5754536, 5759662  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=31 +datum=WGS84 +units=m +no_defs 
source     : memory
names      : layer 
values     : 0, 1  (min, max)

As you can read above, the resolution is very near to 100 x 100. Moreover, the real resolution is returned on console while running shape.to.binarymask(). I want always to inform the user about these things :+1:

I hope everything is clear now. In case it is not, just keep posting! :postbox:

PieterjanVerhelst commented 3 years ago

Thanks for the clarification! BTW, I tested the code on a mix of lines and polygons (belgian network: freswhater + ws_bpns) and it works. I will continue testing on the other study areas in Europe.

PieterjanVerhelst commented 3 years ago

A brief question: is it a problem when polygones are not connected? Or does the resolution of the rasterization process bridge these 'gaps'? Here an example of a reservoir that should be one big polygon, but due to bridges, the reservoir is separated by different polygons:

image

damianooldoni commented 3 years ago

The rasterization process removes these gaps automatically as the resolution of the raster is not so high to make the bridges appear. But if you increase the resolution really a lot you will get problems I think.

At the moment I don't connect patches automatically because sometimes I had river bodies made of two different entities. So, the number of patches was 2 and after adding receiver points you should expand the raster to connect all receivers and get the number of patches back to 2. That's the idea behind. Now, with the projected receivers this step is way less important as the majority of the projections is working perfectly and so there is no need to "extend" the river body actually.

PieterjanVerhelst commented 3 years ago

Okay, that is what I thought, so perfect πŸ‘ .

PieterjanVerhelst commented 2 years ago

I found one small bug in calculating the distances for the Belgian networks, more specifically at station s-11. For some reason, the position of that station is not projected on the river line-shapefile, but on the river bank, causing no calculated distances for that station (value = Inf). Here is a print screen of the map. The projected position on the bank is made more clear by putting my cursor on it, showing the s-11 field:

image

UPDATE: the same holds true for station bh-34 in the project 2012_leopoldkanaal. Also station stour-19 (from project stour) is not projected on the river mid-line, but on the bank. However, it does generate a distance output for that station which may not be that far off the actual distance.

PieterjanVerhelst commented 2 years ago

When calculating the distances for the project semp in Lithuania, I get a memory error after running the following code:

cst.dst.frame_corrected <- get.distance.matrix(
  binary.mask = study.area.binary.extended,
  receivers = projections.locations.receivers
)

Error: cannot allocate vector of size 3.8 Gb

This is rather strange, because the semp project is much smaller than for instance phd verhelst. Note that I restarted R a few times to check if that would clear calculating memory, but it did not solve the problem.

damianooldoni commented 2 years ago

@PieterjanVerhelst: the memory errors pop out because you have still a lot of other water bodies loaded in R, I think. I suggest you to clean your workspace after running the workflow for few river bodies/receivers and restart R (within RStudio) after cleaning. In WIndows Taskmanager (Windows Taskbeher, see screenshot) you will see a sensible drop off of the used memory :+1:

image

damianooldoni commented 2 years ago

To restart R within RStudio: Session -> Restart R (or just Ctrl+Shift+F10)

PieterjanVerhelst commented 2 years ago

Indeed, that is what I tried a few times. Now I also ran the code on the VLIZ RStudio server which has more RAM power, but the error persists. Do you get an output on your pc?

damianooldoni commented 2 years ago

Ok? I will make a fast test for semp. At which resolution are you working?

PieterjanVerhelst commented 2 years ago

okay! I work at res <- 50

damianooldoni commented 2 years ago

Whiel running Γ dapt.binarymask()` step, I read:

Number of patches of binary.mask (river body): 3

This means that the river body is made of three different parts. Is this the case? As I wrote yesterday I do not connect river bodies if disconnected. So, if we start from 3 patches, after adding the receivers to the raster, we will end up to 3 patches.

However, the message printed right after:

The binary.mask (river body) is not connected. Extension needed

is confusing, I have to admit. As this is something I didn't do it in the function. This means that the last function, will never end as it tries to calculate distance along the rivers among receivers which are not connected, so generating memeory size error at a certain point.

So, how to proceed? My idea is the following:

  1. Replace the confusing message, with ""River body is disconnected. This could lead to crash while calculating the distance among all receiver pairs placed in different patches.""
  2. Ask Do you want connect the river bodies?(Y/n)

What do you think? Connecting the river bodies is not difficult as I do something very similar after adding receivers.

PieterjanVerhelst commented 2 years ago

Indeed, three different river bodies are connected. Mostly, I do this in a GIS program to extract a single shapefile with my study area. In case of the semp project, I tried it via R and did not thought the error was due to that. Your suggestion to proceed is indeed a very good one. A side note: I have a few more study areas of which the polygons need to be connected. For example, the polygons meuse and coastal_meuse in the folder \data\Belgium_Netherlands need to be merged first, before they are merged to the line shapefile albertkanaal_zeeschelde. I just let you know this if this would impact your coding for the semp-issue.

PieterjanVerhelst commented 2 years ago

Than the last issue I bumped on, is when trying to calculate the distances for the project noordzeekanaal at res = 50. When running the code to create the extended binary mask, I get an error that the data was not a vector type, but NULL:

study.area.binary.extended <- adapt.binarymask(binary.mask = study.area.binary,
                                               receivers = projections.locations.receivers)

Error in writeRaster(study.area.binary.extended, "./results/study_area_binary", : object 'study.area.binary.extended' not found

Note that the project noordzeekanaal takes an incredible time to run (ca. 6 hours). So perhaps better to test with a subset?

PieterjanVerhelst commented 2 years ago

BTW, for the other projects, the code runs perfectly πŸ‘

And when the issue is solved with the stations s-11 and bh-34 currently not being projected on the river-line, another 3 projects will run perfectly as well.

So almost there πŸ˜„ !

damianooldoni commented 2 years ago

Thanks @PieterjanVerhelst for checking all and reporting all the errors.

I will take a look. First I will tackle the error about the disconnected river bodies. About the wrong projection of the receivers s-11 and bh-34, I cannot do a lot. I mean, I can check a little but I have no time now to go within the function dist2Line of package geoshpere which does the work.

PieterjanVerhelst commented 2 years ago

@damianooldoni thanks πŸ‘ ! Regarding the stations s-11 and bh-34, I suggest to change the coordinates a bit in the receivernetwork file so the stations are a bit closer to the river-line. Now they are too far off, leading to Inf values. What do you think on this temporary fix?

PieterjanVerhelst commented 2 years ago

Update: I will add another river-line in the shapefiles towards the stations bh-34 and s-11 to try to solve the problem instead of changing coordinates.

image

PieterjanVerhelst commented 2 years ago

Update: issues on stations not being projected on the river midline of the shapefile have been solved by adding river segments towards those stations, so they are projected accordingly.

PieterjanVerhelst commented 2 years ago

@damianooldoni I can merge the various rivers in GIS so you don't have to dig into this issue. I actually don't consider this 'merging' step to be part of the distance-calculation function. That leaves 1 rather big issue open: the one on the project noordzeekanaal for which I can't find a solution.

PieterjanVerhelst commented 2 years ago

Review summary: from the 18 projects for which I tried to calculate the distances (both lines, polygons and mix), an error occurred for only 2 πŸ‘ :

damianooldoni commented 2 years ago

Thanks @PieterjanVerhelst for summarizing all bugs in a comment, it will ease my work! I will take a look but it will be next week.

PieterjanVerhelst commented 2 years ago

No problem! I am still digging into the issue with the noordzeekanaal, since an output is generated when I take a higher res. So if I can find a sollution, that issue may drop out by next week ;-).

damianooldoni commented 2 years ago

About noordzeekanaal I think I found the error. You validate only shapefiles before merging them together, but validation should always occur. For this reason I added a validation of study.area after defining it (part of commit cf79779), in this way it's validated, no matter which shapefile is used. I don't know which receivers you are using, but I am quite sure this is the bug.
give a try, please. The other one, albertkanaal, another day.... :smile:

PieterjanVerhelst commented 2 years ago

I still get an error for the project noordzeekanaal. Note that I use the receivernetwork_Noordzeekanaal.csv. When running the code study.area.binary.extended <- adapt.binarymask(binary.mask = study.area.binary, receivers = projections.locations.receivers) it returns the error Error in matrix(unlist(ini), ncol = 2, byrow = TRUE) : 'data' must be of a vector type, was 'NULL' I assume a specific column is not filled in or present in this spatial dataset?

PieterjanVerhelst commented 2 years ago

@damianooldoni I just got back from holidays so can continue testing again ;-). As mentioned above (comment of 8th July), I only get errors for the projects albertkanaal and noordzeekanaal. BTW, I come and join the (EV)INBO forces from the 1st of September πŸ˜‰ .

PieterjanVerhelst commented 2 years ago

Update: The issue with the project Noordzeekanaal is solved and probably had to do with a few shapefile-sections that were not connected. However, the issue for the project 2013_albertkanaal persists. This project usess a combination of a line shapefile (albertkanaal_zeeschelde.shp) and a polygon shapefile (meuse_total.shp). Here I get the error Error: Number of clumps (patches) not equal to initial number of patches after running the following code:

study.area.binary.extended <- adapt.binarymask(binary.mask = study.area.binary,
                                               receivers = projections.locations.receivers)

@damianooldoni this is the last issue, so when this is solved, we can merge the pull request and I will contact Hugo Flavio if he has interest to implement this functionality in his actel package.

PieterjanVerhelst commented 2 years ago

In the shapefiles for the project 2013_albertkanaal there were also some connectivity issues, which I solved in GIS. So now the function runs for that project as well, so there are no more issues. Two things to take in mind: 1) Make sure the shapefiles are connected (with waterways it occurs that road bridges lead to intersections) 2) For large shapefiles and networks, a memory issue can pop up. This can be solved by increasing the memory limit via memory.limit(size = )

damianooldoni commented 2 years ago

Many thanks to find the solution for this! πŸ‘ Two things:

  1. As you are the reviewer, can you officially approve this PR?
  2. I see that there is a conflict. Very likely something very small. Tomorrow I will find the time to solve it and then I merge πŸŽ‰ πŸ•Ί
damianooldoni commented 2 years ago

Conflicts solved. As expected, not very significant and easy to solve conflicts. @PieterjanVerhelst: please, check once again and approve the PR. Then you can merge and dance πŸ•Ί

PieterjanVerhelst commented 2 years ago

Just ran a test on the albertkanaal-project and code runs perfectly. So here comes the merge. Next thing to do is arrange a meeting to have a drink πŸ˜„ !