prioritizr / prioritizrdata

Conservation planning data sets
https://prioritizr.github.io/prioritizrdata/

Good example data sets #1

Closed jeffreyhanson closed 7 years ago

jeffreyhanson commented 7 years ago

Hi all,

I think it would be really good to include some "real world" (-ish) example data sets. Does anyone have any awesome open conservation planning data sets they can share? Note that the data can't be too big though. Files on GitHub must be < 100MB each.

Also, I expect CRAN won't accept anything too big. So, this package might have to live on GitHub if example data sets we use are too big.
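For anyone checking candidate data before contributing, here is a minimal sketch of how the size constraint above could be screened for. It is written in Python purely for illustration (the package itself is in R), the function name `oversized_files` is hypothetical, and it assumes the 100MB limit means 100 * 1024 * 1024 bytes.

```python
import os

# Assumption: the "100MB" per-file limit is read as 100 MiB.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(root, limit=GITHUB_LIMIT_BYTES):
    """Return a sorted list of (path, size_in_bytes) for files at or above `limit`."""
    too_big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            # The thread says files must be < 100MB, so flag anything >= limit.
            if size >= limit:
                too_big.append((path, size))
    return sorted(too_big)


if __name__ == "__main__":
    # Scan the current directory and report any files that would be rejected.
    for path, size in oversized_files("."):
        print(f"{path}: {size / (1024 * 1024):.1f} MiB")
```

Running this in the root of a data directory lists the files that would need to be shrunk, split, or hosted elsewhere.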

Cheers,

Jeff

ricschuster commented 7 years ago

I will put something together from our previous work. Just need to make sure I find something useful that's small enough.

I also need to check with Peter Arcese to make sure he is okay with us using these data.

ricschuster commented 7 years ago

I'm still waiting on confirmation to use one part of our dataset, but overall I have permission to use our West Coast of Canada case study now.

We are working with irregularly shaped planning units, so a shapefile would make the most sense, I think. Do you think I should also convert the data to raster to use as an example dataset for that approach as well?

jeffreyhanson commented 7 years ago

Awesome!

I'm not sure. I guess it depends on what we want the vignettes to do. I was imagining that the case-study vignettes would walk the user through an example conservation planning scenario using realistic-ish data. So if it doesn't make sense to have the planning units for this data set in raster format then I don't think it would be helpful to the user?

Ideally, I think it would be great to have at least one vignette per planning unit input type (i.e. point, line, polygon, raster)?

ricschuster commented 7 years ago

I agree, one case-study vignette per planning unit input type would be ideal. This dataset would work for polygon and raster approaches (in our work we use both). I don't have any line input type example.

Do you think point data would be useful? Did you imagine this being centroids that would be converted into raster or really straight-up point locations? If centroids, I could put that together as well from what we already have.

-- Richard Schuster

Liber Ero Postdoctoral Fellow Geomatics and Landscape Ecology Laboratory Department of Biology Carleton University Email: mail@richard-schuster.com http://richard-schuster.com/ Twitter: @RicSchuster ph: 250-635-2321

jeffreyhanson commented 7 years ago

Ah ok. I worry that it might get confusing for the users if we have the same case-study data in multiple formats.

We have the Tasmania case-study data from the Marxan workshop materials. This data set uses hexagonal polygons as planning units. So we could either:

  1. Convert the Tasmania case-study planning units to raster data, and use that for the Tasmania case-study, and use polygon planning units for your data set?

  2. Keep the Tasmania case-study as is, and use raster planning units for your data set?

If it makes sense to use raster planning units for your data set, I think going with (2) might be better because it gives Marxan users a familiar data set to play around with, and it will probably reduce the size of the data set if we store your planning units in raster format. How does that sound?

As to the line data, I don't have any line data sets. But there are a few freshwater conservation biologists in the group here at UQ, so I'll ask them if they have any data sets they don't mind contributing?

Yeah, I mean straight-up point locations (e.g. in SpatialPointsDataFrame format). Yeah, I haven't come across any specific instances of prioritizing point data - I just thought it might be handy to have. I guess it might be useful when prioritizing different populations - maybe this is done when prioritizing actions for reefs? If it's not really used, we can just include the functionality but not devote a vignette to it?

ricschuster commented 7 years ago

Good point re: confusion.

Let's keep Tasmania as is (i.e. polygons) and use our data for the raster case study.

re: line data. Great. What do you think about asking Simon Linke for both data and interest in joining in this? He and his group might have some good ideas re: connectivity metrics to include.

re: point data. I will search a bit to see if I can dig up examples. I don't recall seeing any before though, but that doesn't mean much.

re: additional vignette. I'm not sure how useful this would actually be, but for the package that I started I have also included the option for users to provide csv files (including lat/long). It's nothing I have used before, but the person that requested this is working with that kind of data, although their data come from rasters. What I did for that approach was to use the csv file as the cursor for raster files. It's maybe not worth a vignette at this point, but I think an 'ingest' function to make csv files usable for our package would be a good idea.
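To make the 'ingest' idea a bit more concrete, here is a rough sketch of what such a step might look like. It is in Python purely for illustration (the package itself is in R), and everything in it is an assumption: the function name `ingest_planning_units`, the column names `lon`, `lat` and `cost`, and the choice to treat each csv row as one planning unit. It does not show the raster lookup step mentioned above.

```python
import csv
import io


def ingest_planning_units(csv_text, lon_col="lon", lat_col="lat", cost_col="cost"):
    """Parse csv text into a list of planning-unit records.

    Hypothetical layout: one row per planning unit, with longitude,
    latitude and cost columns. Column names are assumptions.
    """
    units = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        try:
            lon = float(row[lon_col])
            lat = float(row[lat_col])
            cost = float(row[cost_col])
        except (KeyError, TypeError, ValueError) as err:
            # Missing column, short row, or a non-numeric value.
            raise ValueError(f"row {row_number}: bad or missing value ({err})") from err
        # Basic sanity check on the coordinates before accepting the row.
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            raise ValueError(f"row {row_number}: coordinates out of range")
        units.append({"id": row_number, "lon": lon, "lat": lat, "cost": cost})
    return units


# Small made-up example input, not real case-study data.
example = "lon,lat,cost\n-123.4,48.5,10\n-123.3,48.6,12.5\n"
units = ingest_planning_units(example)
```

Validating the rows up front means problems in a user's text file surface with a row number, rather than as a confusing failure later when the problem is built.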

jeffreyhanson commented 7 years ago

Yeah, I think having Simon Linke on board would be brilliant.

Ok, I'll have a look for point data too.

Yeah, I agree having the functionality to ingest text files will be useful for getting people to try it out. Could you post an issue in the prioritizr/prioritizr repo?

ricschuster commented 7 years ago

Simon is on board now and he will be able to provide us with a line dataset. He also has a great (i.e. big) line dataset for testing purposes we could use, but we can't make that available to others.

jeffreyhanson commented 7 years ago

Awesome! Yeah, it would be good to use a big vector dataset for benchmarking.

ricschuster commented 7 years ago

We also have a 2+ million planning unit polygon dataset we can use for benchmarking. That dataset prompted the switch from Marxan (2 days to find a sub-optimal solution) to ILP (5 mins to find an optimal solution) for me.

jeffreyhanson commented 7 years ago

Closing this issue because the package now has a couple of great data sets. If anyone has any other data sets they would like to contribute, please open another issue.