jtklein / 2016GBIFchallenge

Apache License 2.0

What backend data can we get #1

Open AugustT opened 8 years ago

AugustT commented 8 years ago

Hi Guys,

This is the backend data I can generate.

1) For a group (e.g. butterflies), a map of recording effort (where most effort has been spent recording the group). This can be used to direct people to areas where more records are needed.

2) For each species in a group a map of the predicted distribution (i.e. where I think it should be)

3) For each species, a map of 'high value recording areas'. These are areas where I think the species is likely to occur but where there are no records yet.

I will use a method called Frescalo to undertake this analysis, using data read in from GBIF. For the purposes of the MVP we should focus on Great Britain, as I have the underlying habitat data needed for the analysis. All maps would be at a fairly low resolution (10 km × 10 km squares) due to the resolution of the data and the computing power required to run the analysis.
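To make outputs 1) and 3) concrete, here is a minimal Python sketch. It is not Frescalo: it only counts raw records per 10 km square as a crude stand-in for recording effort, and it treats the predicted distribution as a ready-made set of squares. The function names, the projected metre coordinates, and the sample records are all assumptions made for illustration.

```python
from collections import Counter

CELL_SIZE_M = 10_000  # 10 km squares, as proposed above


def cell_of(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the (column, row) index of the grid square containing a point."""
    return (int(easting_m // cell_size_m), int(northing_m // cell_size_m))


def recording_effort(records):
    """Count records per grid square.

    records: iterable of (species, easting_m, northing_m) tuples.
    Raw counts are only a rough proxy for effort (assumption).
    """
    return Counter(cell_of(e, n) for _, e, n in records)


def high_value_cells(predicted_cells, records, species):
    """Squares where the species is predicted but has no records yet."""
    recorded = {cell_of(e, n) for s, e, n in records if s == species}
    return set(predicted_cells) - recorded


# Made-up records in projected metre coordinates (illustration only).
records = [
    ("Coccinella septempunctata", 451_200, 206_900),
    ("Coccinella septempunctata", 458_700, 201_300),
    ("Adalia bipunctata", 462_000, 215_500),
]

effort = recording_effort(records)

# Hypothetical predicted squares for one species.
predicted = {(45, 20), (46, 21), (47, 21)}
targets = high_value_cells(predicted, records, "Coccinella septempunctata")
```

Squares with a low count in `effort` are candidates for "go and record here", and `targets` holds the squares that are predicted but still empty for the chosen species.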

AugustT commented 8 years ago

@jtklein are you happy for me to create a new folder for the work that I do?

AugustT commented 8 years ago

I also suggest we don't do butterflies, for various reasons. Perhaps we could have a look at the data on GBIF and see what looks like a good group. Coccinellidae (ladybirds) could be a good one.

jtklein commented 8 years ago

@AugustT Regarding number 1): is it feasible to do this for all the data in GBIF, not only a selected group? To get the PokemonGo feeling, I think it would be nice if it was as fine-scaled as possible. I don't want to find that I have to travel 5 km before I reach a spot to record data. I think a few hundred metres would be alright, so that the client sends its location and the server returns the region with the least data within 300 m or so.
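A rough Python sketch of that server-side lookup, under stated assumptions: record counts are already aggregated into 100 m grid cells keyed by (column, row), coordinates are in projected metres, and ties are broken by distance. The function name and the data layout are hypothetical, not part of any existing backend.

```python
import math


def least_recorded_cell(user_xy, cell_counts, cell_size_m=100, radius_m=300):
    """Return the cell with the fewest records whose centre lies within
    radius_m of the user, preferring the nearest cell on ties.

    cell_counts maps (col, row) -> record count; missing cells count as 0.
    """
    ux, uy = user_xy
    ucol, urow = int(ux // cell_size_m), int(uy // cell_size_m)
    reach = math.ceil(radius_m / cell_size_m)  # cells to scan in each direction

    best_key, best_cell = None, None
    for col in range(ucol - reach, ucol + reach + 1):
        for row in range(urow - reach, urow + reach + 1):
            # Distance from the user to the centre of this cell.
            cx = (col + 0.5) * cell_size_m
            cy = (row + 0.5) * cell_size_m
            dist = math.hypot(cx - ux, cy - uy)
            if dist > radius_m:
                continue
            key = (cell_counts.get((col, row), 0), dist)
            if best_key is None or key < best_key:
                best_key, best_cell = key, (col, row)
    return best_cell


# Made-up counts: every nearby cell has 3 records except one with a single record.
counts = {(c, r): 3 for c in range(7, 14) for r in range(7, 14)}
counts[(12, 10)] = 1

suggestion = least_recorded_cell((1050, 1050), counts)
```

The scan only touches the handful of cells around the user, so the cost per request is small; the expensive part is keeping `cell_counts` up to date at that resolution.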

AugustT commented 8 years ago

1) Not feasible, due to the computational intensity and the hands-on data wrangling needed. Fine scale should always be the aim, but often the data do not support this.

AugustT commented 8 years ago

Sorry I have not been contributing. I did some work today:

1) I have added an example data set for Quercus rubra. All records I have extracted have an accuracy of 10 m or better. I think this hits the level you were after. This is in the R folder and is a .csv. I think we have removed a lot of UK records by requiring 10 m accuracy, so I might do this again with a less strict requirement.

2) I have used this data to create a prediction map of the global distribution of this species. This uses the highest resolution data I could get from Bioclim (2.5 degree resolution). The model to create this took about 10 min to run. The model is not particularly well designed; I just threw all the Bioclim layers at a random forest model. I'm not sure what format you need this data in; I have saved it as a GeoTIFF.
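For the accuracy filter in point 1), a minimal Python sketch of the idea follows. It assumes the export has a `coordinateUncertaintyInMeters` column (the Darwin Core term used in GBIF occurrence data); the actual columns of the .csv in the R folder may differ, and the sample rows below are made up.

```python
import csv
import io


def filter_by_uncertainty(rows, max_uncertainty_m):
    """Keep rows whose coordinate uncertainty is known and within the limit."""
    kept = []
    for row in rows:
        raw = (row.get("coordinateUncertaintyInMeters") or "").strip()
        if not raw:
            continue  # uncertainty not reported: drop the record
        try:
            value = float(raw)
        except ValueError:
            continue  # unparseable value: drop the record
        if value <= max_uncertainty_m:
            kept.append(row)
    return kept


# Made-up sample standing in for the real export (illustration only).
SAMPLE = """species,decimalLatitude,decimalLongitude,coordinateUncertaintyInMeters
Quercus rubra,51.50,-0.12,5
Quercus rubra,52.20,0.12,250
Quercus rubra,53.48,-2.24,
Quercus rubra,55.95,-3.19,10
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
strict = filter_by_uncertainty(rows, 10)     # the 10 m requirement
relaxed = filter_by_uncertainty(rows, 1000)  # a looser, hypothetical threshold
```

Comparing `len(strict)` with `len(relaxed)` on the real file would show how many records the 10 m requirement costs. Records with no reported uncertainty are dropped at any threshold in this sketch, which is itself a design choice worth revisiting.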

AugustT commented 8 years ago

PR coming soon