inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/

Example of Species Distribution Model for presence-only data in inlabru #102

Closed ManuelSpinola closed 1 year ago

ManuelSpinola commented 3 years ago

Is there an example or tutorial on fitting an LGCP to presence-only data to assess species distribution?

finnlindgren commented 3 years ago

You'd have to define what model you want to use for the presence-only data. An LGCP is a model for surveying a region, with either perfect detectability or an explicit model for the detectability, which isn't necessarily suited to presence-only information. I've seen people try a construction with "pseudo-absences" that mimics the numerical integration approach inlabru uses to implement LGCP models, but those pseudo-absence approaches did not correspond to a well-defined observation model, which led to instabilities. For species distribution assessment, can you be more specific? There are some general tutorials at https://inlabru-org.github.io/inlabru/articles/

ManuelSpinola commented 3 years ago

Thank you very much Finn.

I am thinking of a replacement for (and improvement on) MaxEnt-type models for SDM from preferential sampling, such as records taken from databases like GBIF (Global Biodiversity Information Facility).

Manuel


-- Manuel Spínola, Ph.D. Instituto Internacional en Conservación y Manejo de Vida Silvestre Universidad Nacional Apartado 1350-3000 Heredia COSTA RICA mspinola@una.cr mspinola@una.ac.cr mspinola10@gmail.com Teléfono: (506) 8706 - 4662 Personal website: Lobito de río https://sites.google.com/site/lobitoderio/ Institutional website: ICOMVIS http://www.icomvis.una.ac.cr/

finnlindgren commented 3 years ago

My collaborators are more expert in the specific ecology terminology for these models than I am, so I'm afraid the acronyms themselves don't mean much to me, but given the mathematical definitions/details I can usually tell whether something is currently possible or not.

ManuelSpinola commented 3 years ago

Thank you very much Finn.

The GBIF database contains records from museums or occasional sightings of a species, so they are not recorded following a sampling protocol.

Manuel


joenomiddlename commented 3 years ago

Hi Manuel!

I have been working a lot with presence-only data within inlabru and can help you out.

All of the inlabru tutorials whose names begin with '2d_lgcp' apply to presence-only data, but additional assumptions or knowledge are needed. (The tutorials are found here: https://github.com/inlabru-org/inlabru/blob/devel/vignettes/web)

The first assumption, or piece of knowledge, that is needed is a well-defined region (or set of regions) that you can assume your 'observers' have searched. For example, suppose you are modelling a museum's collection of sightings of a bird species within an ecological reserve (e.g. stored as a SpatialPolygons object called 'Reserve_sp'). If you can assume that observers had access to the entire reserve, then passing the argument 'samplers=Reserve_sp' to the function lgcp() (or like()) could be suitable.

If, instead, you know that your observers could only search within known subregions of the ecological reserve (e.g. stored as a SpatialPolygons object called 'Reserve_sub_sp'), then specifying 'samplers=Reserve_sub_sp' in lgcp() (or like()) would be more suitable.
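
To see why the choice of 'samplers' matters, it may help to write out the point process log-likelihood. This is a sketch in generic notation that is not from the thread: $\Omega$ stands for the sampled region (the 'samplers'), $\lambda(s)$ for the intensity, and $s_1, \dots, s_n$ for the sighting locations. Up to an additive constant,

$$
\log L(\lambda) = -\int_{\Omega} \lambda(s)\,\mathrm{d}s + \sum_{i=1}^{n} \log \lambda(s_i).
$$

The integral runs only over $\Omega$, so declaring a region as sampled asserts that the absence of sightings there is informative. Areas outside $\Omega$ contribute nothing to the likelihood.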

In either case, you would then be assuming that observers searched their 'samplers' for an equal amount of time (i.e. constant effort throughout the samplers). If you believe that your observers' effort is heterogeneous across space, you can try to model it through a set of covariates that you believe describe the relative amount of effort across space (e.g. distance from the nearest road, population density, etc.). This approach is described in detail in two papers: Fithian et al. 2015, "Bias correction in species distribution models: pooling survey and collection data for multiple species" (https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12242), and Watson et al. 2020 (https://arxiv.org/pdf/1911.00151.pdf).
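
One common way to formalise the effort-covariate idea is a thinned point process. The notation below is an assumption made for illustration, not taken from the thread: $x(s)$ are ecological covariates, $z(s)$ are effort covariates, $\xi(s)$ is an optional Gaussian random field, and $p(s)$ is the relative observation effort.

$$
\lambda_{\mathrm{obs}}(s) = \lambda(s)\,p(s), \qquad
\log \lambda(s) = \beta_0 + x(s)^{\top}\beta + \xi(s), \qquad
\log p(s) = z(s)^{\top}\alpha,
$$

so that

$$
\log \lambda_{\mathrm{obs}}(s) = \beta_0 + x(s)^{\top}\beta + z(s)^{\top}\alpha + \xi(s).
$$

In this form the effort covariates enter the predictor as ordinary linear terms. The overall level of effort cannot be separated from $\beta_0$, so presence-only data alone identify only the relative intensity, and an effort covariate that coincides with an ecological covariate cannot be disentangled from it.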

I hope this helps! Let me know if you have any questions :)

Joe

ManuelSpinola commented 3 years ago

Thank you very much.

In the second example (originally written with 'samplers=Reserve_sp'), did you mean "samplers = Reserve_sub_sp"?

I was looking at the tutorials. They are very nice. Is there an example with the combination of several variables, factors and continuous covariates?

Manuel


joenomiddlename commented 3 years ago

Hi @ManuelSpinola Good catch! I certainly did and I have updated it accordingly :)

So the tutorial: https://github.com/inlabru-org/inlabru/blob/devel/vignettes/web/2d_lgcp_covars.Rmd provides a good example for a factor variable.

If you have a continuous covariate (e.g. 'elevation'), simply define it as a SpatialPixelsDataFrame object (e.g. elev_sp) in the same coordinate reference system as your sightings data (of class SpatialPointsDataFrame). Then, when defining your components, specify name_of_elevation_component(main=elev_sp, model='linear'), where name_of_elevation_component is your desired name for the elevation component.

Adding multiple covariates simply requires adding additional terms to the components and formula of your lgcp!
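
A minimal sketch of combining a factor and a continuous covariate might look like the following. It assumes the gorillas example data shipped with inlabru (the data used by the linked tutorial, not by Manuel's problem), the sp-based interface that was current around the time of this thread, and a working INLA installation. The component names 'vegetation' and 'elevation' are arbitrary choices.

```r
library(inlabru)
library(INLA)

# Example data bundled with inlabru (assumption: sp-based 'gorillas' dataset)
data(gorillas, package = "inlabru")
nests    <- gorillas$nests              # sighting locations (points)
mesh     <- gorillas$mesh               # mesh used for numerical integration
boundary <- gorillas$boundary           # region assumed to have been searched
veg      <- gorillas$gcov$vegetation    # factor covariate
elev     <- gorillas$gcov$elevation     # continuous covariate

# Centre elevation so the linear effect is on a convenient scale
elev$elevation <- elev$elevation - mean(elev$elevation, na.rm = TRUE)

# One term per covariate; "- 1" drops the intercept because
# "factor_full" already estimates one level per vegetation class
cmp <- coordinates ~
  vegetation(main = veg, model = "factor_full") +
  elevation(main = elev, model = "linear") - 1

fit <- lgcp(cmp,
            data = nests,
            samplers = boundary,
            domain = list(coordinates = mesh))

summary(fit)
```

Further covariates would be added as extra terms in the same way. A spatial random field or effort covariates are left out here to keep the sketch short.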

NOTE: I just noticed that an elevation variable is also used in the tutorial linked above. There, an elevation function was defined instead (lines 346-389). Alternatively, the approach above can be used (see Finn's comment): "The elevation variable here is of class 'SpatialGridDataFrame', that can be handled in the same way as the vegetation covariate"