Open sckott opened 8 years ago
So the output of hoa_search and hoa_gbif is a list of two tibbles each containing a key and some unstructured text. Is this correct? It would be nice to have a column of species names. I don't think the host information is available (which makes the name of the package a bit ironic, right?), but if I were to search at the genus level (e.g., hoa_search('Ixodes')), I have no way of parsing out species identities. Also, gbif provides information on latitude and longitude of interaction, but I don't see this information in the output of hoa_search. Also, it seems like there is pertinent information on sampling date and citation (for some occurrences) that could be included.
Without knowing information on host species, I see a couple possible use cases involving the mapping of parasite diversity (e.g., I search for a bunch of parasites, map out occurrence points, determine range area, and then overlay a bunch of parasite polygons to get a coarse idea of diversity) or species distribution modeling efforts (e.g., relating parasite occurrences to the host community from GBIF or IUCN data, along with climate data). If we had some labeled data on known host-parasite interactions (from Global Mammal Parasite Database maybe?), it'd be fun to see if we could reconstruct the plausible set of host species using just the parasite occurrence data. How sick would that use case be?!
> So the output of hoa_search and hoa_gbif is a list of two tibbles each containing a key and some unstructured text. Is this correct?
yes!
> It would be nice to have a column of species names. I don't think the host information is available
can definitely do that
> Also, gbif provides information on latitude and longitude of interaction, but I don't see this information in the output of hoa_search. Also, it seems like there is pertinent information on sampling date and citation (for some occurrences) that could be included.
I tried to return just the relevant columns with host/parasite/etc. info, but importantly including the occurrence key, so that you can easily merge that info with the remainder of the occurrence data. But perhaps I can return all the data.
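To illustrate the merge-by-key idea, here is a minimal sketch in base R. The two data frames are made-up stand-ins, not real output: the column names (`host_text`, `species`, `decimalLatitude`, `decimalLongitude`, `eventDate`) and all values are assumptions for illustration, and the actual columns returned by the hoasts functions or by GBIF may differ.

```r
# Hypothetical host table: occurrence key plus unstructured host text.
# (Illustrative values only; not real hoasts output.)
hosts <- data.frame(
  key = c(101, 102, 103),
  host_text = c("host: Peromyscus leucopus",
                "ex Odocoileus virginianus",
                "on dog"),
  stringsAsFactors = FALSE
)

# Hypothetical occurrence table with the remaining fields
# (species name, coordinates, sampling date).
occurrences <- data.frame(
  key = c(101, 102, 103, 104),
  species = c("Ixodes scapularis", "Ixodes scapularis",
              "Ixodes ricinus", "Ixodes pacificus"),
  decimalLatitude = c(41.3, 38.9, 51.5, 37.8),
  decimalLongitude = c(-72.9, -77.0, -0.1, -122.4),
  eventDate = c("2001-06-14", "1998-05-02", "2010-07-21", "2005-04-30"),
  stringsAsFactors = FALSE
)

# Left join on the shared occurrence key: every host record is kept,
# and the species, coordinates and date are attached to it.
merged <- merge(hosts, occurrences, by = "key", all.x = TRUE)
print(merged)
```

With `all.x = TRUE`, a host record whose key is missing from the occurrence table is kept with `NA` in the added columns rather than dropped.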
> I search for a bunch of parasites, map out occurrence points, determine range area, and then overlay a bunch of parasite polygons to get a coarse idea of diversity
thanks, sounds like a good use case
> species distribution modeling efforts (e.g., relating parasite occurrences to the host community from GBIF or IUCN data, along with climate data).
nice, good one
> If we had some labeled data on known host-parasite interactions (from Global Mammal Parasite Database maybe?), it'd be fun to see if we could reconstruct the plausible set of host species using just the parasite occurrence data. How sick would that use case be?!
What do you mean by "labeled"?
We can assume that we will know the host species, I'm about to add that in to the results.
I thought that the parasite occurrence data didn't include information on host species. I was thinking: if we had a set of data for which we knew the host species of a given parasite, would it be possible to train a model on georeferenced parasite occurrence points where the host is known (labeled occurrence data) to predict the likely host species of parasite occurrences where the host species is unknown (unlabeled occurrence data)? It's a random thought, and is moot since the data contains information on host species identity.
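To make the labeled/unlabeled idea concrete, here is a rough sketch in base R. It swaps in a 1-nearest-neighbour rule as the simplest possible "model"; nothing in this thread says that is the method anyone intends to use. The data frames, column names (`lat`, `lon`, `host`), the helper `predict_host`, and all coordinates are hypothetical, and plain Euclidean distance on degrees is a crude stand-in for a proper geographic distance.

```r
# Labeled occurrences: parasite records where the host is known.
# (Made-up coordinates and hosts, for illustration only.)
labeled <- data.frame(
  lat  = c(40, 41, 52, 51),
  lon  = c(-75, -74, 0, 1),
  host = c("Peromyscus leucopus", "Peromyscus leucopus",
           "Apodemus sylvaticus", "Apodemus sylvaticus"),
  stringsAsFactors = FALSE
)

# Unlabeled occurrences: coordinates only, host unknown.
unlabeled <- data.frame(
  lat = c(40.5, 51.5),
  lon = c(-74.5, 0.5)
)

# Hypothetical helper: return the host of the closest labeled record.
# Distance is Euclidean in degree space (an assumption made for brevity).
predict_host <- function(lat, lon, labeled) {
  d <- sqrt((labeled$lat - lat)^2 + (labeled$lon - lon)^2)
  labeled$host[which.min(d)]
}

# Apply the rule to every unlabeled record.
unlabeled$predicted_host <- mapply(
  predict_host, unlabeled$lat, unlabeled$lon,
  MoreArgs = list(labeled = labeled)
)
print(unlabeled)
```

A real attempt would presumably add host range and climate covariates and hold out some labeled records to check accuracy; this only shows the shape of the problem.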
> I thought that the parasite occurrence data didn't include information on host species.
Sorry, I just meant we can include the species name in the output, whatever is associated with the output of the hoasts functions.
> would it be possible to train a model on georeferenced parasite occurrence points where the host is known (labeled occurrence data) to predict the likely host species of parasite occurrences where the host species was unknown [...] It's a random thought, and is moot since the data contains information on host species identity.
Seems like a great use case. You said it's moot, but are there cases in which it's not moot?
@taddallas @qgroom
I already got some context on use cases at https://github.com/ropensci/rgbif/issues/223, but I'm hoping for more use cases and for people to try this out, so we make sure that we're solving problems people actually have. Please do ping anyone else you think might be interested.
Maybe @tomjwebb is interested?