tdwg / interaction

Biological Interactions Data Interest Group
GNU General Public License v3.0

How can I define animal diets using Biological Interactions? #25

Open karilint opened 2 years ago

karilint commented 2 years ago

Hi, dietary information is used in many ecological (e.g. food webs), macroecological (e.g. community structure by dietary guild) and paleontological (e.g. teeth vs. diets) studies.

Many scientific papers list animals and their diets. However, these studies are based on different methodologies and the results are reported in many ways. The general idea of these studies is to find out what animals eat and in what proportions. Currently, there is no standard way of reporting or sharing dietary data, and I was wondering if the Biological Interactions Data Interest Group could help with this matter.

Quite often the diet composition data contains information like:

food items, life stages and parts consumed (verbatim and scientific names)

proportion, share or importance, with a measurement method

the study time (similar to dwc:verbatimEventDate) and sampling effort (dwc:samplingEffort)

location of the analysed diet

with the name of the data source and a possible cited reference in the data source

These are only a few of the terms that relate to animal diets, but I suppose that many existing DwC terms could already be used.

Would animal diets be a suitable use case for Biological Interactions?
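
As a rough illustration, the items listed above could be captured as one record per dietary item. The sketch below is only an assumption about how such a record might look: the class name and field names are hypothetical (only dwc:verbatimEventDate and dwc:samplingEffort come from the list above), and the example values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DietItem:
    """One dietary item reported for one consumer taxon (hypothetical layout)."""
    consumer_name: str                  # scientific name of the animal
    verbatim_food_item: str             # food item as written in the source
    food_item_name: Optional[str]       # resolved scientific name, if any
    life_stage_consumed: Optional[str]
    part_consumed: Optional[str]
    proportion: Optional[float]         # share or importance, as reported
    measurement_method: Optional[str]   # how the proportion was measured
    verbatim_event_date: Optional[str]  # cf. dwc:verbatimEventDate
    sampling_effort: Optional[str]      # cf. dwc:samplingEffort
    location: Optional[str]
    data_source: str
    cited_reference: Optional[str] = None


# Invented example values, for illustration only.
example = DietItem(
    consumer_name="Vulpes vulpes",
    verbatim_food_item="voles",
    food_item_name="Arvicolinae",
    life_stage_consumed=None,
    part_consumed=None,
    proportion=35.0,
    measurement_method="% frequency of occurrence",
    verbatim_event_date="summer 1998",
    sampling_effort="120 scats",
    location="hypothetical study site",
    data_source="hypothetical source",
)
```

Keeping the verbatim food item next to the resolved name means the original wording is never lost when names are later revised.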

jhpoelen commented 2 years ago

Hi @karilint - yes, I'd say that animal diets are an example of Biological Interactions. As you might know, many of your colleagues have captured diets in digital form, including the important details that you mentioned (e.g., life stage, frequency, date ranges, body part consumed). You might get some inspiration by looking at the interaction datasets indexed by https://globalbioticinteractions.org/sources. In particular, you might want to review the rigorous approach that @ahhurlbert et al. are using to maintain and extend their Avian Diet Database https://github.com/hurlbertlab/dietdatabase .

Curious to hear what you come up with.

karilint commented 2 years ago

Hi @jhpoelen , thank you for those interesting links and examples. I myself have a large dataset of mammalian diets (similar to the Avian Diet Database). The globalbioticinteractions site seems very interesting and may provide a platform for the data I have. However, what would be ideal is a common set of terms for describing animal diets, including the parts/life stages eaten and the proportions.

jhpoelen commented 2 years ago

@karilint many folks use:

Uberon for lifestages / body parts.

Relations Ontology for Biotic Interaction terms

For the proportions, I've seen various measures (% stomach volume, stomach volume, relative frequency of occurrence, etc.). I'd suggest documenting what you have today and then, time permitting, standardizing terms over time. This is to avoid analysis paralysis: https://en.wikipedia.org/wiki/Analysis_paralysis .

jhpoelen commented 2 years ago

By capturing all the details you need, and translating (where possible) to other formats (like DwC), you retain the original details, while also offering a more "standardized" perspective.
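
A minimal sketch of that idea, assuming a row is a plain dictionary: the verbatim value stays untouched and a harmonised value is added beside it. The lookup table, the `verbatimMeasurementMethod` column and the harmonised labels are hypothetical placeholders, not terms from a published vocabulary.

```python
# Hypothetical lookup from verbatim measurement labels to a harmonised label.
METHOD_LOOKUP = {
    "% stomach volume": "percent_stomach_volume",
    "stomach vol. (%)": "percent_stomach_volume",
    "relative frequency of occurrence": "relative_frequency_of_occurrence",
}


def add_standardized_method(row: dict) -> dict:
    """Return a copy of row with a harmonised method next to the verbatim one."""
    out = dict(row)  # every original column is kept as-is
    verbatim = row.get("verbatimMeasurementMethod", "")
    # Unknown labels map to None so they can be reviewed later.
    out["measurementMethod"] = METHOD_LOOKUP.get(verbatim.strip().lower())
    return out


row = {"verbatimMeasurementMethod": "% Stomach Volume", "value": "40"}
converted = add_standardized_method(row)
```

Labels that are not in the lookup yet simply come back as `None`, so the table can grow over time without blocking data capture.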

karilint commented 2 years ago

@jhpoelen I've been talking with the people who created the Ecological Traitdata Standard (ETS) https://github.com/EcologicalTraitData/ETS. It has most of the things I need (only a couple of terms are missing). I have already 'documented' what I have but missed the standardisation part because of a lack of knowledge.

I got good suggestions from ETS people for finding/creating a vocabulary for animal diets (for example the Biological Interactions). My dataset is a compilation of diets for 4453 mammalian species, having 26849 rows of 'dietary items'. I'm more than happy to share a sample of the data if someone could help me with mapping the current terms and standardising the new ones. My aim is not just to publish the data set but also to enable sharing and updating it using standard terms.

jhpoelen commented 2 years ago

@karilint I like your idea to share a sample so others can chime in on what existing terms or datasets might be useful for you to look at.

If you'd like, I can help index the example in GloBI so that we can see how your rich dataset fits into the GloBI indexes.

karilint commented 2 years ago

Hi @jhpoelen , I created a 2000-row sample of the data set. It can be viewed as a Google Sheet at https://docs.google.com/spreadsheets/d/1rZGkI-lyKkKWNH3eMOOm-HYot2WGiZOT-qY7E811l2o/edit?usp=sharing

The basic idea is that many of the ETS vocabulary terms fit my purpose very well (although the entities Traitdata and MeasurementOrFact are probably wrong). The other terms I have at the end of the file are vaguer: DWC_samplingEffort, DWC_verbatimEventDate, DWC_associatedReferences, GGBN_sequence, DWC_AssociatedTaxa, ABCDEFG_PartOfOrganism. For one of them, NO_NAME_verbatimAssociatedTaxa, I have not found any comparable term.

So, based on the sample data, do you think there is a possibility to use the Biological Interactions and describe the data with more appropriate terms? I'm a bit out of my comfort zone here.

jhpoelen commented 2 years ago

@karilint apologies for the delay! I am still planning to look at this sooner rather than later. Please do poke me if you don't hear from me by the end of this week.

karilint commented 2 years ago

@jhpoelen excellent! I'm very grateful for your help.

karilint commented 2 years ago

Hi @jhpoelen , I've also been quite busy lately. For your information, our submitted manuscript on mammalian diets will be published within three months or so. Before I submit the last version of the manuscript, it would be great to have the terms in place so that I can use the correct ones for future data imports/exports. All the best!

karilint commented 1 year ago

Hi @jhpoelen , any new ideas on the matter?

jhpoelen commented 1 year ago

Apologies for the delay, and thank you for reminding me.

I had a look at your data sheet, at https://docs.google.com/spreadsheets/d/1rZGkI-lyKkKWNH3eMOOm-HYot2WGiZOT-qY7E811l2o/edit#gid=764931096 .

I noticed you use a wide table format: one row contains all the information you need. I like wide table formats because I don't have to switch between tables to do my analysis. Also, wide tables are easier to index in GloBI.

But . . . standards like ETS and their cousin DwC-A are designed to put things in separate tables. Typically, you'd have separate files for occurrences, measurementOrFact, taxa, etc. Then these files would be referenced in meta.xml. This file describes the meaning of these tables and how they relate.

In my experience, ETS/DwC-A are pretty nice for exchanging data with your colleagues, but for editing and analysis the wide table format is a little easier to manage, especially when working with spreadsheet programs.

To get the best of both worlds, I'd suggest using the wide format for your own management and analysis, and including a description of the fields with some examples. Then, if you have time, you can transform the wide table format into an ETS package and include it in your publication.
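
To make the wide-versus-separate-tables difference concrete, here is a small sketch that splits wide rows into a taxon table and a measurement table linked by an ID. It is only an illustration under assumptions: the input column names, the sample rows and the ID scheme are invented, and the output is not a valid ETS or DwC-A package (no meta.xml is produced).

```python
import csv
import io

# Invented wide-format sample; the column names are hypothetical.
WIDE_CSV = """consumer,food_item,proportion,method
Taxon A,Termites,60,% stomach volume
Taxon A,Fruit,40,% stomach volume
Taxon B,Termites,100,% stomach volume
"""


def split_wide(rows):
    """Split wide rows into a taxon table and a measurement table linked by ID."""
    taxon_ids = {}
    measurements = []
    for row in rows:
        # Each consumer gets an ID the first time it is seen.
        taxon_id = taxon_ids.setdefault(
            row["consumer"], f"taxon-{len(taxon_ids) + 1}"
        )
        measurements.append({
            "taxonID": taxon_id,
            "foodItem": row["food_item"],
            "measurementValue": row["proportion"],
            "measurementMethod": row["method"],
        })
    taxa = [
        {"taxonID": taxon_id, "scientificName": name}
        for name, taxon_id in taxon_ids.items()
    ]
    return taxa, measurements


taxa, measurements = split_wide(csv.DictReader(io.StringIO(WIDE_CSV)))
```

Because the split is derived from the wide table, it can be regenerated whenever the wide table changes, which keeps the wide table as the single place to edit.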

With this approach, you can use terms that ETS / DwC-A do not have (yet), like verbatimAssociatedTaxa, and split out the verb of the interaction from its object (e.g., separate columns for the interaction type like "eats" and the object like "Termites").
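
Splitting the verb from the object could look roughly like the sketch below. It assumes the verbatim text is written as "verb: object" pairs separated by "|", which is an assumption about the format and may not match the actual data sheet; the function name is hypothetical.

```python
from typing import List, Tuple


def split_associated_taxa(value: str) -> List[Tuple[str, str]]:
    """Split 'verb: object | verb: object' into (interaction, taxon) pairs."""
    pairs = []
    for part in value.split("|"):
        part = part.strip()
        if not part:
            continue
        verb, sep, obj = part.partition(":")
        if not sep:
            # No verb given: keep the text as the object with an empty verb.
            verb, obj = "", part
        pairs.append((verb.strip(), obj.strip()))
    return pairs


pairs = split_associated_taxa("eats: Termites | eats: Formicidae")
```

Each pair can then go into two separate columns, one for the interaction type and one for the object, while the verbatim string stays in its own column.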

In other words, I think you are trying to solve two different problems (manage/describe/capture data, exchange data) with one solution. This is pretty challenging, because managing data and exchanging data are very different beasts in my mind. To avoid this, I typically try to first solve one problem: capture the data in a form that works for me. Then, time permitting, I'd focus on the exchange part. And, if you don't have time, you can always add it later because you've captured the original data.

Hope this helps. If not, please holler and I'd be happy to go over it in a video chat if you'd like.

karilint commented 1 year ago

Hi @jhpoelen , thank you for this information. I'll be on fieldwork for the next four weeks or so. I'll look at this in more detail after that.