osmose-model / osmose-web-api

Web service that generates Osmose configuration files from data sources like Fishbase and SeaLifeBase. Used by https://www.config.osmose-model.org .
MIT License

Making sure to provide as few NAs as possible for background functional groups #114

Closed agruss2 closed 7 years ago

agruss2 commented 7 years ago

@jhpoelen @Dengaloo The “osm_param-ltl.csv” file provides estimates for three parameters: “plankton.size.min.plkX”, “plankton.size.max.plkX” and “plankton.TL.plkX”. The default values of these three parameters (which are specified in “fishbase-mapping.csv”) are problematic. Indeed, it is impossible to specify a default value for these three parameters for all the fish and invertebrates recorded in FishBase and SeaLifeBase. To solve this issue, we have two options:

(Option 1) Setting the default value of the “plankton.size.min.plkX”, “plankton.size.max.plkX” and “plankton.TL.plkX” parameters to “NA” (not available) in the “fishbase-mapping.csv” file.

Or

(Option 2) Implementing a rule of thumb in the API such that, if the API is unable to define a value for one parameter (for example “plankton.size.min.plkX”) for a given functional group, it uses information for the other species belonging to the genera, families and orders considered in the functional group. (These other species can be found anywhere across the globe.) For instance, let’s say that:

(i) We are dealing with the Iceland Shelf/Sea ecosystem and that the bridge between FishBase/SeaLifeBase and OSMOSE defines a functional group called “demersalmollusc” for us.
(ii) We choose to define “demersalmollusc” as a “background” functional group.
(iii) The “demersalmollusc” functional group includes three species: Falcidens thorensis (Aplacophora), Prochaetoderma clenchi (Aplacophora) and Micropilina minuta (Monoplacophora).
(iv) Estimates are not available for Falcidens thorensis, Prochaetoderma clenchi and Micropilina minuta to define a value for the parameter “plankton.size.min.plkX” for the “demersalmollusc” functional group.
(v) Then, the API identifies the genera, families and orders to which Falcidens thorensis, Prochaetoderma clenchi and Micropilina minuta belong, and uses the information available for all the species in those genera, families and orders to define a value for the parameter “plankton.size.min.plkX” for the “demersalmollusc” functional group.
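The expansion step in (v) could be sketched roughly as follows. This is only an illustration in Python, not the bridge's actual code: the `Lineage` record, the `related_species` function and the in-memory `taxonomy` mapping are all hypothetical stand-ins for whatever FishBase/SeaLifeBase lookup the API or UI would really use.

```python
from typing import Dict, List, NamedTuple, Sequence


class Lineage(NamedTuple):
    """Hypothetical taxonomic record for one species."""
    genus: str
    family: str
    order: str


def related_species(group: Sequence[str], taxonomy: Dict[str, Lineage]) -> List[str]:
    """Return species outside `group` that share a genus, family or order with it."""
    known = [taxonomy[s] for s in group if s in taxonomy]
    genera = {lineage.genus for lineage in known}
    families = {lineage.family for lineage in known}
    orders = {lineage.order for lineage in known}
    members = set(group)
    return sorted(
        species
        for species, lineage in taxonomy.items()
        if species not in members
        and (
            lineage.genus in genera
            or lineage.family in families
            or lineage.order in orders
        )
    )
```

Species that are not in the taxonomy mapping are simply skipped when collecting genera, families and orders, so an unknown name does not make the expansion fail.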

Personally, I prefer Option 2. We do not want to provide a lot of NAs to the user; this would be frustrating to them. Plus, I am not sure that the API can handle anything other than numbers for the default values taken from the “Default value” column of the “fishbase-mapping.csv” file. @jhpoelen Is that correct?
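For reference, handling a non-numeric default is not hard in principle. The sketch below is a generic Python illustration and does not reflect how the API actually reads the “Default value” column; the function name `parse_default` is made up for the example.

```python
from typing import Optional


def parse_default(raw: str) -> Optional[float]:
    """Parse one default-value cell; return None when it is empty or marked NA."""
    text = raw.strip()
    if text == "" or text.upper() == "NA":
        return None
    return float(text)
```

Returning `None` rather than a placeholder number keeps "not available" distinct from a real estimate, so later steps can decide whether to fall back to other species.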

But then, if we choose Option 2, how should the estimates be computed? I was thinking that the best option in the case presented above would be to: (i) consider all the minimum size estimates available for the species that belong to the genera, families and orders to which Falcidens thorensis, Prochaetoderma clenchi and Micropilina minuta belong; and (ii) produce a mean estimate from all the available estimates.
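A minimal sketch of steps (i) and (ii), assuming the per-species estimates are already available as a plain mapping in which missing values are `None`. The function name `pooled_mean` and the data layout are assumptions made for illustration only.

```python
from statistics import mean
from typing import Dict, Iterable, Optional


def pooled_mean(
    estimates: Dict[str, Optional[float]], species: Iterable[str]
) -> Optional[float]:
    """Average every available estimate for `species`; None if there are none."""
    values = [estimates[s] for s in species if estimates.get(s) is not None]
    return mean(values) if values else None
```

Species without an estimate are ignored rather than counted as zero, and the result is `None` (that is, NA) only when no related species has any estimate at all.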

Please let me know what you think. After I have received your input, I will close the present issue and will create new API issues. Many thanks.

jhpoelen commented 7 years ago

I'd say that options 1 and 2 are not mutually exclusive: explicitly stating the default for the mentioned plankton properties as not available ("NA") does not exclude being able to derive the properties from species beyond the current functional group species list provided by the UI.

I've implemented option 1. Please see my comment in #115 for discussion around option 2.

agruss2 commented 7 years ago

@jhpoelen This comment is very similar to the last comment I made in #115. Here is what I think we should do. Let’s say that we are dealing with a background group called “cephalopods” and the parameter “trophic level” or “TL”.

(1) For that “cephalopods” functional group, the UI has identified, say, 20 species.
(2) Before passing information to the API, the UI appends to these 20 species all the species that belong to the genera, families and orders of the 20 species. Let’s say that this results in a list of 500 species in total (20 species + 480 appended species).
(3) When the API receives the list of 500 species from the UI, it first considers only the 20 species that belong to the “cephalopods” functional group.
-> (i) Case 1: data are available to derive a value for the parameter “TL” for some of the 20 species that belong to the “cephalopods” functional group. In that case, the “data richness” rule described in #59 is applied to derive a parameter estimate for the parameter “TL” for the “cephalopods” functional group.
-> (ii) Case 2: data are not available to derive a value for the parameter “TL” for any of the 20 species that belong to the “cephalopods” functional group. In that case, the API will consider the 480 appended species to see if it can derive a value for the parameter “TL”.
(4) If we have faced Case 2 above, then:
-> (i) Case 1: data are available to derive a value for the parameter “TL” for some of the 480 appended species. In that case, the “data richness” rule described in #59 is applied to the list of 480 appended species to derive a parameter estimate for the parameter “TL” for the “cephalopods” functional group.
-> (ii) Case 2: data are not available to derive a value for the parameter “TL” for any of the 480 appended species. In that case, the API will provide the value NA for the parameter “TL” for the “cephalopods” functional group.
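The two-stage logic in steps (3) and (4) can be sketched as below. This is an illustration only: the “data richness” rule from #59 is not reproduced here, so it is passed in as a `derive` callable, and `mean_of_available` is a simple stand-in used to exercise the control flow. All names are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

# A rule that turns a species list into one estimate, or None when no data exist.
Estimator = Callable[[Sequence[str]], Optional[float]]


def estimate_with_fallback(
    taxa: Sequence[str], other_taxa: Sequence[str], derive: Estimator
) -> Optional[float]:
    """Try the group's own species first, then the appended species, else NA."""
    value = derive(taxa)
    if value is not None:
        return value          # step (3), Case 1
    return derive(other_taxa)  # step (4): an estimate, or None meaning NA


def mean_of_available(data: Dict[str, Optional[float]]) -> Estimator:
    """Stand-in rule (NOT the #59 rule): plain mean of the available values."""
    def derive(species: Sequence[str]) -> Optional[float]:
        values = [data[s] for s in species if data.get(s) is not None]
        return sum(values) / len(values) if values else None
    return derive
```

Note that the appended species are consulted only when the original species yield nothing, so they never dilute an estimate that the functional group can supply by itself.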

Does this sound good to you? If so, could you please interact with @FIN-casey and @FIN-JBarile to implement what is described above?

jhpoelen commented 7 years ago

See https://github.com/jhpoelen/fb-osmose-bridge/issues/115#issuecomment-306239311 .

agruss2 commented 7 years ago

@FIN-casey @FIN-JBarile Please implement what is described above; and let me know when you think this issue has been solved so that I can run a new test with the bridge between FishBase/SeaLifeBase and OSMOSE. Many thanks.

FIN-casey commented 7 years ago

I have the same comment in 115.

agruss2 commented 7 years ago

@FIN-casey @jhpoelen I made the same comments in #115:

@FIN-casey Here are my answers to your comments: (1) I think that it would be better not to have a maximum number of species for the extended list for every functional group; this way, we would maximize the probability of obtaining a value for parameters. However, I guess that, for computational and practical reasons, you may want me to decide on a maximum number of species for the extended list for every functional group. Is this the case? If so, then please let me know and I'll provide you with a maximum number. (2) You are right that we should think of a way to distinguish between the original list of species and the list of additional species. I think that distinguishing between "taxa" and "othertaxa" is an excellent idea. This way, the API will be able to work first with "taxa" only, and will then consider "othertaxa" if need be.

@jhpoelen Please see my comments above. Could you please make sure that the "functional_groups.csv" file that is provided in the "osmose_config.zip" file lists only "taxa" (and, therefore, does not provide any information about "othertaxa")?
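One way to picture the "taxa" / "othertaxa" split and the request about "functional_groups.csv" is the sketch below. It is hypothetical throughout: the `FunctionalGroup` class, the function name and the two-column layout (`group`, `taxon`) are assumptions, not the file format the API actually writes.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class FunctionalGroup:
    name: str
    taxa: List[str]                                      # species picked in the UI
    othertaxa: List[str] = field(default_factory=list)  # appended relatives


def functional_groups_csv(groups: Iterable[FunctionalGroup]) -> str:
    """Render group membership, listing only `taxa` and never `othertaxa`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["group", "taxon"])  # assumed column names
    for group in groups:
        for taxon in group.taxa:
            writer.writerow([group.name, taxon])
    return buffer.getvalue()
```

Keeping the two lists as separate fields means the appended species can still be used for parameter estimation while staying out of the exported group definition.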

jhpoelen commented 7 years ago

@agruss2 it seems to me that issues #115 and #114 are resolved. The recently discussed topics related to the maximum group length and taxon selection criteria seem related, but probably worthy of their own issue considering the cross posting of comments.

agruss2 commented 7 years ago

@jhpoelen I am not sure if the present issue is resolved. I made another test today, where I queried parameter estimates for the Iceland Shelf/Sea ecosystem, and I still obtain a lot of NAs for different parameters; please have a look at the files stored in this zip file: osmose_config.zip

Please let @FIN-casey and me know what you think. Many thanks.

jhpoelen commented 7 years ago

It seems that the list of extended taxa is not sent from the UI to the API. See https://github.com/jhpoelen/fb-osmose-bridge/issues/115#issuecomment-322600357 .

FIN-casey commented 7 years ago

See my comment on #115

jhpoelen commented 7 years ago

See my comment in #115. Does not appear to be an API issue. Changing label, and associated assignments.

FIN-casey commented 7 years ago

Same comment in #115.

agruss2 commented 7 years ago

@FIN-casey @jhpoelen I just ran a test and I can confirm that the implementation is successful; many thanks Casey! Therefore, I am going to close the present issue. However, please note that I see some potential areas of improvement. Therefore, I will soon create new GitHub issues to address some minor problems.

agruss2 commented 7 years ago

@jhpoelen @FIN-casey I am reopening this issue, because I am still getting a few NAs here and there for background functional groups. Therefore, I think that, for each background functional group, we should increase the number of species that the bridge between FishBase/SeaLifeBase and OSMOSE considers in addition to the species making up the background functional groups. How many additional species does the bridge currently consider for each background functional group? How much could we increase that number without making the bridge crash? Many thanks.

FIN-casey commented 7 years ago

@agruss2 there is no restriction on the number of additional species. I can't tell what the maximum number is that the wizard can handle. Maybe @jhpoelen has an idea.

Same comment on #115.

jhpoelen commented 7 years ago

I haven't done any performance tests to check how long a species list can be. I do imagine the performance depends on the data richness of the species in the list: the more data, the faster the generation of the related osmose configuration will be.

I'd like to suggest two things:

  1. close issues #115 and #114 (this issue) - the issues are the same, as seen in the duplication of comments.
  2. open a new, specific issue for the NAs - the issue would give specific step-by-step instructions on how to reproduce what @agruss2 is seeing. Also, the issue would include the expected behavior with a specific description (e.g., silly example: I expect a value for ear size of fish of functional group with species mickey, donald and minni.).
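To make such a report concrete, the NA parameters in a generated file could be listed with a small helper like the one below. This is a sketch under assumptions: it treats each line as `name<separator>value`, and both the default separator and the function name are hypothetical rather than taken from the OSMOSE file format.

```python
from typing import Iterable, List


def parameters_with_na(lines: Iterable[str], separator: str = ",") -> List[str]:
    """Return the names of parameters whose value is NA (assumed line layout)."""
    missing = []
    for line in lines:
        line = line.strip()
        if not line or separator not in line:
            continue  # skip blank lines and lines without a value
        name, value = line.split(separator, 1)
        if value.strip().upper() == "NA":
            missing.append(name.strip())
    return missing
```

The resulting list of parameter names, together with the species list and ecosystem used, would give the new issue a reproducible "observed" side to set against the expected values.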