trias-project / indicators

📈 Alien species indicators
https://trias-project.github.io/indicators/
MIT License

Checklist data filters for indicators #21

Closed peterdesmet closed 5 years ago

peterdesmet commented 6 years ago

This issue describes how data will be filtered to get to the data frame described in #17. Some of these will be tackled in the unified checklist.

Filters:

  1. Specific checklists
  2. Distribution in Belgium
  3. occurrenceStatus = present
  4. establishmentMeans = introduced
  5. Linked to taxonomic backbone. This removes valid taxa, but is necessary for higher classification
  6. Remove genera and above (i.e. no species info, is the case for RINSE)
  7. Group by accepted taxon?? So we don’t count synonyms of same species.
  8. Group by species?? So we don’t count infraspecific taxa
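
As a rough illustration of filters 1 to 6, here is a minimal plain-Python sketch. It is not the project's code: the row layout and the field names `datasetKey`, `country` and `backboneKey` are assumptions made up for the example, and only `occurrenceStatus` and `establishmentMeans` come from the list above. Filters 7 and 8 are left out because they are still open questions.

```python
# Ranks counted as "species or below"; the exact set is an assumption.
SPECIES_OR_BELOW = {"SPECIES", "SUBSPECIES", "VARIETY", "FORM"}


def keep_for_indicators(row, checklists, country="BE"):
    """Return True if a checklist row passes filters 1 to 6."""
    return (
        row["datasetKey"] in checklists                  # 1. specific checklists
        and row["country"] == country                    # 2. distribution in Belgium
        and row["occurrenceStatus"] == "present"         # 3. occurrenceStatus = present
        and row["establishmentMeans"] == "introduced"    # 4. establishmentMeans = introduced
        and row["backboneKey"] is not None               # 5. linked to taxonomic backbone
        and row["rank"] in SPECIES_OR_BELOW              # 6. remove genera and above
    )


# Made-up rows (hypothetical keys and values) to exercise the filter.
base = {
    "datasetKey": "plants", "country": "BE", "occurrenceStatus": "present",
    "establishmentMeans": "introduced", "backboneKey": 1001, "rank": "SPECIES",
}
rows = [
    {**base, "taxonKey": 1},
    {**base, "taxonKey": 2, "occurrenceStatus": "doubtful"},
    {**base, "taxonKey": 3, "rank": "GENUS"},
    {**base, "taxonKey": 4, "backboneKey": None},
]

kept = [r["taxonKey"] for r in rows if keep_for_indicators(r, {"plants"})]
```

Only the first made-up row survives: the others fail filter 3, 6 and 5 respectively.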

damianooldoni commented 6 years ago

Thanks @peterdesmet. I was already doing steps 1 to 5. About the 6th point:

Remove genera and above (i.e. no species info, is the case for RINSE)

If some checklists have no species info, then we start counting them from the rank they do have (genus in the case of RINSE, as no species info is provided). Unfortunately, I don't understand what you mean by "remove genera and above".

About the 7th point:

Group by accepted taxon?? So we don’t count synonyms of same species.

I agree with you. And since we are working from a unified checklist (hypothetical at this moment), the problem of having two synonyms of the same species from two different checklists with different distribution/description will never occur: it will be tackled when unifying the checklists.

About 8:

Group by species?? So we don’t count infraspecific taxa

Yes, for sure. No group_by for subspecies. But how do we tackle the case shown here below?

| rank | taxonKey | scientificName | speciesKey | first_observed | last_observed | ... |
| --- | --- | --- | --- | --- | --- | --- |
| SUBSPECIES | 111 | subspecies1 | 200 | 2006 | 2010 | ... |
| SUBSPECIES | 112 | subspecies2 | 200 | 2008 | 2015 | ... |

My idea is to take the minimum of first_observed and the maximum of last_observed if the two periods overlap.
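
To make the idea concrete, here is a minimal plain-Python sketch using the two example rows. The function name `aggregate_to_species` is hypothetical. Note that the sketch merges the periods unconditionally: it does not check whether they overlap, so the non-overlapping case would still need a decision.

```python
from collections import defaultdict

# The two example rows from this comment.
rows = [
    {"rank": "SUBSPECIES", "taxonKey": 111, "scientificName": "subspecies1",
     "speciesKey": 200, "first_observed": 2006, "last_observed": 2010},
    {"rank": "SUBSPECIES", "taxonKey": 112, "scientificName": "subspecies2",
     "speciesKey": 200, "first_observed": 2008, "last_observed": 2015},
]


def aggregate_to_species(rows):
    """Collapse infraspecific rows to one observation period per speciesKey."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["speciesKey"]].append(row)

    periods = {}
    for species_key, members in groups.items():
        periods[species_key] = {
            # Earliest first observation among the grouped taxa.
            "first_observed": min(m["first_observed"] for m in members),
            # Latest last observation among the grouped taxa.
            "last_observed": max(m["last_observed"] for m in members),
        }
    return periods


species_periods = aggregate_to_species(rows)
```

For speciesKey 200 this gives first_observed 2006 and last_observed 2015.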

damianooldoni commented 6 years ago

About the 3rd filter:

occurrenceStatus = present

@timadriaens and @qgroom: should we filter by occurrenceStatus = present, as @peterdesmet wrote, or should we also consider occurrenceStatus == doubtful? For example, in the Manual of Alien Plants in Belgium there are 31 taxa with occurrenceStatus == doubtful, belonging to 28 different species:

species

Elatine alsinastrum, Myricaria germanica, Centaurea alba, Pilosella brachiata, Potentilla argentea x inclinata, Prunus fruticans, Linaria simplex, Triticum monococcum, Triticum turgidum, Triticum aestivum, Sporobolus virginicus, Hornungia procumbens, Chenopodium preissmannii, Amsinckia intermedia, Epilobium novae-civitatis, Epilobium interjectum, Picea abies, Pinus sylvestris, Pinus rigida, Pinus pinaster, Galium rubioides, Verbascum interjectum, Hemerocallis lilioasphodelus, Symphoricarpos microphyllus x orbiculatus, Cerastium arvense subsp. arvense x tomentosum, Cherleria laricifolia, Narcissus pseudonarcissus, Narcissus incomparabilis

Or do we relax this constraint even further by taking into account all taxa with occurrenceStatus != absent (not absent)? Click here to learn more about the meaning of these terms. By the way, up to now I have encountered only taxa with status present or doubtful.
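
The three options can be compared side by side with a small plain-Python sketch. The policy names and the sample status values are made up for the illustration; they are not counts from any checklist.

```python
# Three candidate filters on occurrenceStatus (policy names are hypothetical).
POLICIES = {
    "present_only": lambda status: status == "present",
    "present_or_doubtful": lambda status: status in {"present", "doubtful"},
    "not_absent": lambda status: status != "absent",
}


def count_kept(statuses, policy):
    """Count how many status values pass the named policy."""
    keep = POLICIES[policy]
    return sum(1 for status in statuses if keep(status))


# Made-up sample; "excluded" stands in for any other non-absent status.
statuses = ["present", "present", "doubtful", "absent", "excluded"]
counts = {name: count_kept(statuses, name) for name in POLICIES}
```

As long as only present and doubtful occur in the data, the last two policies select the same taxa.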

damianooldoni commented 6 years ago

Another consideration about the 7th point: there are 363 synonyms in the checklists Manual of Alien Plants, alien macroinvertebrates and non-native freshwater fishes, of which 353 are in the Manual of Alien Plants alone. After keeping only the IAS in Belgium, as explained above, 359 are left. It would not be a problem at all if their accepted taxa (and the corresponding acceptedKey) pointed to taxa not present in these checklists. However, there is a group of 27 taxa whose acceptedKey is also present as a key: this means that these species could be counted twice. It is a small group, but we should tackle this problem. The point is: should it be tackled while unifying the checklists or while building the indicators?
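
The check for this group can be sketched in plain Python as follows. The record layout is an assumption: `key` and `acceptedKey` follow the wording above, while `taxonomicStatus` and its values are hypothetical field contents chosen for the example.

```python
def synonyms_with_accepted_in_checklist(taxa):
    """Return synonyms whose acceptedKey also appears as a key in the same data."""
    keys = {taxon["key"] for taxon in taxa}
    return [
        taxon for taxon in taxa
        if taxon["taxonomicStatus"] == "SYNONYM" and taxon["acceptedKey"] in keys
    ]


# Made-up records: taxon 11 is a synonym of taxon 10, which is also listed,
# so the same species would be counted twice. Taxon 12 points outside the data.
taxa = [
    {"key": 10, "taxonomicStatus": "ACCEPTED", "acceptedKey": None},
    {"key": 11, "taxonomicStatus": "SYNONYM", "acceptedKey": 10},
    {"key": 12, "taxonomicStatus": "SYNONYM", "acceptedKey": 99},
]

risky = [taxon["key"] for taxon in synonyms_with_accepted_in_checklist(taxa)]
```

Only taxon 11 is flagged; taxon 12 is harmless because its accepted taxon is not in the data.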

qgroom commented 6 years ago

Some of your doubtful species are certainly present, e.g. Pinus sylvestris and Narcissus pseudonarcissus, so perhaps it is doubtful that they are naturalized. For P. sylvestris the checklist says it is doubtful only for Brussels. He must mean it is doubtfully naturalized, because I'm 100% sure it is present.

Picea abies, Pinus rigida and Pinus pinaster are probably similar cases. They are planted, but it is doubtful if they are escaping.

For N. pseudonarcissus the checklist is only referring to Narcissus pseudonarcissus L. subsp. major, not to N. pseudonarcissus in general. It is an important distinction, as N. pseudonarcissus is a native species, but this subspecies is not.

The rest are probably genuinely doubtful species. That is, they are doubtfully present.

peterdesmet commented 6 years ago

@damianooldoni regarding synonyms for which the accepted taxon is ALSO in the checklist, we'll have to take the following approach in building the unified checklist:

Merging that information is no different than merging information for the same (accepted) taxon appearing on two checklists... we just haven't decided yet HOW we will merge that info. 😄

damianooldoni commented 5 years ago

Based on developments in the unified checklist, we can be sure we group by verificationKey. We can then close this issue.