trias-project / daisie-checklist

🇪🇺 DAISIE - Inventory of alien species in Europe
https://trias-project.github.io/daisie-checklist/
MIT License

Clean eventDate data #16

Open LienReyserhove opened 5 years ago

LienReyserhove commented 5 years ago

When inspecting date information, the following dates are odd:

  1. Negative start_year for 174 records, going back as far as the year -7000 (!)
  2. Negative end_year for 124 records, going back as far as the year -6000
  3. For 29 distributions: the start year occurs after the end year (and not because I misinterpret the negative values :-) )

With respect to 1 and 2, I can hardly imagine these species being considered alien, as they were introduced so many years ago. I would suggest leaving eventDate information empty for the records in 1, 2 and 3. This only affects about 200 distributions (out of a total of 56000 distributions, i.e. about 0.35%).

LienReyserhove commented 5 years ago

Some intensive cleaning is needed for the dates. At the request of @stijnvanhoey and @damianooldoni, I made a branch eventDate_mapping. The file can be found here. It appears to be empty, but it's not :-) I will try to figure out how to clean them myself, but help is always welcome (and already provided, thanks to @stijnvanhoey).

DavidRoy commented 5 years ago

I suspect this part of DAISIE was not well reviewed when being collated. I think we need some rules to exclude data that does not make sense. I suggest at least the following:

  1. exclude any dates <0
  2. exclude dates when end_year < start_year
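
The two rules above could be sketched roughly as follows. This is only an illustration, not code from the repository: the function names `is_plausible_range` and `clean_years` are hypothetical, and it assumes `start_year` and `end_year` are integers or `None` when missing.

```python
from typing import Optional, Tuple

Year = Optional[int]


def is_plausible_range(start_year: Year, end_year: Year) -> bool:
    """Return False when either of the two proposed exclusion rules applies."""
    # Rule 1: exclude any dates < 0
    if start_year is not None and start_year < 0:
        return False
    if end_year is not None and end_year < 0:
        return False
    # Rule 2: exclude dates when end_year < start_year
    if start_year is not None and end_year is not None and end_year < start_year:
        return False
    return True


def clean_years(start_year: Year, end_year: Year) -> Tuple[Year, Year]:
    """Keep the years when they are plausible, otherwise blank both of them."""
    if is_plausible_range(start_year, end_year):
        return (start_year, end_year)
    return (None, None)
```

Blanking both years together (rather than only the offending one) is a design choice in this sketch: it avoids turning a bad range into a half-open one.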

qgroom commented 5 years ago

Given the long discussions we've had about scope, I don't suppose these records can be left out altogether. I really hate unbounded records. They tend to get interpreted as the taxon always being present, whereas frequently the reverse is true. If you feel you have to include them, then you could use the occurrenceStatus of doubtful, which seems appropriate in these cases.
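
As a rough sketch of that alternative, the records could be kept but flagged instead of being dropped. The function name `occurrence_status` and the default value `"present"` are assumptions made for illustration; only the value `doubtful` comes from the suggestion above.

```python
from typing import Optional


def occurrence_status(
    start_year: Optional[int],
    end_year: Optional[int],
    default: str = "present",  # assumed default, not taken from the mapping
) -> str:
    """Return 'doubtful' for records with odd years, otherwise the default status."""
    # A negative start or end year makes the date information suspect.
    has_negative_year = any(
        year is not None and year < 0 for year in (start_year, end_year)
    )
    # So does a range that ends before it starts.
    is_reversed = (
        start_year is not None and end_year is not None and end_year < start_year
    )
    return "doubtful" if has_negative_year or is_reversed else default
```

With this approach the record stays in the checklist, and the odd date information is signalled through the status rather than through an empty, unbounded date.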