Closed by LienReyserhove 6 years ago
I would remove references
Brussels-Capital Region, Flemish Region, Walloon Region: probably best to do this in script, to keep spelling simple in source
Southeast Asia appears twice, probably because of a stray space? Best to perform a trim on all values
Questions
Is the script able to parse | without pipes? I noticed one instance where the space was missing (now corrected in source data).
What happens if two records of the same species (because of different distribution) differ in e.g. order? Are two taxa created? If so, I would double check if there are no duplicate taxonIDs
Should I see any info regarding the UN geoscheme in the locality?
Data issues:
subspecies and Haplodrassus should be genus. I've corrected this in the source data
2005-2009/2018
Lacerta viridis (Laurenti, 1768) / bilineata to be corrected
To reply:
I would remove references
I would not. I think we really need this, as we don't have a good, short identifier for each species (e.g. when we don't have a DOI available). This is then the most complete and best information we have.
Is the script able to parse | without pipes? I noticed one instance where the space was missing (now corrected in source data)
Do you mean "without spaces"? If so, yes, it works even when a space is missing. You can try it with this script (where you can add or remove spaces between test1 and test2):
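library(tidyr)  # provides separate()
# Two test values; add or remove spaces around "|" to try other cases
data_frame <- data.frame(
  test = c("test1|test2", "test3|test4"),
  stringsAsFactors = FALSE
)
# Split on the literal pipe (escaped because sep is a regular expression)
separate(
  data = data_frame,
  col = "test",
  into = c("column_A", "column_B"),
  sep = "\\|"
)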
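As a complement to the separate() test above, here is a minimal base R sketch of splitting on a pipe while tolerating optional spaces around it, so that "a|b" and "a | b" give the same result. The helper name split_pipe and the sample values are hypothetical and not taken from the mapping script.

```r
# Hypothetical helper: split pipe-delimited values, ignoring spaces around "|".
split_pipe <- function(x) {
  # Match a literal pipe plus any whitespace on either side of it
  parts <- strsplit(x, "[[:space:]]*\\|[[:space:]]*")
  # trimws() removes whitespace left at the start or end of the whole value
  lapply(parts, trimws)
}

# Sample values with no space, spaces on both sides, and stray outer spaces
values <- c("test1|test2", "test3 | test4", " test5| test6 ")
result <- split_pipe(values)
print(result)
```

Splitting and trimming in one step means the result does not depend on how consistently the separator was typed in the source data.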
What happens if two records of the same species (because of different distribution) differ in e.g. order? Are two taxa created? If so, I would double check if there are no duplicate taxonIDs
In the case of one single species with two distribution records, the code will create only one taxonID for that species and the taxon core will contain only one record for that taxon.
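The behaviour described above can be checked with a short sketch in base R. It assumes that taxonID is derived from scientificName only, which the thread does not state explicitly; the column values and the "taxon:" prefix are hypothetical.

```r
# Hypothetical input: one row per distribution record, so a species can repeat.
raw <- data.frame(
  scientificName = c("Species a", "Species a", "Species b"),
  order = c("Order x", "Order z", "Order y"),
  locality = c("Flemish Region", "Walloon Region", "Belgium"),
  stringsAsFactors = FALSE
)

# Assumption: taxonID depends on scientificName only, so records that differ
# in other fields (e.g. order) still share one taxonID.
raw$taxonID <- paste0(
  "taxon:", match(raw$scientificName, unique(raw$scientificName))
)

# Taxon core: keep the first record per taxonID.
taxon <- raw[!duplicated(raw$taxonID), c("taxonID", "scientificName", "order")]

# Check that no taxonID appears twice in the taxon core.
stopifnot(anyDuplicated(taxon$taxonID) == 0)

# Flag taxa whose records disagree on order, since only one value is kept.
n_orders <- vapply(
  split(raw$order, raw$taxonID),
  function(x) length(unique(x)),
  integer(1)
)
conflicts <- names(n_orders)[n_orders > 1]
print(conflicts)
```

Listing the conflicting taxa makes it visible when the single record kept in the taxon core hides a disagreement in the source data.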
Should I see any info regarding the UN geoscheme in the locality?
We could do that, but I'm not 100% fond of this, as the mapping will then be a combination of several standards or no standard at all. I would keep it limited to the WGSRPD vocabulary.
As discussed: agree on all, I would just drop the field references, but keep the extension references.
Agree for the references!
@peterdesmet ready for review! You will need to run the script again. Due to the changes in the taxon core (removal of references), I was not able to exclude changes in UTF-8 encoding in the processed files.
Nice! Merge away 🚢
This is a first attempt to map the ad hoc checklist data. @peterdesmet now ready for review! This is an overview of all mapped DwC terms, remaining questions and issues.
Record level terms
References: see #12. For now, I used the identifiers, not the full bibliographic citation. For the full bibliographic citations, I used the literature references extension. However, many identifiers are lacking, so the mapping is still unfinished and has to be revised when all information is available.
Taxon core
vernacularName
Literature reference extension:
For now, this extension only contains the full bibliographic citations. I did not include the identifiers yet. Whether or not the identifiers can/will be integrated depends on the raw data. See #12
Distribution extension:
The fields locationID and locality are based on the information in location in the raw GS file. Not all fields were populated. I assumed that in these cases, locality = Belgium.
Species profile extension
Description extension
Native range
Some basic cleaning steps were performed in the raw data (e.g. use | as a separator). Other cleaning steps (e.g. change lower case to capital) were performed in the R script.
Not all terms match the WGSRPD vocabulary. Some of these could be mapped to the UN geoscheme. However, some of the terms were not listed in any of these vocabularies, so I left them as they are for now. Suggestions are welcome.
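The cleaning steps mentioned above (pipe separator, lower case to capital) could look roughly like this in base R. It is only a sketch: the sample string is hypothetical, and the trim and de-duplication steps are assumptions that go beyond what is described here.

```r
# Hypothetical native range value as it might appear in the raw data.
native_range <- "Southeast Asia | southeast Asia |europe"

# Split on the pipe separator, then trim whitespace from every value.
values <- trimws(strsplit(native_range, "|", fixed = TRUE)[[1]])

# Change the first letter of each value from lower case to capital.
values <- paste0(toupper(substring(values, 1, 1)), substring(values, 2))

# Drop values that became identical after cleaning.
values <- unique(values)
print(values)
```

Cleaning before matching against a vocabulary keeps spelling variants of the same region from showing up as separate unmatched terms.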
Mapped to WGSRPD standard:
Mapped to or matches with UN geoscheme:
Doesn't match any of the vocabularies above:
Pathway of introduction
Data were already mapped to the CBD standard
Invasion stage
see #16