vmikk / PhyloNext

A pipeline for phylogenetic diversity analysis of GBIF-mediated data
https://phylonext.github.io
MIT License

Add list of contributing GBIF datasets to output directory #7

Closed by thomasstjerne 1 year ago

thomasstjerne commented 1 year ago

In order to cite the filtered data used for a pipeline run, we would need a simple CSV file with two columns, datasetKey and occurrenceCount. This would allow a user (or the web GUI) to register a derived dataset with a citable DOI. I suggest adding it to the pipeline_info output directory.
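To make the requested format concrete, here is a minimal sketch of how such a file could be produced. It is not the pipeline's implementation: the function names, the example records, and the assumption that each filtered occurrence record carries a datasetKey field are all hypothetical and only illustrate the two-column layout.

```python
import csv
import io
from collections import Counter


def dataset_counts(occurrences):
    """Count how many occurrence records belong to each datasetKey."""
    return Counter(row["datasetKey"] for row in occurrences)


def write_counts_csv(counts, handle):
    """Write a two-column CSV: datasetKey, occurrenceCount (sorted by key)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["datasetKey", "occurrenceCount"])
    for key, n in sorted(counts.items()):
        writer.writerow([key, n])


# Hypothetical filtered occurrences; real records would come from the pipeline run.
occurrences = [
    {"datasetKey": "dataset-a", "species": "Sp1"},
    {"datasetKey": "dataset-b", "species": "Sp1"},
    {"datasetKey": "dataset-a", "species": "Sp2"},
]

counts = dataset_counts(occurrences)

# Written to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
write_counts_csv(counts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real run the buffer would be replaced by a file opened inside the pipeline_info directory; the file name is left open here.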

vmikk commented 1 year ago

Export of datasetKeys is now available in the latest version (implemented via f7668fac0b9e5670c5aa37da3865747f9c728e1d). Please run nextflow pull vmikk/phylonext to update the pipeline.

In the results, there should be a pipeline_info/Dataset_DOIs.txt file.

thomasstjerne commented 1 year ago

Thanks @vmikk, that was fast!

vmikk commented 1 year ago

Currently there are no occurrence counts per dataset, because they are a bit tricky to obtain: there were multiple rounds of filtering and spatial aggregation per species.

thomasstjerne commented 1 year ago

The DOIs / keys are the most important. If the counts are tricky, then we'll just close it with the DOIs only.