replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

Optional "description" column in report TSV/Excel #136

Closed hoelzer closed 2 years ago

hoelzer commented 3 years ago

This might be a kinda specific request but maybe it's of general use and easy to add:

We would like to add another column with IDs to the final report files (mainly the Excel file) because we need it for reporting. Of course, this can be done with a script that runs after poreCov and updates the TSV/Excel file. However, it might be useful to extend the samplesheet input with another optional column (e.g. description) that, if present, is added as an additional column to the TSV/Excel report.

e.g.:

_id,Status,Description
IMSS-00157,barcode01,fancy-sample

and in the report then

IMSS-00157    fancy-sample    89%N    B.1.1.7    etc...

What do you think? We can also add this on our side, outside of the poreCov functionality, if you don't see the general use. Such a description column would also make it possible to keep the barcode information in the output, if needed, by defining the samplesheet input as

_id,Status,Description
IMSS-00157,barcode01,barcode01

replikation commented 3 years ago

hoelzer commented 3 years ago

Yeah, for example. Or any other information. This came up because we use different ID sets internally. Currently, the samplesheet input only allows a mapping between the barcode and one ID. Right now we run a separate script to add a second ID that is needed during reporting. Thus, I had the idea that we could provide an optional "description" column in the samplesheet that can, for example, be used for this purpose.

hoelzer commented 3 years ago

We have now implemented this with a separate script, but maybe this would be of general use for others as well.
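
For readers who want the same workaround, here is a minimal sketch of such a post-processing step. It is not the script mentioned above. It assumes the report TSV has a header row and that the ID column is called `sample`; that name, like the function name `add_description_column` and the `N_content`/`lineage` headers in the usage note, is hypothetical and needs to be adapted to the actual report.

```python
import csv
import io


def add_description_column(report_tsv, descriptions, report_id_col="sample"):
    """Return the report TSV text with a 'Description' column after the ID column.

    report_tsv     -- report content as a string (tab-separated, with header)
    descriptions   -- dict mapping sample ID -> description text
    report_id_col  -- hypothetical header name of the ID column; adjust as needed
    """
    reader = csv.DictReader(io.StringIO(report_tsv), delimiter="\t")
    fields = list(reader.fieldnames)
    # Place the new column directly after the ID column.
    fields.insert(fields.index(report_id_col) + 1, "Description")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Samples without a description get an empty cell.
        row["Description"] = descriptions.get(row[report_id_col], "")
        writer.writerow(row)
    return out.getvalue()
```

Called with `{"IMSS-00157": "fancy-sample"}` and a report whose header is `sample`, `N_content`, `lineage`, this yields a row like the one in the example from the first comment: ID, description, then the remaining columns.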

replikation commented 2 years ago

ping @RaverJay

RaverJay commented 2 years ago

I see two ways to do it:

  1. existing samples_input_ch is modified somehow to also propagate a 'Description' column if it is there
  2. giving the whole samples csv to the report process and parsing that myself

The first would be more consistent, the second easier I think (though there may be edge cases).
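
To make the second option concrete, here is a rough sketch of how the report step could parse the samples CSV itself. It is only an illustration, not poreCov code: the function name `read_descriptions` is hypothetical, the column names `_id` and `Description` are taken from the example in the first comment, and the handling of the edge cases (missing column, blank IDs, duplicate IDs) is an assumption about what would be desirable.

```python
import csv
import io


def read_descriptions(samples_csv_text):
    """Map each _id to its Description; return {} if the column is absent."""
    reader = csv.DictReader(io.StringIO(samples_csv_text))
    if "Description" not in (reader.fieldnames or []):
        # Optional column not provided: the report would stay unchanged.
        return {}

    descriptions = {}
    for row in reader:
        sample_id = (row.get("_id") or "").strip()
        if not sample_id:
            continue  # skip rows without an ID
        if sample_id in descriptions:
            # Assumption: duplicate IDs are treated as an error.
            raise ValueError(f"duplicate _id in samplesheet: {sample_id}")
        descriptions[sample_id] = (row.get("Description") or "").strip()
    return descriptions
```

Returning an empty mapping when the column is missing keeps the column truly optional: existing samplesheets with only `_id,Status` would produce the same report as before.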