wtsi-team112 / ncov2019-artic-nf

A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019 - adapted to run at the Wellcome Sanger Institute

Rename input files to match sample names #4

Closed roamato closed 4 years ago

roamato commented 4 years ago

Input files are named in the form run_id#index and carry the sample name in the SM tag of the @RG header line. Output files keep the same names as the inputs, but for upload they are expected to be named per sample. Proposal: rename the files before they are processed by the first step of the workflow (CRAM to FASTQ), either through a new command line option --rename or as part of the existing --cram option. An alternative is to create symlinks from run_id#index.cram to sample_id.cram at the very beginning, then resume normal execution.
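To make the symlink alternative concrete, here is a minimal sketch of a helper that links one CRAM under its sample name. The function name link_as_sample, its argument order, and the output directory are hypothetical and not part of the pipeline; the sample name is assumed to have been read from the header already.

```shell
# Hypothetical helper (not part of the pipeline): symlink one CRAM under its sample name.
#   $1 = path to run_id#index.cram
#   $2 = sample name (assumed to have been read from the SM tag already)
#   $3 = directory that will hold the sample-named links
link_as_sample() {
    # Refuse to create ".cram" when no sample name was found.
    if [ -z "$2" ]; then
        echo "no sample name for $1" >&2
        return 1
    fi
    mkdir -p "$3" || return 1
    # Use an absolute target so the link still resolves if it is staged elsewhere.
    # Plain "ln -s" fails when the link already exists, so two inputs that share a
    # sample name are reported instead of silently overwriting each other.
    ln -s "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")" "$3/$2.cram"
}

# Possible use (assumes samtools is on PATH; quoting matters because of the '#'):
#   for f in *.cram; do
#       sm=$(samtools view -H "$f" | tr '\t' '\n' | sed -n -E 's/^SM:(.+)/\1/p' | sort -u)
#       link_as_sample "$f" "$sm" renamed
#   done
```

The workflow would then be pointed at the directory of links rather than at the original files.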

roamato commented 4 years ago

A very crude way to extract the SM tag is: samtools view -H ${FILE} | grep 'SM:' | cut -f 7 | cut -f 2 -d ':'

However, this doesn't work if the SM tag is not in the 7th tab-separated column of the header line.

roamato commented 4 years ago

The following is slightly more robust: samtools view -H ${FILE} | tr '\t' '\n' | sed -n -E 's/SM:(.+)/\1/p'
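A possible further tightening, as a sketch only: restrict the search to @RG lines and to fields that start with SM:, so that text such as a command line recorded in a @PG entry cannot match. The function name extract_sm is hypothetical; it reads header text on stdin, so it can be fed from samtools view -H.

```shell
# Hypothetical helper: read SAM/CRAM header text on stdin and print each
# distinct SM value found on @RG lines, one per line.
extract_sm() {
    awk -F '\t' '
        $1 == "@RG" {
            # Tags may appear in any order, so scan every field after the record type.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SM:/) {
                    print substr($i, 4)   # drop the leading "SM:"
                }
            }
        }
    ' | sort -u
}

# Possible use (assumes samtools is on PATH):
#   samtools view -H "${FILE}" | extract_sm
```

If a file carries read groups from more than one sample, this prints several lines, which the caller can treat as an error before renaming anything.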

roamato commented 4 years ago

On hold: DNAp is currently providing symlinks directly.