replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

Filter/trim question #161

Closed omarkr8 closed 2 years ago

omarkr8 commented 2 years ago

I've had some samples with consensus sequences 80% trimmed off, but looking at the BAM files, there is some coverage throughout (very little, though; this was a bad sample with about 2k reads).

1) Which options can I use to adjust the trimming? 2) Are the options listed in --help exhaustive?

Thanks

replikation commented 2 years ago

Hi, I'm not sure what the question is. Do you mean sequencing depth?

--min_depth nucleotides below min depth will be masked to "N" [default 20]
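
For illustration only, depth-based masking of this kind can be sketched in a few lines of Python. This is not the code poreCov or ARTIC actually runs; the function name `mask_low_depth` and its inputs (a sequence string plus a per-position depth list) are hypothetical.

```python
def mask_low_depth(sequence, depths, min_depth=20):
    """Replace every base whose depth is below min_depth with "N".

    Hypothetical helper: `sequence` is a string of bases and `depths`
    holds one integer depth per base, in the same order.
    """
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(sequence, depths)
    )
```

With `min_depth=1`, only positions with zero coverage would be masked by a rule like this, so any further Ns would have to come from another step.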

omarkr8 commented 2 years ago

Ah no, I don't think it is a min_depth issue, because this specific run used --min_depth 1.

I could send the consensus and alignment files, but they are 1 GB+. Let me try to elaborate...

Let's say I go into the consensus for a sample and see strings of continuous Ns. If I look into the alignment files, some of these will be due to zero coverage in that region. But sometimes I investigate the positions and find 5-10 decent reads, yet they are still expressed as chunks of Ns in the consensus.

I would imagine that min_depth would not cause this issue, and I do not know why decent reads would get masked. If you are still curious to see the data, I can send it over.
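
A minimal sketch of how the N stretches could be listed so they can be compared against the BAM coverage. It assumes the consensus has already been read into a single string; the function name `n_runs` is hypothetical and not part of poreCov.

```python
import re


def n_runs(consensus):
    """Return 1-based, inclusive (start, end) positions of each run of Ns."""
    return [
        (match.start() + 1, match.end())
        for match in re.finditer(r"[Nn]+", consensus)
    ]
```

Each reported interval can then be checked against the per-position depth of the primer-trimmed BAM, for example with `samtools depth -a` restricted to that region.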

replikation commented 2 years ago

What does the alignment file look like (the covarplot in the results)? Or in other words, are you sure that these "alignments" are not just the primers, which get trimmed later on?

replikation commented 2 years ago

Can you give me the full command? And which alignment files do you mean exactly?

omarkr8 commented 2 years ago

The command I use is:

nextflow run replikation/poreCov --fastq_pass inputfolder --samples sample_names.csv \
  --primerV V3 --min_depth 1 --output outputfolder \
  -profile local,docker -r 0.9.5

The alignment files I'm referring to are the sample.primertrimmed.sorted BAM files for the genome assemblies in the 2.Genomes folders. The reason I'm interested in those BAMs is to check which reads were used to generate the consensus. Correct me if I'm wrong, but the only FASTQ/FASTA outputs the pipeline offers are the filtered FASTQ, the consensus, and those BAMs.

replikation commented 2 years ago

The BAM files are also visualized in the results dirs under genomes (the covarplot). This also gives a good representation of depth / aligned reads (but you checked the primer-trimmed BAM, so it's not just aligned primers). It could be that the artic assembly is removing the reads based on length or quality, but that is a question I cannot answer.

You could check out the process dir of artic, which contains all the intermediate steps. Check the workdir and the hash shown next to the process in the terminal to locate the artic process dir:

e.g. something like this for the folder c7/d2ad3f* : [c7/d2ad3f] process >
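
A small sketch of how such a shortened hash could be resolved to the full directory, assuming the default Nextflow layout in which task directories live under `work/<2 characters>/<rest of hash>`. The function name `find_process_dirs` and the `work` path are assumptions for illustration.

```python
from pathlib import Path


def find_process_dirs(work_dir, hash_prefix):
    """Return task directories under work_dir that match a short hash.

    `hash_prefix` is the shortened hash printed in the terminal,
    e.g. "c7/d2ad3f" (hypothetical helper, not part of poreCov).
    """
    prefix_dir, sep, stem = hash_prefix.partition("/")
    if not sep or not stem:
        raise ValueError("expected a hash of the form 'c7/d2ad3f'")
    matches = Path(work_dir).glob(f"{prefix_dir}/{stem}*")
    return sorted(path for path in matches if path.is_dir())
```

For example, `find_process_dirs("work", "c7/d2ad3f")` would list the matching directory, which can then be inspected for the intermediate files of that process.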

omarkr8 commented 2 years ago

I appreciate the insight. I did wonder if the artic pipeline had something to do with this. Thank you for your patience.