Closed by dhoogest 2 years ago
Okay, I've added another test sample, one that was used in the original issue to validate the dropped-chimera R code. With this sample included in the test-single set, the chim_dropped.csv file is confirmed to contain seqs and weights.
dhoogest@gattaca:~/src/dada2-nf$ ./nextflow run main.nf -params-file params-ngs16s.json
...
dhoogest@gattaca:~/src/dada2-nf$ xsv table output-single/dada/624-27/counts.csv
sampleid filtered_and_trimmed denoised_r1 denoised_r2 merged no_chimeras
624-27 5197 5068 4906 4597 4235
dhoogest@gattaca:~/src/dada2-nf$ xsv table output-single/dada/624-27/chim_dropped.csv
weight sequence
120 CAGGCTTAACACATGCAAGTCGTGGGGCAGCGGATACTTAGCTTGCTAAGTATGCCGGCGACCGGCGCACGGGTGAGTAACGCGTACCGAACCTGCCCATCACACAGGGATAGGCTTGCGAAAGCAAGATTAATACCTGATGGTCTCAGTTGTATGCATGTATAATTGAGTAAAGCCTTTGGGTGGTGATGGATGGCGGTGCGTCCCATTAGGAAGTTGGCGGGGTAACGGCCCACCAATCCTTCGATGGGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
80 CAGGCTTAACACATGCAAGTCGTGGGGCAGCGGATGCTTAGCTCGCTAAGTATGCCGGCGACCGGCGCACGGGTGAGTAACGCGTACCGAACCTGCCCATCACACAGGGATAGGCTTGCGAAAGCAAGATTAATACCTGATGGTCTCAGTTGTATGCATGTATAATTGAGTAAAGCCTTTGGGTGGTGATGGATGGCGGTGCGTCCCATTAGGAAGTTGGCGGGGTAACGGCCCACCAATCCTTCGATGGGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
49 CAGGCTTAACACATGCAAGTCGAGGGGAAACGACGGGGAAGCTTGCTTCCCCGGGCGTCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTAAGGGATAACCCGGCGAAAGTCGGACTAATACCTTATGGCATCGTCTGCGGGCATCCAACGACGATTAAAGATTCATCGGTCAGGGATGGGGATGCGTCTGATTAGCTTGTTGGCGGGGTAACGGCCCACCAAGGCGACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
25 CAGGCTTAACACATGCAAGTCGAGGGGAAACGACATTGAAGCTTGCTTCGATGGTCGTCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTGAGGGATAACCCGTCGAAAGTCGGCCTAATACCTCATGGCATCGTCTGCGGGCATCCAACGACGATTAAAGATTTCATCGGTCAGGGATGGGGATGCGTCTGATTAGCTAGTTGGCGGGGTAACGGCCCACCAAGGCTACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
20 CAGGCTTAACACATGCAAGTCGAGGGGAAACGACATTGAAGCTTGCTTCGATGGTCGTCGACCGGCGCACGGGTGAGTAACGCGTACCGAACCTGCCCATCACACAGGGATAGGCTTGCGAAAGCAAGATTAATACCTGATGGTCTCAGTTGTATGCATGTATAATTGAGTAAAGCCTTCGGGCGGTGATGGATGGCGGTGCGTCCCATTAGGAAGTTGGCGGGGTAACGGCCCACCAATCCTTCGATGGGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
19 CAGGCTTAACACATGCAAGTCGAGGGGAAACGACATTGAAGCTTGCTTCGATGGTCGTCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTGAGGGATAACCCGTCGAAAGTCGGCCTAATACCTCATGGCATCGTCTGCGGGCATCCAACGACGATTAAAGATTTCATCGGTCAGGGATGGGGATGCGTCTGATTAGCTAGTTGGCGGGGTAACGGCCCACCAAGGCGACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
12 CAGGCTTAACACATGCAAGTCGTGGGGCAGCGGATACTTAGCTTGCTAAGTATGCCGGCGACCGGCGCACGGGTGAGTAACGCGTACCGAACCTGCCCATCACACAGGGATAGGCTTGCGAAAGCAAGATTAATACCTGATGGTCTCAGTTGTATGCATGTATAATTGAGTAAAGCCTTCGGGCGGTGATGGATGGCGGTGCGTCCCATTAGGAAGTTGGCGGGGTAACGGCCCACCAAGGCGACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
11 CAGGCTTAACACATGCAAGTCGTGGGGCAGCGGATACTTAGCTTGCTAAGTATGCCGGCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTGAGGGATAACCCGTCGAAAGTCGGCCTAATACCTCATGGCATCGTCTGCGGGCATCCAACGACGATTAAAGATTTCATCGGTCAGGGATGGGGATGCGTCTGATTAGCTAGTTGGCGGGGTAACGGCCCACCAAGGCTACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
10 CAGGCTTAACACATGCAAGTCGAGGGGAAACGACGGGGAAGCTTGCTTCCCCGGGCGTCGACCGGCGCACGGGTGAGTAACGCGTACCGAACCTGCCCATCACACAGGGATAGGCTTGCGAAAGCAAGATTAATACCTGATGGTCTCAGTTGTATGCATGTATAATTGAGTAAAGCCTTTGGGTGGTGATGGATGGCGGTGCGTCCCATTAGGAAGTTGGCGGGGTAACGGCCCACCAATCCTTCGATGGGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
6 CAGGCTTAACACATGCAAGTCGTGGGGCAGCGGATACTTAGCTTGCTAAGTATGCCGGCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTGAGGGATAACCCGTCGAAAGTCGGCCTAATACCTCATGGCATCGTCTGCGGGCATCCAACGACGATTAAAGATTTCATCGGTCAGGGATGGGGATGCGTCTGATTAGCTAGTTGGCGGGGTAACGGCCCACCAAGGCGACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
3 CAGGCTTAACACATGCAAGTCGAGGGGAAACGACATTGAAGCTTGCTTCGATGGTCGTCGACCGGCGCACGGGTGAGTAACGCGTAAAGAACTTGCCTCTTAGACCGGGACAACATCTGGAAACGGATGCTAATACCGGATATTATGGTTTTTTCGCATGGAGGAATCATGAAAGCTAGATGCGCTAAGAGAGAGCTTTGCGTCCCATTAGCTAGTTGGTGAGGTAACGGCCCACCAAGGCAATGATGGGTAGCCGGCCTGAGAGGGTGAACGGCCACAAGGGGACT
3 CAGGCTTAACACATGCAAGTCGTGGGGCAGCGGATGCTTAGCTTGCTAAGTATGCCGGCGACCGGCGCACGGGTGAGTAACGCGTACCGAACCTGCCCATCACACAGGGATAGGCTTGCGAAAGCAAGATTAATACCTGATGGTCTCAGTTGTATGCATGTATAATTGAGTAAAGCCTTCGGGCGGTGATGGATGGCGGTGCGTCCCATTAGGAAGTTGGCGGGGTAACGGCCCACCAAGGCGACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
2 CAGGCTTAACACATGCAAGTCGAGGGGAAACGACATTGAAGCTTGCTTCGATGGTCGTCGACCGGCGCACGGGTGAGTAACGCGTAAAGAACTTGCCTCTTAGACCGGGACAACATCTGGAAACGGATGCTAATACCGGATATTATGGTTTTTTCGCATGGAGGAATCATGAAAGCTAGATGCGCTAAGAGAGAGCTTTGCGTCCCATTAGCTAGTTGGTGAGGTAACGGCCCACCAAGGCAATGATGGGTAGCCGGCCTGAGAAGGTGAACGGCCACAAGGGGACT
2 CAGGCTTAACACATGCAAGTCGTGGGGCAGCGGATGCTTAGCTCGCTAAGTATGCCGGCGACCGGCGCACGGGTGAGTAACGCGTACCGAACCTGCCCATCACACAGGGATAGGCTTGCGAAAGCAAGATTAATACCTGATGGTCTCAGTTGTATGCATGTATAATTGAGTAAAGCCTTCGGGCGGTGATGGATGGCGGTGCGTCCCATTAGGAAGTTGGCGGGGTAACGGCCCACCAAGGCGACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACT
@nhoffman I think this is ready for review/approval now.
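One way to sanity-check the output above is to compare the total weight in chim_dropped.csv with the difference between the merged and no_chimeras columns of counts.csv. The sketch below is illustrative only: it assumes the weights are read counts and that counts.csv is a comma-delimited file with the headers shown in the xsv output. The values are copied from the 624-27 sample above, with the sequences omitted.

```python
import csv
import io

# Values copied from the counts.csv and chim_dropped.csv output for sample 624-27.
# Assumption: counts.csv is comma-delimited with these headers (xsv shows a rendered view).
COUNTS_CSV = """sampleid,filtered_and_trimmed,denoised_r1,denoised_r2,merged,no_chimeras
624-27,5197,5068,4906,4597,4235
"""

# Weight column of chim_dropped.csv; sequences are omitted because only
# the weights matter for this check.
DROPPED_WEIGHTS = [120, 80, 49, 25, 20, 19, 12, 11, 10, 6, 3, 3, 2, 2]


def reads_removed_as_chimeras(counts_text):
    """Return merged - no_chimeras for the first row of a counts.csv-style table."""
    row = next(csv.DictReader(io.StringIO(counts_text)))
    return int(row["merged"]) - int(row["no_chimeras"])


removed = reads_removed_as_chimeras(COUNTS_CSV)
dropped_total = sum(DROPPED_WEIGHTS)
print(removed, dropped_total)
```

For this sample both numbers come to 362, which is consistent with every read lost at the chimera step being accounted for in chim_dropped.csv.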
This MR adds a new Rscript that mimics the logic in the old NGS16S pipeline and outputs a CSV with two columns, weight and sequence, where the listed seqs are the SVs dropped during the 'chimera check' process.
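The idea of reporting dropped SVs can be sketched as a set difference between the sequences present before and after the chimera check. This is a language-neutral illustration in Python, not the Rscript from this MR: the function names, the toy data, and the sort order (weight descending, ties broken by sequence) are all assumptions made for the example.

```python
import csv
import io


def dropped_chimeras(before, kept):
    """Return (weight, sequence) rows for sequences in `before` that are not in `kept`.

    `before` maps each sequence to its read count prior to the chimera check;
    `kept` is the set of sequences that survived it. Both are hypothetical inputs.
    """
    rows = [(count, seq) for seq, count in before.items() if seq not in kept]
    # Highest weight first; ties broken by sequence so the output is deterministic.
    return sorted(rows, key=lambda r: (-r[0], r[1]))


def to_csv(rows):
    """Serialize rows as CSV text with a weight,sequence header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["weight", "sequence"])
    writer.writerows(rows)
    return buf.getvalue()


# Toy data, not taken from the pipeline.
before = {"ACGT": 50, "ACGA": 7, "TTGA": 3, "GGCC": 7}
kept = {"ACGT"}

rows = dropped_chimeras(before, kept)
csv_text = to_csv(rows)
print(csv_text)
```

An empty result still produces the header row, which would make "no chimeras dropped" distinguishable from a missing file.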
params-ngs16s.json
The test data set runs through to completion, but does not appear to have any SVs dropped as chimeras, so it cannot yet serve as verification.
TODO: extend the test set to include a sample known to contain chimeras.