nf-core / kmermaid

k-mer similarity analysis pipeline
https://nf-co.re/kmermaid
MIT License

Output all informational BAM tags to fastq #122

Closed olgabot closed 3 years ago

olgabot commented 3 years ago

The current implementation of the bam2fasta step does not record in the FASTA read ID whether a read was aligned. That can be worked out by looking at the aligned and unaligned FASTA files separately, but I'd like the information to live entirely in the read name, with no need to look anywhere else.

What are all the supported flags?

Here are links to all supported flags and some of the important ones

Aligned, but no gene assigned

``` Tue 1 Dec - 10:25  ~/data_sm/immune-evolution/pipeline-results/mouse/kmermaid/lung--mouse--remove-ribo/10x-bams  olga@lrrr  samtools view MACA_18m_M_LUNG_52__possorted_genome_bam.bam | head A00111:77:H3YKNDMXX:1:2115:20862:34710 272 chr1 3014879 0 90M * 0 0 GGGCTTATAAAGTTTGCAAGTCTAATGGGCCTCTATTTGCTGTGATGGCTGAGTAGGCCATCTGTGGATACATTGGCTGCTAGTGACAAG FF----F8FF--FF8-F--FF-------F----F-FFF-F--F-----FF8F-F-F-F-FF---8-FFFF--F---F-F--F--F-F--F NH:i:5 HI:i:3 AS:i:68 nM:i:10 RE:A:I BC:Z:GCAGTAGA QT:Z:F8FFFF8F CR:Z:TTAGGACCACGAAATA CY:Z:8-8-88F8FFFFFFFF CB:Z:TTAGGACCACGAAATA-1 UR:Z:GGGCTCCACA UY:Z:--F-FFFFFF UB:Z:GGGCTCCACA RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:1:1405:5104:9893 272 chr1 3014893 0 90M * 0 0 TGCAAGTCCAATGGGCCTCTATTTGCAGTGATGGCCGACTAGGCCATCTTTTGATACATATGCAGCAAGAGACAAGAGCTTCGGGGTACT FFF-FFFF8--FFFFFFFFF-F8--F-F8FF-FFFFFFF-8F-F88-F-FFFF-F-F-F-FFF-FF--FFFFF--FFFFFFFFFFFFFFF NH:i:10 HI:i:8 AS:i:84 nM:i:2 RE:A:I BC:Z:CAGTACTG QT:Z:F8FFFFFF CR:Z:ACGTCAACAGTAAGAT CY:Z:8F88888F88-FFFFF CB:Z:ACGTCAACAGTAAGAT-1 UR:Z:CGGACACGGT UY:Z:FF-FFFFFFF UB:Z:CGGACACGGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:1:2323:22173:34882 272 chr1 3014912 0 88M2S * 0 0 TATTTGCAGTGATGGCCGACTAGGCCATCTTTTGTTAGATTTGCAGCTAGAGACAAGAGCTCCGGGGTACTGGTTTGTTCATATTGTTGC F-FF-8FFF-F-8-FF--F8F-FF---F-FFF8F-FF-FF-F---FF-FF-F----FFFF---FFFFFF-FFFFF-F-FFFFFF--F--- NH:i:6 HI:i:3 AS:i:76 nM:i:5 RE:A:I BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:TGGGCGTAGATCATGG CY:Z:888F888FFFFFFFFF CB:Z:TGGGCGTAGATCACGG-1 UR:Z:GACGGTGCGG UY:Z:FFFFFFFFFF UB:Z:GACGGTGCGG RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:2:1280:1832:14591 272 chr1 3015028 0 90M * 0 0 TAGCTCCTTGGGTAATTTCTCTAGCTCCTCCATTGGGGGCCGTGTGACCCATCCAATAGCTGACTGTGATCATCCACTTATGTGTTTGCT F-FFFFFFFFFFFF--FFFFF--FFFFF-FF-FFFFFFFFFFFFFFFFFF-FFF--FFFFFFFFFFFFFFFFFFFFFFF-FFFFFFFFFF NH:i:7 HI:i:4 AS:i:86 nM:i:1 RE:A:I BC:Z:TTCCCGAC QT:Z:FFFFFFFF CR:Z:CTCGTCATCTGACCTC 
CY:Z:88888F88FFFFFF8F CB:Z:CTCGTCATCTGACCTC-1 UR:Z:GAATAGCAGC UY:Z:FFF-FFFFFF UB:Z:GAATAGCAGC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:1:1430:23863:10379 272 chr1 3016933 0 90M * 0 0 ATGTATTTTATATTATTTGTGACTATTGAGAAGGGTGTTGTTTCCCTAATTTCTTTCTCAGCCTGTTTATCCTTTGTGTACAGAAAGGCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6 HI:i:5 AS:i:88 nM:i:0 RE:A:I BC:Z:TTCCCGAC QT:Z:FFFFFFFF CR:Z:CGTTAGAAGACGCTTT CY:Z:F8F88FFFF8FFFFFF CB:Z:CGTTAGAAGACGCTTT-1 UR:Z:AACTGTTTCG UY:Z:FFFFFFFFFF UB:Z:AACTGTTTCG RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:1:1227:6424:20165 272 chr1 3018511 3 90M * 0 0 TTATTTATGTTGTTATTGAAGATCAGCCTTAGTCCATGGTGATCTGATAGGATGCATGGGACAATTTCAATATTTTTGTATATGTTGACG FFF-FFFFFFFFFFFFFFFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:2 HI:i:2 AS:i:88 nM:i:0 RE:A:I BC:Z:CAGTACTG QT:Z:8FFFFFFF CR:Z:CATATGGGTTGTTTGG CY:Z:8F88-F88FF-FFF-F CB:Z:CATATGGGTTGTTTGG-1 UR:Z:CTTTATGTGT UY:Z:-FFFFFFFFF UB:Z:CTTTATGTGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:1:2214:12319:26866 0 chr1 3025870 255 90M * 0 0 CTGGTAATAAAGACACATGCCCTATTATGTTCATAGCAGCCTTATTTATAAAAGCCAGAAGCTGGAAAGAACCCAGATGCCCCTCAACAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 RE:A:I BC:Z:TTCCCGAC QT:Z:FFFFFFFF CR:Z:CCTTGAGTTTCATGTG CY:Z:FF8888FFFFFFFFFF UR:Z:TTTTGCAAAT UY:Z:FFFFFFFFFF UB:Z:TTTTGCAAAT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:1:2206:23827:3834 256 chr1 3026959 1 41M302444N49M * 0 0 TAACTAGTGTCGCAACAATAAAATTTGAGCTTTGATCAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF-FFFFFFFFFFFFFFF-F8FFFFF-FFFFFFFFFFFFFFFF8F8F-FFFF NH:i:3 HI:i:3 AS:i:77 nM:i:0 RE:A:I BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CGATGGCGTCAACATC CY:Z:FF888F8FFFFFFFFF CB:Z:CGATGGCGTCAACATC-1 UR:Z:CGCCCGTTGC UY:Z:FFFFFFFFFF 
UB:Z:CGCCCGTTGC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:1:2363:25120:29496 256 chr1 3026959 1 41M302444N49M * 0 0 TAACTAGTGTCGCAACAATAAAATTTGAGCTTTGATCAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF-FFFFFFF88FF8FFFFFFFFFFF8F-FFFFFFFF-F NH:i:3 HI:i:3 AS:i:77 nM:i:0 RE:A:I BC:Z:TTCCCGAC QT:Z:FFFFFFFF CR:Z:TAGTGGTCAGCGTTCG CY:Z:888F88F8FFFFFFFF CB:Z:TAGTGGTCAGCGTTCG-1 UR:Z:CGTTCATGCC UY:Z:FFFFFFFFFF UB:Z:CGTTCATGCC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:2:1380:6903:5149 16 chr1 3027849 255 90M * 0 0 ACATGTATTTCTCTCATTTTTACAACACAGTTTTGTTATTGACACTTCACTCTAACATCAGAAGTGATTGCAAGAAAAAAGTTGTTTTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 RE:A:I BC:Z:TTCCCGAC QT:Z:FFFFFFFF CR:Z:CAGTAACGTACGAAAT CY:Z:FFFFFFFFFFFFFFFF CB:Z:CAGTAACGTACGAAAT-1 UR:Z:AAGCGTGGAT UY:Z:FFFFFFFFFF UB:Z:AAGCGTGGAT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 ```

Aligned, with assigned gene

``` Tue 1 Dec - 10:25  ~/data_sm/immune-evolution/pipeline-results/mouse/kmermaid/lung--mouse--remove-ribo/10x-bams  olga@lrrr  samtools view MACA_18m_M_LUNG_52__possorted_genome_bam.bam | rg GN: | head A00111:77:H3YKNDMXX:1:2433:27670:3756 1040 chr1 3365854 255 90M * 0 0 AAAGAAATTGTGATATATTTCTTTGTCACATGATCAGCATAGTAGATGATGTGTCTTCATTTCCTACAAAAAAGAGGAAGCAGTTAAAAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 TX:Z:ENSMUST00000195335.1,+2606,90M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:GCTGCTTTCGATGAGG CY:Z:88888888FFFFFFFF CB:Z:GCTGCTTTCGATGAGG-1 UR:Z:CGTTGATAGT UY:Z:FFFFFFFFFF UB:Z:CGTTGATAGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:2:2488:25409:10833 1040 chr1 3365875 255 90M * 0 0 TTTGTCACATGATCAGCATAGTAGATGATGTGTCTTCATTTCCTACAAAAAAGAGGAAGCAGTTAAAATTGTGTGTGTGTGGTTCTGGAT FFFFF8FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF--FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 TX:Z:ENSMUST00000195335.1,+2585,90M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:ATCATGGAGTGGTAGC CY:Z:8F888FFFFFFFFFFF CB:Z:ATCATGGAGTGGTAGC-1 UR:Z:CCCTTTATGT UY:Z:FFFFFFFFFF UB:Z:CCCTTTATGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:1:2424:27543:35164 16 chr1 3365890 255 90M * 0 0 GCATAGTAGATGATGTGTCTTCATTTCCTACAAAAAAGAGGAGGCAGTTAAAATTGTGTGTGTGTGGTTCTGGATTAAATATTATTAATC FF-FFFFFFFFFFFFFFFFFFFFF8FFFFFFFFFFFFFFFFFFFFF8-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:86 nM:i:1 TX:Z:ENSMUST00000195335.1,+2570,90M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:AGTAGTCT QT:Z:FFFFFFFF CR:Z:ATCATGGAGTGGTAGC CY:Z:8F88888FFFFFFFFF CB:Z:ATCATGGAGTGGTAGC-1 UR:Z:CCCTTTATGT UY:Z:FFFFFFFFFF UB:Z:CCCTTTATGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:2:2137:20609:29387 1040 chr1 3365906 255 79M1I10M * 0 0 
GTCTTCATTTCCTACAAAAAAGAGGAAGCAGTTAAAATTGTGTGTGTGTGGTTCTGGATTAAATATTATTAATCAAAAAAGGGGGCTGTC FFFFFFFF8FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:83 nM:i:0 TX:Z:ENSMUST00000195335.1,+2555,10M1I79M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:CAGTACTG QT:Z:FFFFFFFF CR:Z:ATCATGGAGTGGTAGC CY:Z:8F88FF8FFFFFFFFF CB:Z:ATCATGGAGTGGTAGC-1 UR:Z:CCCTTTATGT UY:Z:FFFFFFFFFF UB:Z:CCCTTTATGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:1:1252:23511:15781 1040 chr1 3365915 255 90M * 0 0 TCCTACAAAAAAGAGGAAGCAGTTAAAATTGTGTGTGTGTGGTTCTGGATTAAATATTATTAATCAAAAAGGGGGCTGTCAGTAGGATGA FFF8FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 TX:Z:ENSMUST00000195335.1,+2545,90M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:GCTGCTTTCGATGAGG CY:Z:FF8FF8FFFFFFFFFF CB:Z:GCTGCTTTCGATGAGG-1 UR:Z:CGTTGATAGT UY:Z:FFFFFFFFFF UB:Z:CGTTGATAGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:2:1378:22209:19664 1040 chr1 3365950 255 35M1I54M * 0 0 TGTGTGGTTCTGGATTAAATATTATTAATCAAAAAAGGGGGCTGTCAGTAGGATGATATAAGATATAGATGTAGTTTATCTCCTAATCCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:83 nM:i:0 TX:Z:ENSMUST00000195335.1,+2511,54M1I35M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:CAGTACTG QT:Z:FFFFFFFF CR:Z:ATCATGGAGTGGTAGC CY:Z:8F88FFFFFFFFFFFF CB:Z:ATCATGGAGTGGTAGC-1 UR:Z:CCCTTTATGT UY:Z:FFFFFFFFFF UB:Z:CCCTTTATGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:1:1214:26558:27007 16 chr1 3365959 255 90M * 0 0 CTGGATTAAATATTATTAATCAAAAAGGGGGCTGTCAGTAGGATGATATAAGATATAGATGTAGTTTATCTCCTAATCCCACCCTTCCTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 TX:Z:ENSMUST00000195335.1,+2501,90M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E 
BC:Z:TTCCCGAC QT:Z:F8FFFFFF CR:Z:GCTGCTTTCGATGAGG CY:Z:8FF8F8FFFF8FFFFF CB:Z:GCTGCTTTCGATGAGG-1 UR:Z:CGTTGATAGT UY:Z:FFFFFFFFFF UB:Z:CGTTGATAGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:2:1169:24306:30655 1040 chr1 3365985 255 90M * 0 0 GGGGGCTGTCAGTAGGATGATATAAGATATAGATGTAGTTTATCTCCTAATCCCACCCTTCCTCAAAGATTTCTGTCAGTGACATTGTTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 TX:Z:ENSMUST00000195335.1,+2475,90M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:GCTGCTTTCGATGAGG CY:Z:8F88F8FFFFFFFFFF CB:Z:GCTGCTTTCGATGAGG-1 UR:Z:CGTTGATAGT UY:Z:FFFFFFFFFF UB:Z:CGTTGATAGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:1:1339:19226:22232 1040 chr1 3366061 255 90M * 0 0 CAGTGACATTGTTATCAGACTCAAACATGGGGATGATTCTGCCAGTGACTTTAATTACTTTCCCATCAAAGGCCCATTGAGCAGTTTCAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:88 nM:i:0 TX:Z:ENSMUST00000195335.1,+2399,90M GX:Z:ENSMUSG00000103377.1 GN:Z:Gm37180 RE:A:E BC:Z:TTCCCGAC QT:Z:FFFFFFFF CR:Z:GCTGCTTTCGATGAGG CY:Z:FFFFFFFFFFFFFFFF CB:Z:GCTGCTTTCGATGAGG-1 UR:Z:CGTTGATAGT UY:Z:FFFFFFFFFF UB:Z:CGTTGATAGT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:1 A00111:77:H3YKNDMXX:2:2120:12717:35509 1040 chr1 3376586 255 90M * 0 0 TTGTTTTGGTTTTGTGTGGGTGTGGTTTTTTTAAATATATTCTTGTTTTCTTGGGTTTTGTGAAGACAGTTCTCTTGAATTGTGTTTCGG FFFFFF8FFF-FFFFFFF8FFFFFFFF8FFFFFF-FFF--8FFFFFFFFF-FFFFFF-F88FF8F8FFFFFFFF8FFFFFFFFFFFFFF- NH:i:1 HI:i:1 AS:i:88 nM:i:0 TX:Z:ENSMUST00000192336.1,+1113,90M GX:Z:ENSMUSG00000104017.1 GN:Z:Gm37363 RE:A:E BC:Z:AGTAGTCT QT:Z:8FF8F8FF CR:Z:CATTATCGTACTCGCG CY:Z:FF8F8-FF8FFFFF8F CB:Z:CATTATCGTACTCGCG-1 UR:Z:AACTAAGATA UY:Z:FFFFFFFFFF UB:Z:AACTAAGATA RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 ```

Unaligned (thus no gene assigned)

``` Tue 1 Dec - 10:35  ~/data_sm/immune-evolution/pipeline-results/mouse/kmermaid/lung--mouse--remove-ribo/10x-bams  olga@lrrr  tail MACA_18m_M_LUNG_52__CTACATTGTTCGGGCT.sam A00111:77:H3YKNDMXX:2:1184:18973:15248 4 * 0 0 * * 0 0 CTCCTACGGGCCAGGGGGATCCTATCACAAAAGAATAAAGCAGCCTGATTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:0 HI:i:0 AS:i:57 nM:i:1 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFF8FFF CR:Z:CTACATTGTTCGGGCT CY:Z:FF8FFFFFFFFFFFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:TAACGTTAGC UY:Z:FFFFFFFFFF UB:Z:TAACGTTAGC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1185:7355:3771 4 * 0 0 * * 0 0 GCAGTGGTATCAACGCAGAGTACATGGGGCTCATCCGGTCTCTTTGGCCTCGCCGGTAGAAGCAAGATGACGAAGGGACCGTCATCCTTT FFFFFFFFF8FFFFFFFFFFFFFF-FFFFFFF-FFFFFFF-F-F-FFFFFFFFFFFF-FFFFFF-FFFFFFFFFFFFF-FFFF-FFFF-F NH:i:0 HI:i:0 AS:i:57 nM:i:2 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CTACATTGTTCGGGCT CY:Z:F88F8F8FFFFFFFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:ACTAGATAAC UY:Z:FFFFFFFFFF UB:Z:ACTAGATAAC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1187:27362:31563 4 * 0 0 * * 0 0 CGGTTAATAAAAAAAAATACCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGAATAAAATTTAAAAAAAAAAAAAAATATA F-F-8--F-F-F-F-FF-FFF---FFFF-FF-FFFF-FF--F-FF-F-FFFF-FF--FF--F----F------F--FF-FFFF---F-FF NH:i:0 HI:i:0 AS:i:53 nM:i:5 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFF8 CR:Z:CTACATTGTTCGGGCT CY:Z:88F88FFFFFFFFFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:CGCAGAATTC UY:Z:FFFFFFFFFF UB:Z:CGCAGAATTC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1201:28736:20353 4 * 0 0 * * 0 0 GCTCCAGCTCCTGCCCACCCACCCCCAAATACCATAACATACACTTATTAAAATACCCACAATTAGAGCCCTTGCAGAGATTTATAAAAA FFFFFFFFFFFFF-FF-FFF--FFFF-----FF-F-F--F-F---8--F-F----F8F----FF--F-8F--F--F-8--88F---FF-- NH:i:0 HI:i:0 AS:i:37 nM:i:10 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CTACATTGTTCGGGCT CY:Z:888FF8FFFFFFFFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:ATCCAGTAGA UY:Z:FFFFFFFFFF 
UB:Z:ATCCAGTAGA RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1202:12680:3349 4 * 0 0 * * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGTGTGGGTTGGGGAGGTGAGTGGGGGGCTGAGGTGGGGGATGATAAGAAAAGGGAAGGGAATAGGAA F-FF-FFFFFFFF-F----FFF-F-F-F----F-F-------------F------------------F--F--------8--------8- NH:i:0 HI:i:0 AS:i:32 nM:i:3 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CTACATTGTTCGGGCT CY:Z:8F8-F8-8FFF-F-FF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:AGCAGGGAGC UY:Z:FFFFF-FFFF UB:Z:AGCAGGGAGC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1203:30400:17096 4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACATGGGGATTTATTTTCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTACTTTCTAAAT FFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF-F-FFFF-FFFFFFFFFF-FFF-FF8FF8F8-FF-F8F---F NH:i:0 HI:i:0 AS:i:53 nM:i:1 uT:A:1 BC:Z:GCAGTAGA QT:Z:F8FFF8FF CR:Z:CTACATTGTTCGGGCT CY:Z:-F88F--F-FFFF8FF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:TCCCTTACGC UY:Z:FFFFFFFFFF UB:Z:TCCCTTACGC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1206:15483:21825 4 * 0 0 * * 0 0 CGGTGTTTAAAAAAAAAAAAAAATAAAAAAAAAAAAAATAAACAAAAAAAAAAAAAAAAAAAAAATAATTTAAAAAAAAAACAAAAAAAA F---------F-F--F-FF-FF--F--FFFFFFFF-FF-FFFF-FF-F--F-FFFFFFFFFF8F--FF-F-F--8-FFF---FFFF--8F NH:i:0 HI:i:0 AS:i:50 nM:i:10 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CTACATTGTTCGGGCT CY:Z:8F8FFF8FFFFFFFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:ACCTTTAATC UY:Z:FFFFFFFFFF UB:Z:ACCTTTAATC RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1208:4155:18208 4 * 0 0 * * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTAGGGGGGAGAAAAGACCAAGGAGAGGGAGAACAGGGCGCAGGCGGAG -FFFFFFFFF-FFFFFFFFF--FF-FF-F-F-F--------------------------------------------------------- NH:i:0 HI:i:0 AS:i:43 nM:i:3 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CTACATTGTTCGGGCT CY:Z:FF888FFFFFFFFFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:AAGACACTAT UY:Z:FFFFFFFFFF UB:Z:AAGACACTAT RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1208:27380:25207 4 * 0 0 * * 
0 0 AGGGGAAAAAAAAAAAAAAAATGCAAAAAAAAAAAAAAAAATAAAATAAAAAAAAAAAAAAAAAATCAGAAAATAAAAAAAAAAAAATAA --F---F-FF-FF-FFF--F-----FFFFF----FF-F--F-F--F-FFFFFFF---FFFFF-----F-------FFF-FFFF8-F---- NH:i:0 HI:i:0 AS:i:40 nM:i:7 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CTACATTGTTCGGGCT CY:Z:FF8FFFFFFFFFFFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:CTCCATACTG UY:Z:FFFFFFFFFF UB:Z:CTCCATACTG RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 A00111:77:H3YKNDMXX:2:1209:7663:1736 4 * 0 0 * * 0 0 GTGGTATCAACGCAGAGTACATGGGATTGAAGAAATTGCAGAAAACTGTAGAAGGATAAGCCGGCCCTTATATAAACATTTTTGTAGGAT -F-F-F-FFFFFFFFFFFFFFFFFFFF--FFFF-F--FFF--FFFFFFFF-FFF-F---F-----F---FFFFFFFF8F-FFFFFF-FFF NH:i:0 HI:i:0 AS:i:57 nM:i:3 uT:A:1 BC:Z:GCAGTAGA QT:Z:FFFFFFFF CR:Z:CTACATTGTTCTGGCT CY:Z:-F--FFF8FFF-FFFF CB:Z:CTACATTGTTCGGGCT-1 UR:Z:CAACCTGCGA UY:Z:FFFFFF-FFF UB:Z:CAACCTGCGA RG:Z:MACA_18m_M_LUNG_52:MissingLibrary:1:H3YKNDMXX:2 ```
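The three cases above can be told apart from a single SAM record. Here is a minimal sketch, assuming tab-separated SAM text as printed by `samtools view`: bit `0x4` of the FLAG column marks an unaligned read, and a `GN` tag is present only when a gene was assigned. The function names and the returned labels are hypothetical and are not part of bam2fasta.

```python
def parse_tags(optional_fields):
    """Map optional SAM fields (TAG:TYPE:VALUE, column 12 onward) to {TAG: VALUE}."""
    tags = {}
    for field in optional_fields:
        # Split at most twice so values that contain ':' (e.g. RG) stay intact.
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def classify_record(sam_line):
    """Return 'unaligned', 'aligned_with_gene' or 'aligned_no_gene' (labels are illustrative)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    tags = parse_tags(fields[11:])
    if flag & 0x4:  # SAM FLAG bit 0x4: segment unmapped
        return "unaligned"
    if "GN" in tags:  # gene name tag, seen only on the gene-assigned reads above
        return "aligned_with_gene"
    return "aligned_no_gene"


if __name__ == "__main__":
    # Shortened, made-up record in the same layout as the output above.
    example = "\t".join(
        ["read1", "1040", "chr1", "3365854", "255", "4M", "*", "0", "0",
         "AAAG", "FFFF", "NH:i:1", "HI:i:1", "GN:Z:Gm37180", "RE:A:E"]
    )
    print(classify_record(example))
```

Checking the FLAG first matters because the unaligned records above still carry `NH`, `HI`, and `AS` tags, so the mere presence of those tags says nothing about alignment status.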

Thus, this PR writes at least the NH, HI, and RE tags to the read name, along with every other known tag in case they are needed for downstream processing.
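As a rough illustration of the idea, the sketch below builds a read name that carries the alignment status plus the NH, HI, and RE tags when they are present. The `|` separator, the `status:` field, and the function name are assumptions made for this example. The actual header format written by bam2fasta may differ.

```python
INFORMATIONAL_TAGS = ("NH", "HI", "RE")  # the tags named explicitly in this PR


def read_name_with_tags(sam_line, wanted=INFORMATIONAL_TAGS, sep="|"):
    """Build a read name such as 'read1|status:aligned|NH:1|HI:1|RE:E' (format is hypothetical)."""
    fields = sam_line.rstrip("\n").split("\t")
    present = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, _type, value = field.split(":", 2)
        present[tag] = value
    # SAM FLAG bit 0x4 is set when the read is unmapped.
    status = "unaligned" if int(fields[1]) & 0x4 else "aligned"
    parts = [fields[0], f"status:{status}"]
    # Keep the requested order and skip tags missing from this record
    # (the unaligned reads above have no RE tag, for example).
    parts += [f"{tag}:{present[tag]}" for tag in wanted if tag in present]
    return sep.join(parts)


if __name__ == "__main__":
    # Shortened, made-up record in the same layout as the samtools output above.
    line = "\t".join(
        ["read1", "4", "*", "0", "0", "*", "*", "0", "0",
         "ACGT", "FFFF", "NH:i:0", "HI:i:0", "uT:A:1"]
    )
    print(read_name_with_tags(line))
```

Passing a longer `wanted` tuple (for instance adding `GN` or `CB`) would cover the "all known tags" case without changing the function.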


olgabot commented 3 years ago

Only the nf-core linting check is failing, so I'm merging.