Open handibles opened 4 years ago
Hi @handibles,
Thanks for your interest in mOTUs!
If I understand correctly, you are suggesting how to improve the tool/wiki.
These headers make up the bottom 99% of the BAM/SAM file
This depends on how many reads there are in the fasta file. With few reads, then yes the majority is header.
but could this header output be set as a flag during the pipeline?
It would be possible to do it, but this is a standard format (BAM or SAM). Printing only the reads would make the file invalid. It's more clean to save a correct SAM/BAM file and then visualise it with samtools (as you showed in your example).
Or perhaps in the wiki, outline the general format & composition of a BAM file w.r.t to the work mOTUs is doing? Could help the next person.
Sure, I added a comment at the end of this wiki page: https://github.com/motu-tool/mOTUs_v2/wiki/Profile-one-sample
Ha, nice!
It would be possible to do it, but this is a standard format (BAM or SAM). Printing only the reads would make the file invalid.
Agreed, the last thing we (I) need is invalid files being output into the workflow. However, sometimes it is useful to be able to (e.g.) easily export all the reads that weren't assigned to a separate fastq
etc., for later consideration etc.
Anyone who is interested in scrutinising their reads further could try web-searching for samtools or e.g. see the samtool & SAM format guide for info on how to interact with investigating where reads ended up.
As you say - suggestions. Thanks again for all the diligence.
Apologies for the dread return.
Realised after some more digging that those commands don't return the outputs I expected/described. To clarify:
samtools view -f 4 test1.sam
should return all reads not aligned from a given BAM/SAM file - however the mOTUs BAM/SAM file output through BWA only reports reads that successfully aligned to references. I know that I could get the unaligned by subtracting the aligned from the total reads, but is there somewhere else in the output that I could check?
samtools view -F 4 test1.sam
(note the capital 'F') does return all read alignments as expected, but as it shows multiple mOTU matches for each read, I'm not sure how to determine which of these matches were ultimately selected for mOTUs' final count output.
My end goal is to determine what reference each successfully aligned read was ultimately assigned to - is there a different portion of the output that I should be looking at?
H
So, we parse the result from bwa
. Here is a brief description of what happen in motus map_tax
(it's the first step, i.e. mapping and filtering the reads) [check bin/runBWA.py
]:
@read1
, then the read is changed to something like read1.lane0.single
These reads are then used to understand to which marker gene cluster they belong with motus calc_mgc
(second step, check bin/map_genes_to_mOTUs.py
). Note that here we select the best read from the assignments done by bwa
.
Moreover, if there are paired-end data, it will select the pair with the best score. Hence, not always the read with the best score is selected. If, for example, read1_for
map to MGC1 with score 75 amd MGC2 with score 65; and read2_rev
map to MGC1 with score 73 and to MGC2 with score 75. Then, we will select for read1 (both for and rev) to be mapping to MGC1 (score 75+73) and not MGC2 (score 65+75). But, if you look at the reads alone, then you would chose read1_rev
to MGC2.
Also, we choose all reads that map correctly, and then we distribute proportionally the multiple mappers. For example, if read1
map with score 75 to MGC1 and MGC2, then we cannot decide to who to assign it just by looking at the SAM file.
It is not so easy to understand exactly what happen of the reads, because at certain point we abstract from the single reads and we have just read counts for MGCs, and do some counts/split on that.
That said, all the reads produce by motus map_tax
or motus profile -I
are used in the profiler. In 90% of the cases the best match is the one used (unless there are multiple mappers/forward and reverse decisions).
If you try to run:
motus map_tax -s test1.fq | samtools view
where test1.fq.gz
is attached:
There are 3 reads (@a
, @b
and @unmapped_read
), where @unmapped_read
does not pass the filter. The result of the previous call is:
a.lane0.single 0 metaMG0002471.COG0012 197 3 177M * 0 0 GAAGCAGCTAATTATCCTTTTGCGACAATTGATCCAAATGTCGGGCGGGTAGAAGTTCCGGATGCTCGTTTGGATAAATTAACAGAACTCATCAAACCACAGAAGAAAGTCCCAACAACATTCGAATTTACTGACATTGCTGGAATTGTGAAAGGTGCTTCAAAAGGAGAAGGACTA hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 MD:Z:177 AS:i:177 XS:i:172
a.lane0.single 256 metaMG0002483.COG0012 197 0 177M * 0 0 * * NM:i:1 MD:Z:14C162 AS:i:172
a.lane0.single 272 metaMG0002468.COG0012 959 0 177M * 0 0 * * NM:i:1 MD:Z:162G14 AS:i:172
a.lane0.single 256 refMG0036809.COG0012 182 0 177M * 0 0 * * NM:i:1 MD:Z:14C162 AS:i:172
a.lane0.single 272 metaMG0002487.COG0012 959 0 177M * 0 0 * * NM:i:1 MD:Z:162G14 AS:i:172
a.lane0.single 256 refMG0030819.COG0012 182 0 177M * 0 0 * * NM:i:4 MD:Z:107G2T8C5G51 AS:i:157
b.lane0.single 0 metaMG0002471.COG0012 817 0 158M * 0 0 AGCTACAGAAAATGCTGAAGTTGTTATCATTTCAGCGCGTGCAGAGGAAGAAATTTCTGAATTGGACGATGAAGATAAGTCAGAATTTCTGGAAGCTATCGGCTTGACAGAATCAGGTGTGGATAAACTGACCAGAGCAGCTTATCATCTATTGGGAC hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh NM:i:0 MD:Z:158 AS:i:158 XS:i:158
b.lane0.single 272 metaMG0002468.COG0012 358 0 158M * 0 0 * * NM:i:0 MD:Z:158 AS:i:158
b.lane0.single 256 refMG0036809.COG0012 802 0 158M * 0 0 * * NM:i:0 MD:Z:158 AS:i:158
b.lane0.single 272 metaMG0002479.COG0012 358 0 158M * 0 0 * * NM:i:0 MD:Z:158 AS:i:158
b.lane0.single 256 refMG0022381.COG0012 802 0 158M * 0 0 * * NM:i:1 MD:Z:150G7 AS:i:153
b.lane0.single 256 metaMG0002483.COG0012 817 0 158M * 0 0 * * NM:i:1 MD:Z:150G7 AS:i:153
b.lane0.single 272 metaMG0002477.COG0012 358 0 158M * 0 0 * * NM:i:1 MD:Z:122A35 AS:i:153
b.lane0.single 272 metaMG0002487.COG0012 358 0 158M * 0 0 * * NM:i:2 MD:Z:106C25C25 AS:i:148
b.lane0.single 272 metaMG0002492.COG0012 358 0 158M * 0 0 * * NM:i:3 MD:Z:106C23A1C25 AS:i:143
I would like to make you notice that:
@unmapped_read
is not in the file because it doesn't pass the filter@a
map uniquely, in fact it has 177M
(check CIGAR strings) and NM:i:0
(edit distance to the reference). All other reads have a higher value for NM
(four with NM:i:1
and one with NM:i:4
)@b
map to four genes equally well (158M
and NM:i:0
), so it's not sure to which it will be assigned. Maybe it's a multiple mapper, but It can even be that all genes are in the same MGC, and then it's not ambiguous.Hi @AlessioMilanese ,
Sorry to comment in an old post, but I don't see the need to ask this separately. I am inspecting some cases in which I have many reads mapping to genes from a single MGC, but then such MGC is not counted (it is not present in the output of calc_mgc). My only guess so far is that it has something to do with multiple mappers, because among those mappings there are good quality reads, and good enough alignments (both length and identity).
If I understand correctly from some of your answers (in this one and other issues) multiple mappers are counted, distributing the counts among the different genes/MGCs. However, looking at the code, it seems that there are cases where multiple mappers are completely ignored.
map_genes_to_mOTUs.py, line 956 "#if non of the hit references have a unique mapper and more this insert should be shared by more than 3 references --> ignore it"
However, from the code and the comments alone I cannot fully understand what is happening there.
So, my question is: are there cases where multiple mapper reads are discarded? If yes, which are those cases?
Thank you.
Best, Carlos
Hi Carlos,
can you send me one or two of these reads?
Best, alessio
But basically, If there are more than 3 multiple mappings and there are no unique mapper, then the multiple mappers are ignored.
The idea is that if you have a read that map equally well to E. coli, E. albertii and E. fergusonii, but you don't have any unique mapper to any of these species; then it's highly probable that it is an error.
And even if you would like to count it, you would have to split the read in three and count 1/3 for each species. So you would have really low read count for these 3 species which would be removed in further analyses.
In case you have many reads that are all more than 3 multiple mappers, but there is no single mapper for that species; then something fishy is happening.
Hi Carlos,
can you send me one or two of these reads?
Best, alessio
I uploaded all the reads (actually, the SAM mappings; 74022) mapping to the genes (17) from a single MGC (ref_mOTU_v3_03673) which is not appearing after calc_mgc.
But basically, If there are more than 3 multiple mappings and there are no unique mapper, then the multiple mappers are ignored.
The idea is that if you have a read that map equally well to E. coli, E. albertii and E. fergusonii, but you don't have any unique mapper to any of these species; then it's highly probable that it is an error.
And even if you would like to count it, you would have to split the read in three and count 1/3 for each species. So you would have really low read count for these 3 species which would be removed in further analyses.
In case you have many reads that are all more than 3 multiple mappers, but there is no single mapper for that species; then something fishy is happening.
Thank you very much for the explanation.
EDIT: ignore the next sentences below. I checked the same read with all the mappings (not only the uploaded ones to the single MGC), and it seems to be mapping to genes from other MGCs, so I guess it is a multiple mapper also.
However, after inspecting the mappings I uploaded I am not completely sure this is the case.
For example, the read "ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/1" seems to be mapping uniquely to "refMG0012003.COG0172". It is true that its pair is mapping with flag 272 to the same gene, so I am not sure if this is enough to classify it as multiple mapper...
If you could give me a clue of what is happening here it would be great.
Thank you @AlessioMilanese .
Best, Carlos
Hi again,
From the file I uploaded before, I extracted the list of reads, and from this list I obtained all the mappings from those reads (i.e. those mapping to genes from any MGC).
reads_example_motus_no_mgc.all_mappings.sam.gz
I counted how many times I see each read, and none of them seem to be found only once (ranging from 3 to 54 mappings). So yes, this could be the case where no unique mapping is found. Now I have to wonder why is this happening xD
Best, Carlos
I'm wondering if the problem is another one. I create a small fastq file (paired end) with the read you mentioned (ST-K00119:198:HHCNHBBXY:7:1101:13017:33897
). And I run mOTUs with:
motus profile -f test1.fq -r test2.fq -I test.bam -M test.MGC -o test.motus
where I save the genes (-I
), the MGCs (-M
) and the mOTUs (-o
).
I see then that it is counted as a multiple mapper (from the motus log):
Total number of inserts: 1
- Unique mappers: 0
- Multiple mappers: 1
- Ignored multiple mapper without unique hit: 0
Now if we look at what genes it maps to (samtools view test.bam
):
read1.2.lane0/1 0 refMG0012003.COG0172 451 0 150M * 0 0 AACCGAAAACGTTGAAGTCCGCCGCTGGGGCACACCTCGCGAATTTGATTTTGAAATCAAGGATCACGTCGATGTCGGCGGTCCCTTAGGCCTCGACTTTGACGTCGGCGCTAAGCTTGCCGGAACACGCTTCTGCTTCATGAAAGGCCA AAFFFJJJJFJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJAFJJJAJJJJAJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJ-<AFJJJFJJJJJJJJJJJJJJJAFJF NM:i:0 MD:Z:150 AS:i:150 XS:i:150
read1.2.lane0/1 256 metaMG0004572.COG0172 451 0 150M * 0 0 * * NM:i:0 MD:Z:150 AS:i:150
read1.2.lane0/1 256 metaMG0004633.COG0172 451 0 150M * 0 0 * * NM:i:2 MD:Z:84G2G62 AS:i:140
read1.2.lane0/1 272 metaMG0004593.COG0172 903 0 150M * 0 0 * * NM:i:0 MD:Z:150 AS:i:150
read1.2.lane0/2 0 metaMG0004633.COG0172 683 0 149M * 0 0 CCGTACATTGCTAATCAGGAAACCCTGATCGGTACCGGCCAGCTGCCTAAATTCGAAGAAGACTTGTTTGCAGCTAAGAAGGGCGGCCAGGAAGGCGAAGGCGAAATCTTCTACCTCATCCCGACTTCTGAAGTTACGCTCACAAACTC JFJJJFJJFAAFF<FJFF<AF<FFJJA-7JJJJJJFJ<<FAA-JJJJJJJJJFJJJJJJJJJJJJJJFJJJJJJJJJJJJ<JJJJJJJJJJJJJJJFJJJJJJFFJAJ7FJFJFJJJJFJJJJFJFJFFJJJJJJAJAFJJJJJFFA-A NM:i:0 MD:Z:149 AS:i:149 XS:i:149
read1.2.lane0/2 256 GUT_GENOME022162.COG0172 683 0 149M * 0 0 * * NM:i:2 MD:Z:44T62T41 AS:i:139
read1.2.lane0/2 256 GUT_GENOME025181.COG0172 683 0 149M * 0 0 * * NM:i:3 MD:Z:44T62T14A26 AS:i:134
read1.2.lane0/2 256 GUT_GENOME246333.COG0172 683 0 149M * 0 0 * * NM:i:3 MD:Z:44T62T14A26 AS:i:134
read1.2.lane0/2 256 metaMG0004572.COG0172 683 0 149M * 0 0 * * NM:i:0 MD:Z:149 AS:i:149
read1.2.lane0/2 256 refMG0012003.COG0172 683 0 149M * 0 0 * * NM:i:0 MD:Z:149 AS:i:149
read1.2.lane0/2 272 metaMG0004577.COG0172 672 0 149M * 0 0 * * NM:i:2 MD:Z:41A62A44 AS:i:139
read1.2.lane0/2 272 metaMG0004593.COG0172 672 0 149M * 0 0 * * NM:i:0 MD:Z:149 AS:i:149
If I check test.MGC
I see:
COG0172.mOTU.v2.0002266 0.5000000000
COG0172.mOTU.v2b.0000252 0.5000000000
It reports the MGC because even if there are not unique mappers (I put only one read), it maps to only 2 MGCs (if it would be more than 3 then it would not be reported).
But then there are no mOTUs reported:
Warning: the relative abundance is 0 for all the mOTUs
Number of ref-mOTUs: 0
Number of meta-mOTUs: 0
Number of ext-mOTUs: 0
Because there is only one MGC detected. It needs at least 3 MGCs from one mOTUs (default -g 3
) to report a mOTUs.
So changing the command to -g 1
would make this mOTU appear. Call:
motus profile -f test1.fq -r test2.fq -g 1 -o test.motu
And we can check with cat test.motu | grep -v "0.0000000000"
:
Acinetobacter sp. [ref_mOTU_v3_03673] 0.5000000000
Burkholderiales species incertae sedis [meta_mOTU_v3_12785] 0.5000000000
Hi @AlessioMilanese ,
Thank you very much for testing this.
When I run it with all the reads it is not reported neither with -g 1 or -g 3 (obviously, since no MGC is reported after all).
So I guess (let me know if I am wrong), that what you suggest is that the problem is this one "It reports the MGC because even if there are not unique mappers (I put only one read), it maps to only 2 MGCs (if it would be more than 3 then it would not be reported)." So the problem is that I have multiple mapping reads to genes without unique mappers, and that those genes belong to more than 2 MGCs.
If this is the case, I guess I am just losing this information, because these reads are ambiguous (map to conserved regions of the genes?), and they are impossible to assign with certain confidence. I know this is difficult to address (maybe impossible), but in this case it is not only one or more Species which are not detected, but an Order which is not present when these mappings are discarded.
I wonder whether these mappings should/could be assigned to the Order (assuming that all the mapped genes belong to such Order), even when no Species is assigned.
Thank you very much once more, for your time and patience.
Best, Carlos
I checked whether the reads map to genes only from a single order. To do it fine I guess I should keep only the best mappings (discard those with lower identity, etc), but to make things easier I included all the hits. I obtained 2 orders though (Burkholderiales and Pseudomonadales, both from class Betaproteobacteria if I not mistaken).
I was wondering also now... if I had a sample with less sequencing depth, then in some cases (like this example we are discussing) I would detect things just because there would be no enough depth (just by chance) to have reads mapping to more than 2 MGCs?
edit: sorry, I am thinking now the last paragraph makes no sense, because we are considering individual reads here edit2: sorry again jaja but if they were individual reads, then why when you use a single read the MGC counts are reported, but when I use all the reads they are not. Then it must be that all reads together are considered for the MGC counts?
I checked my mappings for the same read, and I have the same results as you, so I wonder why I don't get any MGC count nor detect mOTU ref_mOTU_v3_03673 (meta_mOTU_v3_12785 is detected though):
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/1 0 refMG0012003.COG0172 451 0 150M * 0 0 AACCGAAAACGTTGAAGTCCGCCGCTGGGGCACACCTCGCGAATTTGATTTTGAAATCAAGGATCACGTCGATGTCGGCGGTCCCTTAGGCCTCGACTTTGACGTCGGCGCTAAGCTTGCCGGAACACGCTTCTGCTTCATGAAAGGCCA AAFFFJJJJFJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJAFJJJAJJJJAJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJ-<AFJJJFJJJJJJJJJJJJJJJAFJF NM:i:0 MD:Z:150 AS:i:150 XS:i:150
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/1 256 metaMG0004572.COG0172 451 0 150M * 0 0 * * NM:i:0 MD:Z:150 AS:i:150
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/1 256 metaMG0004633.COG0172 451 0 150M * 0 0 * * NM:i:2 MD:Z:84G2G62 AS:i:140
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/1 272 metaMG0004593.COG0172 903 0 150M * 0 0 * * NM:i:0 MD:Z:150 AS:i:150
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 16 metaMG0004572.COG0172 683 0 149M * 0 0 CCGTACATTGCTAATCAGGAAACCCTGATCGGTACCGGCCAGCTGCCTAAATTCGAAGAAGACTTGTTTGCAGCTAAGAAGGGCGGCCAGGAAGGCGAAGGCGAAATCTTCTACCTCATCCCGACTTCTGAAGTTACGCTCACAAACTC JFJJJFJJFAAFF<FJFF<AF<FFJJA-7JJJJJJFJ<<FAA-JJJJJJJJJFJJJJJJJJJJJJJJFJJJJJJJJJJJJ<JJJJJJJJJJJJJJJFJJJJJJFFJAJ7FJFJFJJJJFJJJJFJFJFFJJJJJJAJAFJJJJJFFA-A NM:i:0 MD:Z:149 AS:i:149 XS:i:149
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 256 metaMG0004577.COG0172 672 0 149M * 0 0 * * NM:i:2 MD:Z:41A62A44 AS:i:139
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 256 metaMG0004593.COG0172 672 0 149M * 0 0 * * NM:i:0 MD:Z:149 AS:i:149
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 272 GUT_GENOME022162.COG0172 683 0 149M * 0 0 * * NM:i:2 MD:Z:44T62T41 AS:i:139
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 272 GUT_GENOME025181.COG0172 683 0 149M * 0 0 * * NM:i:3 MD:Z:44T62T14A26 AS:i:134
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 272 GUT_GENOME246333.COG0172 683 0 149M * 0 0 * * NM:i:3 MD:Z:44T62T14A26 AS:i:134
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 272 metaMG0004633.COG0172 683 0 149M * 0 0 * * NM:i:0 MD:Z:149 AS:i:149
ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0/2 272 refMG0012003.COG0172 683 0 149M * 0 0 * * NM:i:0 MD:Z:149 AS:i:149
If I understand correctly, the MGC table is empty?
So if you run motus profile -I test.MGC [...]
, then test.MGC
is empty?
It seems there is a problem in the way the code is running. Can you try to run motus profile --test
; the result should be:
------------------------------------------------------------------------------
| TEST mOTU profiler |
------------------------------------------------------------------------------
1-- run setup.py: done
2-- Tools and versions:
- python: correct
- bwa: correct
- samtools: correct
- metaSNV: correct
3-- Taxonomy profiling test:
- Run motus (-v 1, only error messages):
- end motus call
Check resulting profile: correct
Hi @AlessioMilanese ,
No, the MGC file is not empty. It is just missing the results that you showed for the single read you tested.
The result of motus profile --test
is exactly as the one you showed.
Actually, I ran again motus calc_mgc -i SAM -o COUNTS
using as SAM the file I uploaded above (which includes all mappings of the reads which map to genes from mOTU ref_mOTU_v3_03673), and these are the results:
# map_tax unknown | gene database: unknown | 100 | calc_mgc 3.0.1 -y insert.scaled_counts -l 75
unnamed sample
COG0495.mOTU.v2.0001725 2496.6341950236147
COG0215.mOTU.v2.0001782 2818.302496956346
COG0016.mOTU.v2.0001928 2486.4341914358342
COG0533.mOTU.v2.0001812 1470.2779572060317
COG0012.mOTU.v2.0002033 2134.7448621374724
COG0018.mOTU.v2.0001793 5478.124453140666
COG0541.mOTU.v2.0002154 2306.6405502276
COG0018.ext_mOTU_v3_003165 678.7739440996514
COG0172.mOTU.v2.0002266 1273.9701453371777
COG0552.mOTU.v2.0001822 1205.097204435369
If I did everything right, none of those MGCs are from mOTU ref_mOTU_v3_03673, which includes the next MGCs:
COG0012.mOTU.v2b.0000253
COG0016.mOTU.v2b.0000239
COG0018.mOTU.v2b.0000243
COG0172.mOTU.v2b.0000252
COG0215.mOTU.v2b.0003523
COG0495.mOTU.v2b.0000246
COG0533.mOTU.v2b.0000247
COG0541.mOTU.v2b.0000300
COG0552.mOTU.v2b.0000348
When I run, as you did, only the mappings from the single read above (ST-K00119:198:HHCNHBBXY:7:1101:13017:33897.lane0), I get the next results
# map_tax unknown | gene database: unknown | 100 | calc_mgc 3.0.1 -y insert.scaled_counts -l 75
unnamed sample
COG0172.mOTU.v2.0002266 0.5
COG0172.mOTU.v2b.0000252 0.5
COG0172.mOTU.v2b.0000252 belongs to mOTU ref_mOTU_v3_03673. Why this result is only present when I use a single read, and not when the same read mappings are along with mappings from other reads?
Maybe I am doing some error which is leading to this confusion, I don't know.
Thank you once more.
Best, Carlos
Hey devs,
Thanks for all the work on this. I'm checking the fate of the
fastq
reads moving through mOTUs (i.e., which specific reads are assigned, which are not etc.), so interested in the headers from the fastq input.These headers make up the bottom 99% of the BAM/SAM file and can be obtained as below - but could this header output be set as a flag during the pipeline? Or perhaps in the wiki, outline the general format & composition of a
BAM
file w.r.t to the work mOTUs is doing? Could help the next person.