Closed skeffington closed 1 year ago
If I understand it correctly, the "reads used" figure corresponds to R1 and R2 combined. So it would be 6.9M = 2 x 3.45M, and 3.45M x 25% = 0.86M, i.e. 2 x 0.86M mapped reads when both mates are counted.
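As a rough check of that arithmetic, here is a minimal sketch in Python. It assumes that "reads used" counts both mates and that the ~25% rate quoted from the log applies to that total. The variable names are only illustrative.

```python
# Rough sanity check of the numbers quoted from the log (values in millions).
# Assumption: "reads used" counts R1 and R2 together.
reads_used = 6.9       # total reads reported as used (R1 + R2)
mapping_rate = 0.25    # ~25% mapped, as quoted in the question

reads_per_mate = reads_used / 2                  # reads in R1 (or R2) alone
mapped_per_mate = reads_per_mate * mapping_rate  # mapped reads per mate
mapped_total = reads_used * mapping_rate         # mapped reads, both mates

print(f"reads per mate:  {reads_per_mate:.2f}M")
print(f"mapped per mate: {mapped_per_mate:.2f}M")
print(f"mapped in total: {mapped_total:.2f}M")
```

Under these assumptions, the per-mate value lands close to the 0.89M reported in the log, which suggests the log reports mapped reads per mate while "reads used" is the sum over both mates.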
Yes, the BAM file is the converted output of the SAM file from that command. But I'm not sure whether it contains the unmapped reads.
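One way to check is the FLAG field of each alignment record: in the SAM format, bit 0x4 marks a read as unmapped. Below is a minimal sketch, assuming the alignments are available as SAM text (for example after converting the BAM with `samtools view`). The function name `count_unmapped` and the example records are made up for illustration.

```python
def count_unmapped(sam_lines):
    """Count mapped and unmapped records in SAM text lines.

    Bit 0x4 of the FLAG column (second tab-separated field) means
    "segment unmapped". Header lines start with '@' and are skipped.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Made-up example records: one properly mapped pair, one unmapped pair.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tcontig_1\t100\t42\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tcontig_1\t300\t42\t50M\t=\t100\t-250\t*\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_unmapped(example))
```

If the unmapped count is zero on the real file, the unmapped reads were most likely dropped before the BAM was written. Note that this counts alignment records, so secondary or supplementary alignments, if present, would inflate the mapped count relative to the number of reads.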
But keep in mind that there are 2 (or actually 3) points of mapping:
1. Mapping to the assembly: reports/assembly_report.html and stats/combined_contig_stats.tsv
2. Mapping to the genomes: genomes/counts/counts_genomes.parquet and reports/genome_mapping/results.html
3. Mapping to the gene catalog: atlas run genecatalog
However, if you say there is a 25% mapping rate in (1.), it means the assembly is not very good. Maybe this is only one sample, and you still get good results in the others.
But I fear that if the assembly is not good, neither the MAGs nor the genecatalog will be comprehensive.
Can you check the mapping rate of the genomes and/or the genecatalog?
Can you check what's in a sample?
I suggest using sendsketch.sh in=reads_R1.fastq.gz in2=reads_R2.fastq.gz
(sendsketch is installed with atlas)
If you cannot make a good assembly, one idea would be to use PLASS; a gene assembler would work.
Thanks for the helpful ideas Silas! I'll have a go in the next few days and post here how I get on.
There has been no activity for some time. I hope your issue has been solved in the meantime. This issue will automatically close soon if no further activity occurs.
Thank you for your contributions.
I'm trying to investigate unassembled reads in my dataset. I have some highly dominant taxa according to 16S that are completely missing from the metagenomes, so I'm wondering if this is an assembly problem with the metagenome data or a primer bias issue with the 16S data. I'm just struggling to interpret some of the Atlas output files and wondered if you could provide some guidance:
I started by looking here: /logs/assembly/calculate_coverage/align_reads_from_M50a_to_filtered_contigs.log I see
I don't entirely understand this, as it says 0.89M or ~25% of reads are mapped, yet 'reads used' is 6.9M. Is it not 25% of reads used?
The align2.BBWrap command was:
suggesting there should be a sam file output. However, in M50a/sequence_alignment/ I only have a bam output - is this the output of the above command?
I also thought that /genomes/alignments/unmapped might have the unmapped reads from this mapping. Is that the case? The numbers of reads in the fastq files in this directory don't match up with the numbers in the align_reads_from_M50a_to_filtered_contigs.log file however.
Any guidance would be much appreciated!