Closed Overcraft90 closed 2 years ago
You cannot use CLR reads for merqury evaluations, their k-mer distributions are dominated by errors.
Luckily, you can can convert a bam file to fasta or fastq with samtools and awk:
samtools view reads.ccs.bam |awk '{RQ=-1; for (i = NF; i > 11; i--) { if (match($i, "rq:")) RQ=substr($i, 6, length($i)); } if (RQ >= 0.99) { print "@"$1; print $10; print "+"; print $11; }}' > reads.ccs.fastq
or
samtools view reads.ccs.bam |awk '{RQ=-1; for (i = NF; i > 11; i--) { if (match($i, "rq:")) RQ=substr($i, 6, length($i)); } if (RQ >= 0.99) { print ">"$1; print $10; }}' |fold -c > reads.ccs.fasta
This will confirm the read is Q20 (RQ >= 0.99) and dump its sequence.
Hi @skoren,
Thank you so much! I will try and let you know.
Thank you so much,
I have been able to retrieve HiFi reads from my noisy CLR data. Thanks a lot for sharing these two short script! I think we can close the issue.
Hi I have a quick question,
Unfortunately, I'm dealing with some ccs.bam files for which it is not trivial to get back to the fastq counterpart...; however, on S3 they have been made available also the diploid assemblies for this human individual.
So, my question is what would be in this case the best way to go in order to build the
db meryl
? In fact, one way would be to convert bam to fastq (which is causing me some troubles), on the other hand for the same individual are also available CLR data for which they have been made available fastq files. Would it be correct to use CLR to build thedb meryl
when I then use assemblies (both maternal and paternal) obtained form HiFi data?Thanks in advance.