Open jelber2 opened 2 years ago
I would suggest just running FastK with different numbers of threads and seeing if the k-mer histograms are different. That should be sufficient to at least understand whether the problem is with FastK (or your version thereof) or something downstream of the k-mer counting.
Best, Gene
On 3/31/22, 2:44 PM, Jean Elbers wrote:
Hi,
I am not sure about this issue as I am using a non-standard installation of FastK (https://github.com/davebx/FASTK/commit/305d01b81204f6870c034b9abd9d8c280d4d4b76), but maybe it applies to the current production FastK #4604bfc (https://github.com/thegenemyers/FASTK/commit/4604bfcdfd9251d05b27fbd5aef38187e9a9c9ad)?
Ultimately, the quality value (QV) estimate from MerquryFK is much lower when more threads are used (I have not exhaustively tried different numbers of threads). Note that the reads and reference (below) were made with bam-anonymize from rust-bio-tools (https://github.com/rust-bio/rust-bio-tools) from real E. coli PacBio HiFi reads aligned to an E. coli reference [one that seemed rather divergent from the strain being sequenced].
Get reads (~11 MB gzipped) and reference (~1.5 MB gzipped)
wget https://www.dropbox.com/s/9et7bq9k4nc9cf7/anonymous-reference.fasta.gz
wget https://www.dropbox.com/s/j4r7cwf6dtdi3nr/anonymous-reads2.fasta.gz
Default number of threads
FastK -t1 -p -Nanonymous2 anonymous-reads2
MerquryFK -f -pdf -T34 -P./ anonymous2 anonymous-reference anonymous2
cat anonymous2.qv
Assembly No Support Total Error % QV
anonymous-reference 901 4641612 0.0005 53.1
More than the default number of threads
FastK -T34 -t1 -p -Nanonymous3 anonymous-reads2
MerquryFK -f -pdf -T34 -P./ anonymous3 anonymous-reference anonymous3
cat anonymous3.qv
Assembly No Support Total Error % QV
anonymous-reference 137658 4641612 0.0752 31.2
Any help would be greatly appreciated. For now, I would only use the default number of cores.
OK, so I have tested whether the k-mer histograms differ between FastK runs with -T1 through -T34.
Generate histograms for -T1...-T34
for i in `seq 1 34`
do
FastK -T${i} -t1 -p -Nanonymous2 anonymous-reads2
Histex -h1:100 anonymous2.hist > ${i}.txt
done
Are the histograms different?
for i in `seq 1 34`
do
diff 1.txt ${i}.txt
done
no output
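Since an empty diff is easy to misread, the same comparison can be made to report its result explicitly. This is only a sketch: the helper name same_as is hypothetical, and it assumes the 1.txt ... 34.txt files written by the loop above.

```shell
# same_as REF FILE...: report every FILE that is missing or differs from REF.
# Returns 0 only if all files are byte-identical to REF.
same_as() {
  ref=$1
  shift
  rc=0
  for f in "$@"; do
    # cmp -s is silent and exits non-zero on any difference or missing file
    if ! cmp -s "$ref" "$f"; then
      echo "differs or missing: $f"
      rc=1
    fi
  done
  return $rc
}

# Hypothetical usage with the files generated above:
#   same_as 1.txt $(seq -f '%g.txt' 2 34) && echo "all histograms identical"
```

With this, a clean run prints a positive confirmation, and a run where a histogram file was never written shows up as a failure instead of passing silently.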
OK, when I run Merqury.FK with 32 threads (2 fewer than the 34 threads used by FastK), I get the "correct" QV estimate.
MerquryFK -f -pdf -T34 -P./ anonymous3 anonymous-reference anonymous3
MerquryFK -f -pdf -T33 -P./ anonymous3 anonymous-reference anonymous3-1
MerquryFK -f -pdf -T32 -P./ anonymous3 anonymous-reference anonymous3-2
cat anonymous3.qv
Assembly No Support Total Error % QV
anonymous-reference 137658 4641612 0.0752 31.2
cat anonymous3-1.qv
Assembly No Support Total Error % QV
anonymous-reference 141776 4641612 0.0775 31.1
cat anonymous3-2.qv
Assembly No Support Total Error % QV
anonymous-reference 901 4641612 0.0005 53.1
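As a sanity check on the .qv files above, the Error % and QV columns can be reproduced from the No Support and Total columns. This is a minimal sketch, assuming MerquryFK follows the published Merqury formula, E = 1 - (1 - missing/total)^(1/k) and QV = -10 log10(E), and assuming k = 40 (no -k was passed to FastK, so this relies on its default). The helper name qv is hypothetical.

```shell
# qv MISSING TOTAL [K]: print "Error% QV" from unsupported and total k-mer counts.
# Assumes the Merqury-style formula; K defaults to 40 (assumed FastK default).
qv() {
  awk -v m="$1" -v t="$2" -v k="${3:-40}" 'BEGIN {
    e = 1 - exp(log(1 - m / t) / k)          # per-base error rate
    printf "%.4f %.1f\n", 100 * e, -10 * log(e) / log(10)
  }'
}

qv 901 4641612      # -T32 run: prints 0.0005 53.1
qv 137658 4641612   # -T34 run: prints 0.0752 31.2
```

Both results match the reported rows, which suggests the QV drop comes entirely from the jump in unsupported k-mers (901 vs. 137658) and not from a change in how the QV itself is computed.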