Closed colindaven closed 3 years ago
Hi, thanks for your interest in yacrd.
As I understand it, out of 1.9m reads, only 454k are NotBad and can therefore be used in further analyses ?
To a first approximation I would say yes.
But it is possible that the recommended parameters are a bit too strict for your data.
Based on your message, I guess your data is rat genomic Nanopore R9.4.
What is your coverage? What is your error rate?
Thanks for that.
Coverage is about 3X, I have another one at about 7X too though.
It's ONT 9.4.1, the accuracy is about 92% from memory.
OK, it's clearer now. The recommended parameters were determined on datasets with coverage around 30x and 60x. I will add this information to the README, thanks for the bug report.
If you just want to detect chimeras, I think you should run:
minimap2 -x {corresponding preset} {your other parameter} reads.fq reads.fq > overlap.paf
yacrd -i overlap.paf -o reads.yacrd
I don't think scrubbing a read dataset with such low coverage is a good idea. There is already not enough data for an assembly, and removing more information won't help. But if you want to try, I think you should lower the minimum coverage to 1 with -c 1.
Ok, thanks. I'll just do the chimeric read detection. This was just a MinION test anyway, I won't be performing assemblies on these datasets.
With coverage this low, I think yacrd can generate some false positives. There is a high chance that a region of the genome is sequenced only once, and yacrd can't tell the difference between that kind of read and a chimera.
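To put a rough number on "a high chance", here is a minimal sketch that assumes sequencing depth follows a Poisson distribution with the mean coverage as its rate. That model is an assumption for illustration and is not something yacrd computes; real data is usually more uneven. The function name `poisson_depth_fraction` is hypothetical.

```shell
# Expected fraction of genome positions at exactly depth k, under a
# Poisson model of sequencing depth (an assumption, see the text above).
# Usage: poisson_depth_fraction MEAN_COVERAGE DEPTH
poisson_depth_fraction() {
    awk -v c="$1" -v k="$2" 'BEGIN {
        p = exp(-c)              # P(depth = 0)
        for (i = 1; i <= k; i++)
            p *= c / i           # build up c^k / k! one factor at a time
        printf "%.4f\n", p
    }'
}

poisson_depth_fraction 3 1    # mean coverage 3X, depth exactly 1
poisson_depth_fraction 7 1    # mean coverage 7X, depth exactly 1
poisson_depth_fraction 30 1   # mean coverage 30X, depth exactly 1
```

Under this model, roughly 15% of positions at 3X are covered by a single read, against about 0.6% at 7X and essentially none at 30X. A read that is alone over part of its length has no overlap from other reads there, which is the case that is hard to tell apart from a chimera.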
If you have a good reference genome, I think mapping the reads to the reference is the best way to detect chimeras. Alvis should help you, but don't trust the yacrd results presented in their publication; they made a little mistake :smiley:.
If you have any other questions, please ask.
Hi,
so the tool ran easily - thanks - but I am a little concerned with the results.
wc -l *.yacrd
1964840 iddm_report.yacrd
grep -c Chimeric iddm_report.yacrd
114108
grep -c NotBad iddm_report.yacrd
454940
grep -c NotCov iddm_report.yacrd
1395792
As I understand it, out of 1.9m reads, only 454k are NotBad and can therefore be used in further analyses ? From work to date with the unfiltered data (WGS Rat, just genomic alignments), I think most reads are pretty decent.
Or should I be happy with the NotCov reads ?
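The three grep calls above can be folded into a single pass that also prints each label's share of the report. This is a minimal sketch, assuming the label is the first tab-separated field of every line in the report; the function name `tally_yacrd` is hypothetical and not part of yacrd.

```shell
# Count how often each read label occurs in a yacrd report and print
# label, count and percentage, one label per line.
# Assumption: the label is the first tab-separated field of each line.
tally_yacrd() {
    awk -F'\t' '
        { count[$1]++; total++ }
        END {
            for (label in count)
                printf "%s\t%d\t%.1f%%\n", label, count[label], 100 * count[label] / total
        }
    ' "$1" | LC_ALL=C sort    # sort for a stable output order
}
```

It would be called as `tally_yacrd iddm_report.yacrd`. With the counts reported above (114108 + 454940 + 1395792 = 1964840), the shares come to about 5.8% Chimeric, 23.2% NotBad and 71.0% NotCov.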
Commands: