Open baozg opened 8 months ago
Ah-oh...
It seems that I am very partial to CIGAR
😆
I will fix it in a few days
Hi @baozg
I revisited my code. Then I found out that I didn't actually make a mandatory requirement for the fields after 12 columns.
That is, if you generate paf
for each row that keeps only 12 columns, this won't raise error.
The reason for the error is that some rows have 12 columns and some rows have 13 columns. I think this may require some formatting. Is this paf
file produced by minimap2?
That's what I mean since the PAF don't require a real tsv, then the different rows may have different columns. My PAF was aligned by minimap2 2.27, you can test with yeast data in the wfmash repo https://github.com/waveygang/wfmash/tree/main/data
seqkit grep -r -p "SGD" scerevisiae8.fa.gz |bgzip -@ 12 -c > SGDref.fa.gz
minimap2 -x asm5 -c --eqx SGDref.fa.gz scerevisiae8.fa.gz > test.paf
cut -f24 test.paf|grep "cg:Z"|wc -l
cut -f23 test.paf|grep "cg:Z"|wc -l
I just had the same experience as @baozg
PAFs produced by minimap2 -cx asm10
can have different numbers of columns.
In my case, three alignments were generated, but only one of them had the tag zd:i:1
And I also encounter the error:
ERROR CSV deserialize error by: CSV error: record 2 (line: 3, byte: 576): found record with 24 fields, but the previous record has 23 fields
@johnstonmj @baozg
Hey, It seems that this is a common problem. I will rethinking the logic of file reading.
Please give me a few days :)
@johnstonmj My ugly solution is only to keep the first 12 columns and the column with CIGAR. You can use simple code to filter the minimap2 paf: perl -lane 'print join("\t", @F[0..11], grep(/^cg:Z/, @F[12..$#F]))' your_file.paf
HAHA!
@baozg @johnstonmj
I fixed this bug and only added one line of code! LOL
https://github.com/wjwei-handsome/wgatools/commit/da0e5b5766d95d8fa2c3df75e7d8390088c6e1b6
Please clone the newest and try again.
Let me know if any problems. ❤️
@johnstonmj My ugly solution is only to keep the first 12 columns and the column with CIGAR. You can use simple code to filter the minimap2 paf:
perl -lane 'print join("\t", @F[0..11], grep(/^cg:Z/, @F[12..$#F]))' your_file.paf
@baozg Thanks for sharing this workaround! I didn't have a chance to try it yesterday, but I guess I won't need to because...
@wjwei-handsome The latest update works. paf2maf worked for me this time. No errors thrown and the output MAF seems good. Many thanks!
wgatools filter -f paf -a 50000
still not work with minimap2 paf
wgatools filter -f paf -a 50000
still not work with minimap2 paf
It's OK in latest ✅
Hi, Weijie
I found the
wgatools
will stop after raise an error with minimap2 PAFminimap2 -x asm20 --eqx ref.fa query.fa > query.paf
.Then I checked the PAF, only 2276/4195 records's 24 column start from
cg:Z
. According to https://github.com/lh3/miniasm/blob/master/PAF.md, only the first 12 columns is the requirement, so the aligner may choose random output of some tags.