Closed Boer223 closed 2 years ago
You can't use the wfmash-xxxx file as -a/--input-paf because it is a temporary file of wfmash that contains only the mappings, that is, the regions to align, so there are no CIGAR strings in it. seqwish warns you of this ([seqwish] WARNING: input alignment file wfmash-3TaQ4Q does not have CIGAR strings). Moreover, the file seems to contain invalid information, which is triggering the error. Try running pggb by using the output of wfmash (in your case, it should be called output/wfmash-3TaQ4Q.paf).
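For anyone who wants to check a file before passing it to -a/--input-paf, here is a minimal sketch in Python. It assumes the standard PAF layout (12 mandatory tab-separated columns followed by optional tags, with the CIGAR stored in a cg:Z: tag). The function names are made up for illustration, and the cg:Z: value in the demo is invented, not taken from a real alignment.

```python
def has_cigar(paf_line):
    """Return True if a PAF record carries a cg:Z: (CIGAR) tag."""
    fields = paf_line.rstrip("\n").split("\t")
    # Optional SAM-style tags come after the 12 mandatory PAF columns.
    return any(tag.startswith("cg:Z:") for tag in fields[12:])


def count_missing_cigar(lines):
    """Return (total_records, records_without_cigar), skipping blank lines."""
    total = 0
    missing = 0
    for line in lines:
        if not line.strip():
            continue
        total += 1
        if not has_cigar(line):
            missing += 1
    return total, missing


# First mapping record from the temp file shown in this thread,
# rebuilt with tabs (the real file is tab-separated).
mapping_only = "\t".join([
    "Darmor_v10#1#A01", "32958928", "27800000", "28300000", "+",
    "Darmor_v10#1#C01", "48239358", "47247687", "47879060",
    "5741", "631373", "10", "id:f:90.9308",
])
# Hypothetical aligned record: same columns plus an invented CIGAR tag.
aligned = mapping_only + "\tcg:Z:10=1X5="

total, missing = count_missing_cigar([mapping_only, aligned])
print(f"{missing} of {total} records have no CIGAR string")
```

To run it on a real file, pass an open file handle instead of the list, e.g. count_missing_cigar(open(path)). If every record is missing the tag, the file is most likely a mapping-only temp file.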
@AndreaGuarracino
Thank you for your quick reply! But when I use pggb -i 19-genomes.merge.fa -n 19 -o output -p 90 -s 100000 -t 5 -T 5 -M -Z to create the pan-genome graph, it does not generate the PAF file. There is only a wfmash-3TaQ4Q temp file.
Weird, or maybe you haven't waited long enough. What does the estimated mapping and alignment time say in the log? I suggest reducing -s to 50000 and waiting a bit more. If the problem persists, please share the output/...log file.
The log ends with the following:
[E::fai_load3_core] Failed to open FASTA file 19-genomes.merge.fa
wfmash -X -s 100000 -p 90 -n 18 -t 16 19-genomes.merge.fa 19-genomes.merge.fa
15440.41s user 792.33s system 1172% cpu 1384.73s total 7245936Kb max memory
[E::fai_load3_core] Failed to open FASTA file 19-genomes.merge.fa
It is not able to see the FASTA file in input, very strange. Can I see your 19-genomes.merge.fa.fai file too? And also the output of head /home/cuixb/data/analysis_data/graph-pan-genome/pggb-result/wfmash-3TaQ4Q?
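One quick thing to rule out when a tool reports "Failed to open FASTA file" is a plain path or permission problem from the directory where the command is launched. This is only a rough sketch in Python: diagnose_path is a hypothetical helper, and it cannot detect problems inside the tool's own environment.

```python
import os


def diagnose_path(path):
    """Return a list of reasons why `path` might fail to open (empty if none found)."""
    problems = []
    # Resolve relative to the current working directory, as the tool would.
    resolved = os.path.abspath(path)
    if not os.path.exists(resolved):
        problems.append(f"not found: {resolved}")
    elif not os.path.isfile(resolved):
        problems.append(f"not a regular file: {resolved}")
    elif not os.access(resolved, os.R_OK):
        problems.append(f"not readable by the current user: {resolved}")
    return problems


# Check the FASTA and its index, using the file name from this thread.
for name in ("19-genomes.merge.fa", "19-genomes.merge.fa.fai"):
    issues = diagnose_path(name)
    print(name, "->", issues if issues else "looks openable")
```

An empty list only means the file can be opened from this shell. It says nothing about whether the tool itself can open it.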
19-genomes.merge.fa.fai file: 19-genomes.merge.fa.zip
head of wfmash-3TaQ4Q file:
Darmor_v10#1#A01 32958928 27800000 28300000 + Darmor_v10#1#C01 48239358 47247687 47879060 5741 631373 10 id:f:90.9308
Darmor_v10#1#A01 32958928 0 3800000 + Darmor_v5#1#chrC01 38829317 850 4733913 44055 4733063 12 id:f:93.0793
Darmor_v10#1#A01 32958928 27000000 29700000 + Darmor_v5#1#chrC01 38829317 35738139 38333401 25321 2700000 12 id:f:93.7809
Darmor_v10#1#A01 32958928 30500000 31200000 + Darmor_v5#1#chrC01 38829317 38267435 38823342 6814 700000 16 id:f:97.3442
Darmor_v10#1#A01 32958928 29900000 30500000 - Darmor_v5#1#chrAnn_random 48658326 1918964 2515790 5847 600000 16 id:f:97.4553
Darmor_v10#1#A01 32958928 15700000 16300000 - Darmor_v5#1#chrAnn_random 48658326 3155785 3717399 5876 600000 17 id:f:97.9259
Darmor_v10#1#A01 32958928 27800000 28300000 + Express617#1#chrC01 44118044 38888171 39510831 5664 622660 10 id:f:90.972
Darmor_v10#1#A01 32958928 28700000 29900000 + Express617#1#chrC01 44118044 40944888 42168781 11515 1223893 12 id:f:94.0823
Darmor_v10#1#A01 32958928 27800000 28300000 + FAFU_ZS11#1#chrC01 54641295 49487432 50101595 5581 614163 10 id:f:90.8653
Darmor_v10#1#A01 32958928 31200000 31900000 + FAFU_ZS11#1#chrC01 54641295 50286548 50945152 6412 700000 11 id:f:91.6069
the whole wfmash-3TaQ4Q file: wfmash-3TaQ4Q.zip
The FASTA index seems healthy. The input contains a lot of sequences, but I don't think (hope) that's the problem. Can you try it with other, but smaller FASTA files? With FASTA files in the same folder where your current input is, and also FASTA files present in other folders? I am wondering if there is an issue that is specific to your system. In each test, please also delete and regenerate the FASTA index, to be safe.
The number of sequences should not have an effect here. The system has been tested with millions of input sequences, and there should not be any limit.
It seems that you can't read the FASTA.
Please confirm that these return the same value:
cat ref.fa | grep '^>' | wc -l
wc -l ref.fa.fai
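The same comparison can be sketched in Python for anyone who prefers a single script. The function names are made up for illustration. It counts non-empty index lines rather than newline characters, so it can differ from wc -l if the index has no trailing newline.

```python
def count_fasta_headers(lines):
    """Count FASTA records, i.e. lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fai_entries(lines):
    """Count non-empty lines in a .fai index (one line per sequence)."""
    return sum(1 for line in lines if line.strip())


def index_matches_fasta(fasta_path, fai_path=None):
    """Return True if the index lists as many sequences as the FASTA has headers."""
    if fai_path is None:
        fai_path = fasta_path + ".fai"
    with open(fasta_path) as fasta, open(fai_path) as fai:
        return count_fasta_headers(fasta) == count_fai_entries(fai)


# Tiny in-memory example: two records and a matching two-line index.
fasta_lines = [">a\n", "ACGT\n", ">b\n", "GG\n"]
fai_lines = ["a\t4\t3\t4\t5\n", "b\t2\t11\t2\t3\n"]
print(count_fasta_headers(fasta_lines) == count_fai_entries(fai_lines))
```

For the real files, index_matches_fasta("ref.fa") reads ref.fa and ref.fa.fai from the current directory. A mismatch suggests the index is stale and should be deleted and regenerated.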
@ekg As you suggested, I checked the number of sequences in the reference genome, and both commands return the same value.
After I reinstalled the whole pggb environment using conda, it ran successfully without error.
Hi,
When I use the following command to build a pan-genome graph with 19 genomes, an error occurs. Command:
Error: