nadegeguiglielmoni / GraphUnzip

Unzip assembly graphs with Hi-C data and/or long reads.
GNU General Public License v3.0

Recommended parameters for unzipping a polyploid genome #7

Closed: baozg closed this issue 2 years ago

baozg commented 2 years ago

Hi, @nadegeguiglielmoni

We are working on an autopolyploid genome. Since GraphUnzip makes no assumption about ploidy, we tried it on our ONT and HiFi data. We selected 57.9 Gb of ONT reads longer than 50 kb (about 20x) as the long reads and mapped them to the 3.08 Gb hifiasm GFA (N50 1.45 Mb). However, the result is larger than the estimated genome size and has a worse N50 than the hifiasm GFA (3.1 Gb, N50 1.45 Mb).

graphaligner -x vg plant.asm.p_utg.gfa ONT.fastq > ONT.gaf

python ~/software/GraphUnzip/main.py long-reads-IM -l ONT.gaf --long_reads_IM ONT.martix -g plant.asm.p_utg.gfa -M 0.8 -w

python ~/software/GraphUnzip/main.py unzip -j ONT.martix --accept 0.40 --reject 0.10 --exhaustive -g plant.asm.p_utg.gfa -o hifi_ont_unzip.gfa

gfatools gfa2fa hifi_ont_unzip.gfa > hifi_ont_unzip.fasta

RolandFaure commented 2 years ago

Hi @baozg,

I am surprised to see that you use two different GFAs: you run GraphAligner on "plant.asm.p_utg.gfa", then you try to unzip "hifi.p_utg.gfa". Normally you should use the same GFA in both steps. Are they the same GFA under different names?

baozg commented 2 years ago

Hi @RolandFaure,

Sorry for the typo. It is the same GFA.

RolandFaure commented 2 years ago

Hi @baozg,

I cannot recommend better parameters based on what I know. I suspect that the GFA is too complicated for GraphUnzip. We have noticed that GraphUnzip does not yet manage to untangle complicated regions. We are working on this and hope to provide a significant improvement soon. For now, I fear that I have no better solution than to use the hifiasm assembly.

RolandFaure commented 2 years ago

Hi @baozg, in case it is still useful: I recommend that you use the new version of GraphUnzip, which performs much better than the previous one. Re-reading this issue, I wonder whether the problem was that you measured the new contiguity on the GFA without using the --merge option. In any case, do not forget this option if you want to analyse the output GFA directly (we will make it the default in future versions).
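
For readers who want to check contiguity directly on a GFA, here is a minimal sketch (not part of GraphUnzip) that computes total length and N50 from the segment lines. It assumes GFA 1 `S` records, where the sequence is either given inline or replaced by `*` with an `LN:i:` length tag. The function names `segment_lengths` and `n50` and the demo records are made up for illustration. It only counts segments as they are written, so if the graph has not been merged, segments that could be joined are still counted separately and the N50 may look lower than the real contiguity.

```python
def segment_lengths(gfa_lines):
    """Return the length of every segment (S record) in GFA 1 lines."""
    lengths = []
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip links, paths, headers and other record types
        fields = line.rstrip("\n").split("\t")
        sequence = fields[2]
        if sequence != "*":
            lengths.append(len(sequence))
            continue
        # Sequence omitted: fall back on the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths.append(int(tag[len("LN:i:"):]))
                break
    return lengths


def n50(lengths):
    """Length L such that segments of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical demo records (tab-separated), not taken from the issue.
DEMO_GFA = [
    "H\tVN:Z:1.0",
    "S\tutg1\tACGTACGT",
    "S\tutg2\t*\tLN:i:6",
    "S\tutg3\tACGT",
    "S\tutg4\tAC",
    "L\tutg1\t+\tutg2\t+\t0M",
]

demo_lengths = segment_lengths(DEMO_GFA)
print(sum(demo_lengths), n50(demo_lengths))
```

To apply it to a real file, pass an open file handle instead of the demo list, for example `segment_lengths(open("hifi_ont_unzip.gfa"))`, and compare the figures for the input and output graphs.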