pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

Construct pangenome #313

Open Daishoulu opened 1 year ago

Daishoulu commented 1 year ago

Hi professors! I have 19 genome assemblies of one species, and each assembly contains 29 chromosomes. I want to construct a graph from the 19 genomes using PGGB. I first combine the 19 genomes into input.fasta, then index input.fasta using samtools, and finally run PGGB. The commands are:

    cat 1.fasta 2.fasta ···19.fasta > input.fasta
    samtools faidx input.fasta
    pggb -i input.fasta \
        -t 20 \
        -s 100000 \
        -p 98 \
        -n 19 \
        --skip-viz \
        -o ${output_dir}

Now I have encountered a problem: the genome size of this species is about 2.7 Gb, but the GFA file contains 217417035 nodes and 301626572 edges, with a total node length of 34619282451. Why does the genome size increase more than 10 times?

If I instead extract the chromosomes and build a graph for each one separately, how can I merge the graphs and count nodes, edges and node length in the end? Looking forward to your reply!
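For the counting part of the question, here is a minimal sketch, assuming the per-chromosome graphs are GFA v1 files with sequences stored inline on the `S` lines. Because per-chromosome graphs share no nodes or edges, totals over several files are simply the sums. The function name `gfa_stats` and the file names in the usage comment are made up for illustration.

```shell
# Print node count, edge count and total node length for one or more GFA files.
# In GFA v1, "S" lines are segments (nodes, sequence in column 3)
# and "L" lines are links (edges).
# ASSUMPTION: sequences are stored inline, not as "*" with an LN tag.
gfa_stats() {
    awk -F'\t' '
        $1 == "S" { nodes++; bp += length($3) }
        $1 == "L" { edges++ }
        # %.0f avoids integer overflow in some awk builds for multi-Gb totals
        END { printf "nodes=%.0f edges=%.0f node_length=%.0f\n", nodes, edges, bp }
    ' "$@"
}

# Hypothetical usage on per-chromosome outputs:
# gfa_stats chr1.gfa chr2.gfa
```

Passing all per-chromosome GFA files at once gives the genome-wide totals without merging anything first.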

AndreaGuarracino commented 1 year ago

How divergent are your genomes? The parameters -p 98 -s 100000 might be too stringent. Try lowering them, e.g. -p 90 or -p 95 and -s 50000 or less.
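For scale: 19 genomes of about 2.7 Gb are roughly 51.3 Gb of input, so a node length of 34619282451 is about 12.8 times one genome and about two thirds of the total input. That would fit with many sequences not being aligned and collapsed under the strict settings. A small sketch for building the re-run command with the relaxed values follows. The helper `pggb_cmd` and the output directory names are hypothetical, and 95 and 50000 are only starting points taken from the suggestion above, not tuned settings.

```shell
# Build a pggb command line with relaxed mapping parameters.
# -p: mapping identity (%), -s: segment length (bp), -n: number of haplotypes.
# Only flags already used in this thread appear here.
pggb_cmd() {
    input=$1; outdir=$2; identity=${3:-95}; seglen=${4:-50000}
    printf 'pggb -i %s -t 20 -s %s -p %s -n 19 --skip-viz -o %s\n' \
        "$input" "$seglen" "$identity" "$outdir"
}

# Print the command to inspect it, or pipe it to sh to run it:
# pggb_cmd input.fasta out_p95_s50k
# pggb_cmd input.fasta out_p90_s50k 90 | sh
```

Writing each parameter combination to its own output directory makes it easy to compare the resulting node lengths afterwards.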


In reply to Daishoulu's message of 24 June 2023, 13:06 ([pangenome/pggb] Construct pangenome, Issue #313).


Daishoulu commented 1 year ago

Thank you for your timely reply! I will rerun pggb using the parameters you suggested.

I would also like to learn about another method: put the corresponding chromosome sequences of each genome into one FASTA file per chromosome, giving chr1.fasta, chr2.fasta ... chr29.fasta, and build a graph for each chromosome separately with pggb. The question is, how can I merge the resulting files in the end?
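The per-chromosome split described here could be sketched as below. This assumes PanSN-style sequence names such as `>sample#1#chr7`, which the thread does not confirm. With other naming schemes the rule for extracting the chromosome name would need to change. `split_by_chrom` is a hypothetical helper name.

```shell
# Split a combined FASTA into one file per chromosome.
# ASSUMPTION: headers look like ">sample#1#chr7" (PanSN-style), so the
# chromosome name is the last '#'-separated field of the first word.
split_by_chrom() {
    fasta=$1; outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            n = split(substr($1, 2), parts, "#")   # drop ">" and split the name
            out = dir "/" parts[n] ".fasta"        # e.g. outdir/chr7.fasta
        }
        out != "" { print > out }                  # header and sequence lines
    ' "$fasta"
}

# Hypothetical usage, followed by indexing and one pggb run per chromosome:
# split_by_chrom input.fasta by_chrom
# samtools faidx by_chrom/chr1.fasta
```

A header without any `#` ends up in a file named after the whole sequence name, so it is worth checking the list of output files before running pggb on them.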

AndreaGuarracino commented 1 year ago

You can use odgi squeeze to merge multiple graphs.
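A rough sketch of how that merge might look, assuming each per-chromosome run left one ODGI-format graph named `chr1.og` ... `chr29.og` in a single directory. Those file names and the helper `squeeze_graphs` are hypothetical; the real pggb output names will differ and should be substituted.

```shell
# Merge per-chromosome graphs with odgi squeeze and print summary statistics.
# ASSUMPTION: $graph_dir contains one ODGI graph per chromosome, named chr*.og.
squeeze_graphs() {
    graph_dir=$1; merged=$2
    list="$graph_dir/graphs.txt"
    ls "$graph_dir"/chr*.og > "$list"        # one graph path per line
    odgi squeeze -f "$list" -o "$merged"     # combine the graphs into one file
    odgi stats -i "$merged" -S               # summary of the merged graph
}

# Hypothetical usage:
# squeeze_graphs per_chrom_graphs merged.og
```

If only GFA files are at hand, each one can first be converted with `odgi build -g chrN.gfa -o chrN.og`, and the merged graph can be written back to GFA with `odgi view -i merged.og -g`.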


In reply to Daishoulu's message of 24 June 2023, 15:22 (Re: [pangenome/pggb] Construct pangenome, Issue #313).
