labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

doubt about Circular contig identifiers #115

Closed msubirana closed 1 year ago

msubirana commented 1 year ago

I don't understand the 3rd parameter in organisms_fasta_list:

"Circular contig identifiers are indicated in the following columns"

I have a list of .fna files from Streptococcus suis:


SS_0001 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0001.fna
SS_0007 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0007.fna
SS_0008 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0008.fna
SS_0011 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0011.fna
SS_0016 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0016.fna
SS_0039 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0039.fna
SS_0042 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0042.fna
SS_0059 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0059.fna
SS_0062 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0062.fna
SS_0063 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0063.fna
SS_0064 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0064.fna
SS_0073 /mnt/synology/projects/biofar/wgs_bac/tmp/proves_core/SS_0073.fna

Which is the circular contig in this case?

I'm also not sure how to use this parameter:

--anno ORGANISMS_ANNOTATION_LIST

Thanks in advance!

Jtrachsel commented 1 year ago

Hi @msubirana, you only need to worry about this field if you know you have circular contigs in your genomes.

If you know which contigs in your genomes are circular, you can list their identifiers, tab-delimited, after the path of your input fasta. You can see their example here: example file. In this example, on line 4 you can see "NC_017436.1 NC_017433.1"; these are the contig identifiers for the circular contigs in this assembly.

If you have complete genomes and you know all the contigs are circular, you can use this helper function from a little R package I put together: https://github.com/Jtrachsel/pdtools#generate-an-input-file-for-caclulating-a-pangenome-with-ppanggolin
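For anyone not using R, the layout described above (genome name, FASTA path, then any circular contig identifiers, all tab-delimited) can be sketched in Python. This is only a minimal illustration, not part of PPanGGOLiN: the function name `build_fasta_list`, the `circular_contigs` mapping, the `genomes` directory, and the `contig_1` identifier are all hypothetical, and it assumes the genome name is simply the file name without the `.fna` extension.

```python
from pathlib import Path


def build_fasta_list(fasta_dir, circular_contigs=None):
    """Return the lines of an organisms_fasta_list file.

    circular_contigs is a hypothetical mapping from genome name to the
    contig identifiers known to be circular in that genome. Genomes that
    are missing from the mapping get only the name and path columns.
    """
    circular_contigs = circular_contigs or {}
    lines = []
    for fasta in sorted(Path(fasta_dir).glob("*.fna")):
        name = fasta.stem  # e.g. SS_0001 for SS_0001.fna
        fields = [name, str(fasta)] + list(circular_contigs.get(name, []))
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical directory and contig identifier, for illustration only.
    for line in build_fasta_list("genomes", {"SS_0001": ["contig_1"]}):
        print(line)
```

Redirecting the printed lines to a file gives something that can be passed as ORGANISMS_FASTA_LIST. If no circular contigs are known, leaving the mapping empty produces the two-column form shown in the question.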

msubirana commented 1 year ago

Thanks! I don't know whether my genomes are circular. How can I check?

I ran the pipeline with the organisms_fasta_list that I showed previously and got this error:

Exception: The gene family has not beed associated to a partition

axbazin commented 1 year ago

Hello,

For the circularity of genomes: if you downloaded them from somewhere, the website likely states whether they are circular. If it is internal data, the person who assembled the genomes is the one most likely to know. You cannot really tell from a fasta file alone. Overall, if you don't know, I'd recommend just ignoring that aspect; it won't change much in the end.

For the error, what command line did you use exactly? My assumption is that you did not use the minimal command line "ppanggolin workflow --fasta ORGANISMS_FASTA_LIST", which you should likely use in your case (if you only have the 'organisms_fasta_list' file).

Adelme

ggautreau commented 1 year ago

@msubirana @axbazin May I close this issue?