gtonkinhill / panaroo

An updated pipeline for pangenome investigation
MIT License

gene presence/absence file contains ";"-separated entries without merging paralogs #257

Closed: mhjonathan closed this issue 9 months ago

mhjonathan commented 1 year ago

Hi,

I'm running Panaroo on GFF3 files in strict mode, without merging paralogs:

    panaroo \
      --threads 20 \
      --input {input} \
      --out_dir {out_dir} \
      --clean-mode strict \
      --remove-invalid-genes \
      --threshold 0.98 \
      --family_threshold 0.7

When I check the gene_presence_absence.csv file, some cells contain several gene IDs separated by ";" (highlighted in yellow below):

[Image: screenshot of gene_presence_absence.csv]

They are not paralogs; there is almost no similarity between these sequences. Can you tell me what they are?

gtonkinhill commented 1 year ago

Hi,

This is usually caused by fragmented genes, which Panaroo merges together. Depending on the reading frame they were originally called in, their protein sequences can look very different from the other genes in the cluster, so it is important to also consider the DNA sequence when comparing them.
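To illustrate why the reading frame matters, here is a minimal sketch (not part of Panaroo) that translates the forward strand of one DNA sequence in each of its three frames using the standard genetic code. The function name `translate` and the example sequence are hypothetical. The same nucleotides give three unrelated-looking proteins, which is why fragments that look dissimilar at the protein level can still belong to the same gene at the DNA level.

```python
# Standard genetic code: codons enumerated with bases in "TCAG" order
# (first base outermost), matching the usual 64-letter amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str, frame: int = 0) -> str:
    """Translate the forward strand of `dna` starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not form a full codon are ignored, and codons
    containing non-ACGT characters are translated as 'X'. '*' marks a stop codon.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical example sequence: one stretch of DNA, three reading frames.
    dna = "ATGAAACCCGGGTTTTAG"
    for frame in range(3):
        print(f"frame {frame}: {translate(dna, frame)}")
```

Comparing the fragments' DNA directly (for example with a nucleotide alignment) avoids this frame dependence entirely.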

mhjonathan commented 1 year ago

Hi, thank you for the answer.

How can I deal with this? I expected fragmented genes to be filtered out by the --remove-invalid-genes option. Clustering a gene together with unrelated fragments could lead to incorrect conclusions, right?

gtonkinhill commented 9 months ago

Hi,

Sorry, I missed your reply. The --remove-invalid-genes option filters out invalid GFF entries, but not fragmented gene calls. Instead, Panaroo merges these fragments into a single entry, using ';' as the delimiter.
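If you want to review these merged fragments yourself, a minimal sketch like the one below can list every cell of gene_presence_absence.csv that holds more than one ';'-separated gene ID. It assumes (this is not confirmed in the thread) that the first three columns are "Gene", "Non-unique Gene name" and "Annotation" and that every remaining column is one genome. The function name `find_merged_entries` and the `SAMPLE` data are hypothetical.

```python
import csv
import io

# Assumption: the first three columns are cluster metadata
# ("Gene", "Non-unique Gene name", "Annotation"); each later column is one genome.
METADATA_COLUMNS = 3


def find_merged_entries(handle):
    """Return (cluster, genome, [gene IDs]) for each cell holding several
    ';'-separated IDs, i.e. gene calls that were merged into one entry."""
    reader = csv.reader(handle)
    header = next(reader)
    genomes = header[METADATA_COLUMNS:]
    merged = []
    for row in reader:
        cluster = row[0]
        for genome, cell in zip(genomes, row[METADATA_COLUMNS:]):
            ids = [gene_id for gene_id in cell.split(";") if gene_id]
            if len(ids) > 1:
                merged.append((cluster, genome, ids))
    return merged


# Hypothetical two-genome example in the assumed layout.
SAMPLE = """Gene,Non-unique Gene name,Annotation,genomeA,genomeB
groupX,,hypothetical protein,genomeA_00010,genomeB_00020;genomeB_00021
groupY,,hypothetical protein,genomeA_00030,
"""

if __name__ == "__main__":
    for cluster, genome, ids in find_merged_entries(io.StringIO(SAMPLE)):
        print(cluster, genome, ids)
```

On a real output, open the file with `open("gene_presence_absence.csv", newline="")` and pass the handle in. The script only flags merged cells; deciding whether a given merge makes sense still means comparing the underlying DNA sequences.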