xavierdidelot / ClonalFrameML

ClonalFrameML: Efficient Inference of Recombination in Whole Bacterial Genomes
GNU General Public License v3.0

Question: xmfa format? #122

Closed nbawe closed 3 years ago

nbawe commented 3 years ago

I have two formats of xmfa files:

1) with #:

#Gene_name1
>Isolate1
ATG...TGC
>Isolate2
ATG...TGC
=
#Gene_name2
>Isolate1
ATG...TGC
>Isolate2
ATG...TGC
=

2) without #:

>Isolate1:1-10 + Gene_name1
ATG...TGC
>Isolate2:1-10 + Gene_name1
ATG...TGC
=
>Isolate1:11-20 + Gene_name2
ATG...TGC
>Isolate2:11-20 + Gene_name2
ATG...TGC
=

My question: are both formats suitable for CFML, and what does '#' represent? Is it a comment?

nbawe commented 3 years ago

PS! I have edited my original question!

xavierdidelot commented 3 years ago

The lines starting with # are comments and should not have any effect.
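To illustrate the point about comment lines, here is a minimal Python sketch of an XMFA reader that skips lines starting with `#`. It is not ClonalFrameML's own parser (the function name `read_xmfa_blocks` and the short sequences are made up for the example), and it assumes only the layout shown in the question: `>` starts a record and `=` ends a block.

```python
def read_xmfa_blocks(text):
    """Split XMFA text into blocks, each a list of (header, sequence) tuples.

    Assumptions (illustrative only): lines starting with '#' are comments,
    '>' starts a record, and a line starting with '=' closes a block.
    """
    blocks = []
    current = []  # records of the block currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        if line.startswith(">"):
            current.append([line[1:].strip(), ""])
        elif line.startswith("="):
            if current:
                blocks.append([tuple(rec) for rec in current])
            current = []
        elif current:
            current[-1][1] += line  # sequence may span several lines
    if current:  # tolerate a missing final '='
        blocks.append([tuple(rec) for rec in current])
    return blocks


# Made-up sequences standing in for the elided ones in the question.
with_comments = """
#Gene_name1
>Isolate1
ATGAAATGC
>Isolate2
ATGAACTGC
=
#Gene_name2
>Isolate1
ATGCCCTGC
>Isolate2
ATGCCTTGC
=
"""

# Same data with the '#' lines dropped.
without_comments = "\n".join(
    l for l in with_comments.splitlines() if not l.startswith("#")
)

print(read_xmfa_blocks(with_comments) == read_xmfa_blocks(without_comments))
```

Under these assumptions, the file with `#Gene_name` lines and the file without them yield the same blocks, which matches the statement that comments have no effect.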

nbawe commented 3 years ago

Thank you!

nbawe commented 3 years ago

@xavierdidelot one more question. My tree file has names like >22275|OXC7093, but the xmfa has names like >22275|OXC7093:1-1323 + CAMP0001. I ran CFML with these files and it finished without any errors. The CFML output has names as in the tree file: 22275|OXC7093.

My question is: how does CFML handle the difference in the names? Does it discard everything from the colon onwards (including the colon itself)?

Thank you for the answer

nbawe commented 3 years ago

I wanted to clarify where the difference comes from. I am using the pubMLST database, from which I got a concatenated core FASTA and built a tree with RAxML. The RAxML tree is then passed to CFML together with the core xmfa from pubMLST. So the names differ as described in my previous comment.

xavierdidelot commented 3 years ago

Yes, ClonalFrameML ignores everything after the colon in the FASTA headers.
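For anyone who wants to check their names before running CFML, the rule described here can be mimicked in a few lines of Python. This is only a sketch of the stated behaviour, not ClonalFrameML's code; `sample_name` is a hypothetical helper, and the tip set is typed in by hand rather than read from a tree file.

```python
def sample_name(header):
    """Return the part of a FASTA/XMFA header before the first colon."""
    return header.lstrip(">").split(":", 1)[0].strip()


# Header and tip label taken from the example in this thread.
xmfa_headers = [">22275|OXC7093:1-1323 + CAMP0001"]
tree_tips = {"22275|OXC7093"}

xmfa_names = {sample_name(h) for h in xmfa_headers}

# Names present on one side but not the other would need fixing by hand.
only_in_tree = tree_tips - xmfa_names
only_in_xmfa = xmfa_names - tree_tips
print("only in tree:", sorted(only_in_tree))
print("only in xmfa:", sorted(only_in_xmfa))
```

If both sets come out empty, the truncated xmfa names line up with the tree tips, which is consistent with the run above finishing without errors.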

nbawe commented 3 years ago

Thank you!