xavierdidelot / ClonalFrameML

ClonalFrameML: Efficient Inference of Recombination in Whole Bacterial Genomes
GNU General Public License v3.0

FASTA_to_nucleotide(): unsupported base error when using .xmfa file #110

Closed Jwebster89 closed 4 years ago

Jwebster89 commented 4 years ago

I'm currently trying to use ClonalFrameML to remove recombination from a core gene alignment (UBCG), as suggested in issue #39. UBCG outputs an alignment of each gene, which I have concatenated together, separated by a single "=", but when I run ClonalFrameML (ClonalFrameML RAxML_bestTree.UBCG-raxml UBCG_genes.xmfa UBCG_CF -kappa 4.866044835 -emsim 100 -ignore_incomplete_sites true -xmfa_file true) I get the error "ERROR: FASTA_to_nucleotide(): unsupported base".

I've followed the recommendations in other issues: checking that the multi-FASTA blocks are alignments, making sure there are no unexpected characters (only [AGTC-]), and running subsets of the xmfa to make sure it isn't one sequence in particular. All to no avail.

Any help would be greatly appreciated

xavierdidelot commented 4 years ago

There must be a character somewhere in your xmfa file that ClonalFrameML does not like. When you say that you tried subsets of the xmfa to no avail, do you mean that it worked or not? Could you send me by email or post here an alignment as small as possible that causes the problem?

Jwebster89 commented 4 years ago

Hi Xavier, thank you for your response. I've emailed you a subset of the xmfa file. I've looked through the whole file and can't find anything that FASTA_to_nucleotide() wouldn't like, but maybe it's a whitespace character or a formatting issue. I tried multiple subsets (I took different alignment blocks) and none of them worked, so the problem doesn't appear to be with one gene in particular. Instead it seems to be present throughout the file. Thank you for your help.
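
For anyone hunting for an invisible character in the same situation, a scan along the following lines may help. It is only a rough sketch, not ClonalFrameML's own parser: the function name `find_unexpected_chars` is hypothetical, and the allowed alphabet is an assumption taken from the [AGTC-] check mentioned above (extend it if your alignment legitimately contains other symbols). It skips header lines (">"), block separators ("=") and comment lines ("#"), and reports everything else it does not recognise, including stray spaces and carriage returns.

```python
def find_unexpected_chars(text, allowed="ACGTacgt-"):
    """Return (line_number, column, character) for each unexpected character.

    Assumption: 'allowed' mirrors the [AGTC-] check from this thread and is
    not taken from the ClonalFrameML source code.
    """
    hits = []
    # Split on "\n" only, so a Windows-style "\r" stays visible on the line.
    for line_no, line in enumerate(text.split("\n"), start=1):
        # Headers, block separators and comments are not sequence data.
        if line.startswith((">", "=", "#")):
            continue
        for col, ch in enumerate(line, start=1):
            if ch not in allowed:
                hits.append((line_no, col, ch))
    return hits


if __name__ == "__main__":
    sample = ">s1\nACGT\r\n>s2\nAC T\n=\n"
    for line_no, col, ch in find_unexpected_chars(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
```

When reading a real file, opening it with `open(path, newline="")` keeps Python from silently converting "\r\n" to "\n", so carriage returns still show up in the report.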

Jwebster89 commented 4 years ago

Issue solved, thank you for your help, Xavier. The xmfa file's formatting needed fixing (the file needed to end with "="), and one of the UBCG alignments had one fewer sample than the rest. Both of these problems led to "ERROR: FASTA_to_nucleotide(): unsupported base".
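
The two causes described above can be checked before running ClonalFrameML. The following is a minimal sketch under stated assumptions, not the program's actual validation logic: `check_xmfa_structure` is a hypothetical helper, blocks are assumed to be terminated by a line starting with "=", and the "expected" number of samples is simply taken to be the most common count across blocks. It also flags blocks whose sequences differ in length, since each block is meant to be an alignment.

```python
from collections import Counter


def check_xmfa_structure(text):
    """Return a list of human-readable problems found in XMFA-style text."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Cause 1 from the thread: the file must end with "=".
    if not lines or not lines[-1].startswith("="):
        problems.append('file does not end with "="')

    blocks = []   # each block is a list of [header, sequence] pairs
    current = []
    for ln in lines:
        if ln.startswith("="):
            blocks.append(current)
            current = []
        elif ln.startswith(">"):
            current.append([ln[1:], ""])
        elif ln.startswith("#"):
            continue  # comment line
        elif current:
            current[-1][1] += ln  # sequence data may span several lines
    if current:
        blocks.append(current)  # final block with no terminating "="

    # Cause 2 from the thread: every block should hold the same samples.
    counts = [len(block) for block in blocks]
    expected = Counter(counts).most_common(1)[0][0] if counts else 0
    for i, block in enumerate(blocks, start=1):
        if len(block) != expected:
            problems.append(
                f"block {i}: {len(block)} sequences, expected {expected}"
            )
        if len({len(seq) for _, seq in block}) > 1:
            problems.append(f"block {i}: sequences differ in length")
    return problems


if __name__ == "__main__":
    with_problems = ">a\nAC\n>b\nAC\n=\n>a\nGG\n>b\nGG\n=\n>a\nGG\n"
    for problem in check_xmfa_structure(with_problems):
        print(problem)
```

An empty list means none of these structural problems were detected; it does not guarantee that ClonalFrameML will accept the file.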