sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Core genome alignment failure #542

Open. Steven-Kemp opened this issue 2 years ago

Steven-Kemp commented 2 years ago

Hi @tseemann, I see you post often on this GitHub, so I thought I'd ask!

I'm having trouble getting a core genome alignment for around 300 full-length E. coli sequences with Roary.

The program runs as it should and writes all of the expected output files; however, the core genome alignment often comes out blank, following this warning:

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

The summary statistics show:

Core genes       (99% <= strains <= 100%)         0
Soft core genes  (95% <= strains < 99%)          69
Shell genes      (15% <= strains < 95%)       11410
Cloud genes      (0% <= strains < 15%)        99008
Total genes      (0% <= strains <= 100%)     110487
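For reference, the four categories above bucket each gene by the percentage of strains that carry it. Below is a minimal sketch of that bucketing, following the ranges exactly as printed in the summary; how Roary itself handles the boundaries is an assumption here, and classify and min_core_strains are hypothetical helpers, not part of Roary.

```python
def classify(present_in: int, total_strains: int) -> str:
    """Bucket a gene by the percentage of strains carrying it, using the
    ranges printed in the summary statistics (boundary handling assumed)."""
    pct = 100.0 * present_in / total_strains
    if pct >= 99:
        return "core"       # 99% <= strains <= 100%
    if pct >= 95:
        return "soft core"  # 95% <= strains < 99%
    if pct >= 15:
        return "shell"      # 15% <= strains < 95%
    return "cloud"          # 0% <= strains < 15%


def min_core_strains(total_strains: int) -> int:
    """Smallest number of strains reaching the 99% core cut-off,
    using integer ceiling division to avoid float rounding."""
    return -(-99 * total_strains // 100)


if __name__ == "__main__":
    n = 300
    print(min_core_strains(n))  # 297
    print(classify(297, n))     # core
    print(classify(296, n))     # soft core
```

Under these ranges, with 300 genomes a gene absent from four or more of them drops out of the core, so a handful of incomplete or contaminated assemblies could plausibly shrink the core count sharply.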

I've checked gene_presence_absence.Rtab and it looks OK to me, and I can see no obvious contamination.
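One way to check the Rtab more systematically is to look at genes that are present in almost every genome and count which genomes are the ones missing them. The sketch below assumes the Rtab layout is tab-separated, with a header row whose first column is the gene name and whose remaining columns are isolates, then 1/0 for presence/absence; the 95% "near-core" cut-off and the function name are hypothetical choices for this check.

```python
import csv
import io
import sys
from collections import Counter
from pathlib import Path


def missing_from_near_core(rtab_text: str, min_fraction: float = 0.95) -> Counter:
    """Count, per isolate, the near-core genes it lacks.

    Assumed layout: tab-separated, header row = gene column + isolates,
    then one row per gene with 1 (present) or 0 (absent).
    'Near-core' (present in >= min_fraction of isolates but not all)
    is a hypothetical cut-off chosen for this check.
    """
    rows = [r for r in csv.reader(io.StringIO(rtab_text), delimiter="\t") if r]
    isolates = rows[0][1:]
    n = len(isolates)
    counts: Counter = Counter()
    for row in rows[1:]:
        calls = [int(v) for v in row[1:]]
        present = sum(calls)
        if min_fraction * n <= present < n:
            # Credit each absence of this near-core gene to its isolate.
            for isolate, call in zip(isolates, calls):
                if call == 0:
                    counts[isolate] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python near_core.py gene_presence_absence.Rtab
    text = Path(sys.argv[1]).read_text()
    for isolate, n_missing in missing_from_near_core(text).most_common(10):
        print(f"{isolate}\t{n_missing}")
```

If a few genomes account for most of the absences, that would point at those assemblies rather than at Roary itself.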

Could you speculate on what the issue might be?

Best wishes, Steve

Steven-Kemp commented 2 years ago

Small update: I reran this after QC'ing the files much more carefully.

Now:

Core genes       (99% <= strains <= 100%)       126
Soft core genes  (95% <= strains < 99%)         231
Shell genes      (15% <= strains < 95%)        3745
Cloud genes      (0% <= strains < 15%)        16947
Total genes      (0% <= strains <= 100%)      21049

But I still get the warning:
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
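That message appears to come from BioPerl's alphabet guessing, which typically fires when a sequence record contains no letters at all, for example a FASTA header followed by nothing. A minimal sketch for screening the inputs before rerunning, assuming GFF3 files with sequences after a ##FASTA line (as Prokka writes them); plain FASTA files also work. The function and script names are hypothetical, not part of Roary.

```python
import sys
from pathlib import Path


def fasta_section(text: str) -> str:
    """Return everything after a '##FASTA' directive (GFF3 with embedded
    sequences), or the whole text if there is no such line."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "##FASTA":
            return "\n".join(lines[i + 1:])
    return text


def find_empty_records(text: str) -> list[str]:
    """List headers of FASTA records whose sequence contains no letters."""
    empty: list[str] = []
    header = None
    has_letters = False
    for line in fasta_section(text).splitlines():
        if line.startswith(">"):
            # Close out the previous record before starting a new one.
            if header is not None and not has_letters:
                empty.append(header)
            header = line[1:].strip()
            has_letters = False
        elif header is not None and any(c.isalpha() for c in line):
            has_letters = True
    if header is not None and not has_letters:
        empty.append(header)
    return empty


if __name__ == "__main__":
    # Usage: python check_empty.py sample1.gff sample2.gff ...
    for path in sys.argv[1:]:
        for name in find_empty_records(Path(path).read_text()):
            print(f"{path}\t{name}")
```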