Closed abremges closed 8 years ago
Hi, Could you send me the output of your summary statistics file? Andrew
Uh, empty (soft) core set? Is this the reason for "Use of uninitialized value in require at (eval 7471) line 1." and all the warnings?
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 2435
Cloud genes (0% <= strains < 15%) 203
Total genes (0% <= strains <= 100%) 2638
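The four categories in this summary depend only on the share of strains that carry a gene, so with a handful of genomes some of them can never be populated. A minimal sketch of that bucketing, using the thresholds printed above (the function name classify_gene is made up for illustration and is not part of Roary):

```python
def classify_gene(n_present, n_strains):
    """Bucket a gene by the share of strains that carry it.

    Thresholds are copied from the summary statistics above; integer
    arithmetic avoids floating-point edge cases at the boundaries.
    """
    scaled = 100 * n_present  # compared against threshold * n_strains
    if scaled >= 99 * n_strains:
        return "core"
    if scaled >= 95 * n_strains:
        return "soft core"
    if scaled >= 15 * n_strains:
        return "shell"
    return "cloud"


# Show the category for every possible presence count among 7 strains.
for n in range(1, 8):
    print(n, classify_gene(n, 7))
```

With 7 genomes, as in this report, the soft core band (95% to 99%) cannot be reached at all, and a gene missing from a single genome already falls into the shell. One divergent genome is therefore enough to empty the core.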
It looks like you might have some outliers? I find Kraken (https://ccb.jhu.edu/software/kraken/) is good for QCing data. Or if you open up the gene presence and absence spreadsheet in Excel, you should be able to spot the odd one out quite easily.
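Spotting the odd one out can also be done programmatically. The sketch below is only an illustration under stated assumptions: it uses a simplified presence/absence table with a "Gene" column and one column per sample, where a non-empty cell means the gene is present. The real spreadsheet carries additional annotation columns, which is why the sample columns are passed in explicitly. The sample names and helper functions are hypothetical.

```python
import csv
import io


def load_presence(csv_text, sample_columns):
    """Return {gene: set of samples in which the gene is present}."""
    presence = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A non-empty cell means the sample has a member of this gene cluster.
        presence[row["Gene"]] = {
            s for s in sample_columns if (row[s] or "").strip()
        }
    return presence


def core_size_without(presence, samples, left_out=None):
    """Count genes present in every sample except the one left out."""
    kept = [s for s in samples if s != left_out]
    return sum(
        1 for present in presence.values() if all(s in present for s in kept)
    )


# Toy data with hypothetical sample names; "X" shares nothing with the rest.
demo_csv = """\
Gene,A,B,C,X
g1,a_001,b_001,c_001,
g2,a_002,b_002,c_002,
g3,a_003,b_003,,
g4,,,,x_001
"""

samples = ["A", "B", "C", "X"]
presence = load_presence(demo_csv, samples)

print("all samples:", core_size_without(presence, samples))
for sample in samples:
    print(f"without {sample}:", core_size_without(presence, samples, sample))
```

A sample whose removal makes the core jump from zero to a sizeable number is a likely outlier and worth checking with a tool such as Kraken.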
Thanks, will have a closer look at my data, which seems to be the cause of this issue. I now believe my input genomes are not as closely related as we thought, making them unsuitable – at least as a whole – for a Roary analysis.
As a minor enhancement, I'd propose catching this error (empty core set) in your pipeline, instead of printing somewhat obscure "Use of uninitialized value" and Bio::SeqIO warnings. This would boost usability in some edge cases.
Thanks for your input & efforts!
Yes I agree that I should be catching this case. Thanks for the feedback.
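A check of the kind discussed here could look roughly like the following. This is a sketch, not Roary's actual code (Roary is written in Perl): it reads the summary statistics text quoted earlier in this thread and stops with a readable message when the core count is zero. The names parse_summary, require_core_genes and EmptyCoreError are hypothetical.

```python
import re

# Matches lines such as "Core genes (99% <= strains <= 100%) 0".
SUMMARY_LINE = re.compile(
    r"^(?P<label>.+?)\s*\((?P<range>.*)\)\s*(?P<count>\d+)\s*$"
)


class EmptyCoreError(RuntimeError):
    """Raised when no core genes were found (hypothetical error type)."""


def parse_summary(text):
    """Map each category label to its gene count."""
    counts = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            counts[match.group("label")] = int(match.group("count"))
    return counts


def require_core_genes(counts):
    """Fail early with a readable message instead of downstream warnings."""
    if counts.get("Core genes", 0) == 0:
        raise EmptyCoreError(
            "No core genes found: the input genomes may be too divergent "
            "or contain an outlier, so a core alignment cannot be built."
        )


# Summary statistics as posted in this thread.
summary = """\
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 2435
Cloud genes (0% <= strains < 15%) 203
Total genes (0% <= strains <= 100%) 2638
"""

counts = parse_summary(summary)
try:
    require_core_genes(counts)
except EmptyCoreError as err:
    print(f"error: {err}")
```

Running the guard before the alignment step would replace the stream of warnings with a single explanation of what went wrong.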
@abremges thanks for the reminder on how I fixed that! I knew I'd solved it somewhere before but I couldn't remember.
@tseemann, how do I apply this to Roary?
my $fh = Bio::SeqIO->new(-file=>$aln_file, -format=>'fasta', -alphabet=>'dna');
I'm having the same problem. The following message is printed over and over:
MSG: Got a sequence without letters. Could not guess alphabet
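The wording of the message suggests that some records in the FASTA file being read contain no residues at all. Assuming that is the cause, a quick way to confirm it is to list the records that have a header but no letters. The sketch below uses plain Python and a made-up alignment; the function name empty_fasta_records is hypothetical.

```python
def empty_fasta_records(text):
    """Return the IDs of FASTA records that contain no letters."""
    empty = []
    current_id = None
    letters = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None and letters == 0:
                empty.append(current_id)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            letters = 0
        elif current_id is not None:
            # Gap characters such as "-" do not count as letters.
            letters += sum(ch.isalpha() for ch in line)
    if current_id is not None and letters == 0:
        empty.append(current_id)
    return empty


# Made-up alignment: strain1 has no sequence, strain3 has gaps only.
demo = """\
>strain1
>strain2
ACGT
>strain3
---
"""

print(empty_fasta_records(demo))
```

If every record in the core alignment turns up in this list, the underlying problem is the empty core set rather than the alphabet guessing itself.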
I tried to run Roary on 7 different bacterial strains (draft Illumina assemblies, annotated with Prokka). I'm not entirely sure how related the strains are, maybe we have more diversity in there than initially thought. The command-line call was
roary -v -e --mafft -p 4 *.gff 2>&1
and everything seems to run fine until:
This seems to be related to https://github.com/tseemann/snippy/issues/24, which @tseemann resolved by telling Bio::SeqIO->new the alphabet upfront. Maybe a similar strategy resolves this issue?