Closed mushalallam closed 9 years ago
Hi, Just to double check, did you install muscle and RevTrans.py? Could you send me a directory listing (ls -alrt) so I can get an idea of what's gone wrong? Thanks, Andrew
Hi Andrew,
This is how I run the command:
roary -e -i 70 --core_definition 90 --dont_delete_files *.gff
I have Muscle and revtrans.py installed in my path. Below is the output of ls -alrt:
ma11:v3 ma11$ ls -alrt
total 34480
drwxr-xr-x 20 ma11 staff 680 May 27 10:39 ..
-rwxr-xr-x 1 ma11 staff 2818200 May 27 10:39 NT45_03212015.gff
-rwxr-xr-x 1 ma11 staff 2684389 May 27 10:39 NT224_03212015.gff
-rwxr-xr-x 1 ma11 staff 2753976 May 27 10:39 NT12_03212015.gff
-rwxr-xr-x 1 ma11 staff 2763286 May 27 10:39 NT11_03212015.gff
-rw-r--r-- 1 ma11 staff 37095 May 27 13:10 database_masking.asnb
-rw-r--r-- 1 ma11 staff 224049 May 27 13:10 _combined_files.groups
-rw-r--r-- 1 ma11 staff 571707 May 27 13:10 _combined_files
-rw-r--r-- 1 ma11 staff 115891 May 27 13:10 _clustered.clstr
-rw-r--r-- 1 ma11 staff 381953 May 27 13:10 _clustered
-rw-r--r-- 1 ma11 staff 211 May 27 13:10 blast_identity_frequency.Rtab
-rw-r--r-- 1 ma11 staff 41872 May 27 13:10 _uninflated_mcl_groups
-rw-r--r-- 1 ma11 staff 73 May 27 13:10 _gff_files
-rw-r--r-- 1 ma11 staff 125 May 27 13:10 _fasta_files
-rw-r--r-- 1 ma11 staff 604198 May 27 13:10 _blast_results
-rw-r--r-- 1 ma11 staff 314397 May 27 13:10 _labeled_mcl_groups
-rw-r--r-- 1 ma11 staff 288108 May 27 13:10 _inflated_unsplit_mcl_groups
-rw-r--r-- 1 ma11 staff 288108 May 27 13:10 _inflated_mcl_groups
-rw-r--r-- 1 ma11 staff 170 May 27 13:10 number_of_unique_genes.Rtab
-rw-r--r-- 1 ma11 staff 153 May 27 13:10 number_of_new_genes.Rtab
-rw-r--r-- 1 ma11 staff 200 May 27 13:10 number_of_genes_in_pan_genome.Rtab
-rw-r--r-- 1 ma11 staff 200 May 27 13:10 number_of_conserved_genes.Rtab
-rw-r--r-- 1 ma11 staff 413887 May 27 13:10 gene_presence_absence.csv
-rw-r--r-- 1 ma11 staff 0 May 27 13:10 core_accessory.tab
-rw-r--r-- 1 ma11 staff 314397 May 27 13:10 clustered_proteins
-rw-r--r-- 1 ma11 staff 156 May 27 13:10 core_accessory.header.embl
-rw-r--r-- 1 ma11 staff 0 May 27 13:10 accessory.tab
-rw-r--r-- 1 ma11 staff 156 May 27 13:10 accessory.header.embl
drwxr-xr-x 4569 ma11 staff 155346 May 27 13:13 pan_genome_sequences
-rw-r--r-- 1 ma11 staff 662815 May 27 13:14 NT11_03212015.gff.proteome.faa
-rw-r--r-- 1 ma11 staff 661577 May 27 13:14 NT12_03212015.gff.proteome.faa
-rw-r--r-- 1 ma11 staff 646061 May 27 13:14 NT224_03212015.gff.proteome.faa
-rw-r--r-- 1 ma11 staff 282267 May 27 13:14 pan_genome_results
-rw-r--r-- 1 ma11 staff 677878 May 27 13:14 NT45_03212015.gff.proteome.faa
drwxr-xr-x 38 ma11 staff 1292 May 27 13:16 .
-rw-r--r--@ 1 ma11 staff 15364 May 27 13:18 .DS_Store
-rw-r--r-- 1 ma11 staff 65 May 27 13:41 output_alignment.aln
-rw-r--r-- 1 ma11 staff 65 May 27 13:50 core_gene_alignment.aln
Thanks
Thanks for that,
Could you email me the spreadsheet file called gene_presence_absence.csv ?
It's path-help@sanger.ac.uk as usual.
Regards,
Andrew
Hi Mushal, I've just released a new version (2.3.0) which I 'hope' will resolve the issue you're having. Could you give it a whirl and let me know how you get along? Andrew
Many thanks @andrewjpage, it's working well :)
Thanks for letting me know.
Hi @andrewjpage
--------------------- WARNING --------------------- MSG: Got a sequence without letters. Could not guess alphabet
and an alignment file that looks like this:
PRES009
PRES012
PRES014
PRES019
PRES021
PRES024
PRES025
PRES026
PRES028
The command line I used is: roary -e --mafft -p 8 -t 1 -f prokka/gffs/roary_output/ prokka/gffs/*.gff
Please advise,
I think the current version of roary is 3.14.0
That warning comes from bioperl and it usually means you have lots of - or N letters in your sequence.
What version of prokka did you use?
Thanks for your response,
And what is the best approach to the warning: identify and remove the sequences with many Ns? What is considered an acceptable threshold for Ns?
Kind regards.
On Sat, Feb 29, 2020 at 3:38 PM Torsten Seemann notifications@github.com wrote:
I think the current version of roary is 3.14.0 That warning comes from bioperl and it usually means you have lots of - or N letters in your sequence.
The Lord is my shepherd, I shall not want! Psalms 23
I'm having trouble locating the installer for version 3.14.0.
Thanks
-- San Emmanuel James Skype: jsan4christ Mobile: UG +256752900304, SA +27 67 833 1444
Hi! I would also like to revive this issue. I am running roary (3.12.0) on a dataset consisting of 2170 bacterial genomes. The command I ran was the following:
roary -p 16 -e -s -n -f roary_id85-s -i 85 *gff
The process seems to run fine and the correct output files are generated, but I get the following warning twice:
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
---------------------------------------------------
Also the core_gene_alignment.aln file is very small (seems to consist only of one gene or so) despite the summary statistics file stating that there should be 1141 core genes.
I previously QC'd my genomes by running sendsketch and validating the dubious ones with Kraken. I also made a mash tree to double-check and remove outliers, and removed assemblies with more than 200 contigs. After this I used prokka v1.12 for annotation (I know it's an old version). Is this message due to low quality or high divergence among the genomes, as suggested by some answers I have found, or to Ns in the sequences, or something else? And most importantly, how can I mitigate it? I visualised the nwk and gene_presence_absence.csv files in Phandango and I cannot see any genome behaving weirdly there (e.g. containing very few core genes or being very divergent from the others).
Thank you for your help!
To follow up on this, is there any way to identify the sequences that give rise to this warning and modify or exclude them? Since the message was repeated twice, I assume there are two of them? @andrewjpage @tseemann
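For anyone trying to track down the offending records, here is a minimal sketch of one way to look for them. It is not part of Roary: the function names are made up for illustration, and it assumes the files you point it at (for example your nucleotide input or intermediate alignment files) are plain FASTA. It reports records that are empty or consist only of - and N characters, which is the situation the bioperl warning is described as pointing to above. Note that N also means asparagine in protein sequences, so this only makes sense for nucleotide files.

```python
import sys
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a plain FASTA file."""
    name = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                header = line[1:].split()
                name = header[0] if header else ""
                chunks = []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def ambiguous_fraction(seq: str) -> float:
    """Fraction of characters that are '-' or 'N'; an empty sequence counts as 1.0."""
    if not seq:
        return 1.0
    bad = sum(1 for ch in seq.upper() if ch in "-N")
    return bad / len(seq)


def flag_records(paths: Iterable[str], threshold: float = 1.0) -> List[Tuple[str, str, float]]:
    """Return (file, record_id, fraction) for records at or above the threshold.

    The default of 1.0 only flags records with no real letters at all.
    The threshold is a hypothetical knob, not a value recommended by Roary.
    """
    flagged = []
    for path in paths:
        for name, seq in read_fasta(Path(path)):
            frac = ambiguous_fraction(seq)
            if frac >= threshold:
                flagged.append((str(path), name, frac))
    return flagged


if __name__ == "__main__":
    # Usage: python find_letterless.py file1.fa file2.fa ...
    for path, name, frac in flag_records(sys.argv[1:]):
        print(f"{path}\t{name}\t{frac:.2f}")
```

Lowering the threshold (say to 0.5) lists records that are mostly gaps or Ns as well, which may help when deciding which genomes to re-check, but what counts as acceptable is a judgment call for your dataset.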
I had this problem too, but mine was caused by a roary version issue. When I initially installed conda, I didn't add the extra channels to the conda config, so when I ran conda install bioconda::roary I got roary version 3.7.0 from conda's default channel instead of version 3.13.0 from anaconda.org. You can check your version with roary -w. Version 3.7.0 will encounter this problem.
You can solve this problem with these commands:
conda config --add channels conda-forge
conda config --add channels r
conda config --add channels bioconda
and then use conda install bioconda::roary to install roary version 3.13.0.
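If you want to script the version check, here is a small sketch in Python. The helper names are hypothetical, and it assumes that roary -w prints a dotted version number somewhere in its output; I have not checked the exact output format, so the regular expression just picks up the first dotted number it finds. The main point is to compare versions numerically: as plain strings, "3.7.0" sorts after "3.13.0", which is misleading.

```python
import re
import subprocess
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """True if the installed version is numerically >= the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_roary_version() -> str:
    """Run `roary -w` and return whatever it printed.

    Assumption: the version appears on stdout or stderr; both are captured.
    """
    result = subprocess.run(["roary", "-w"], capture_output=True, text=True)
    return result.stdout + result.stderr


# Example with the two versions mentioned in this thread:
# is_at_least("3.7.0", "3.13.0") is False, so 3.7.0 counts as the older release.
```

To check a real installation you would call is_at_least(installed_roary_version(), "3.13.0"), with roary on your PATH.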
Hi, I got this error when I tried to create a core alignment. Thanks