Open JDHarlingLee opened 5 years ago
Hi,
Did you ever get a chance to figure this out? I can't find an explanation of these variations anywhere. I used local BLAST to compare some of the genes in my Roary result against their variants: some are very similar to their variants (>98% identity), while others are quite different (no hits). So it remains a mystery unless the developers answer, but they seem to have moved away from Roary, as many questions here have gone unanswered for a couple of years.
I think this is related to paralogs and synteny. Any gene named with an underscore is a paralog (or technically an ortholog) that has been split into a separate COG (each COG named _1, _2, etc.) due to synteny. You can find out more here: https://github.com/sanger-pathogens/Roary/issues/357#issuecomment-361881353
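In case it helps with checking which clusters share a name, here is a minimal Python sketch that groups cluster names by their base gene name. It assumes the "Gene" and "Non-unique Gene name" columns of Roary's gene_presence_absence.csv; the sample rows and annotations are made up for illustration, and stripping a trailing _N is just my own heuristic for comparing names, not how Roary assigns them (a gene whose real name ends in an underscore and digits would be grouped incorrectly).

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical excerpt laid out like Roary's gene_presence_absence.csv.
# Only the first three columns are shown; the annotations are placeholders.
SAMPLE = '''"Gene","Non-unique Gene name","Annotation"
"mecA","","placeholder annotation"
"mecA_1","","placeholder annotation"
"group628","mecA","placeholder annotation"
"group629","mecA_1","placeholder annotation"
"gyrA","","placeholder annotation"
'''

# A trailing _<digits> is treated as a split-cluster suffix (heuristic).
SUFFIX = re.compile(r"_\d+$")


def base_name(name):
    """Strip a trailing _N suffix, e.g. 'mecA_1' -> 'mecA'."""
    return SUFFIX.sub("", name)


def group_by_base(handle):
    """Map each base gene name to the list of cluster names that carry it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        # Prefer the non-unique name when present (as for 'groupNNN' rows).
        label = row["Non-unique Gene name"] or row["Gene"]
        groups[base_name(label)].append(row["Gene"])
    return dict(groups)


families = group_by_base(io.StringIO(SAMPLE))
for base, clusters in families.items():
    if len(clusters) > 1:
        print(base, clusters)
```

To run it on a real result, replace `io.StringIO(SAMPLE)` with `open("gene_presence_absence.csv", newline="")`. The clusters listed together are then the candidates to compare with BLAST.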
@djung0928 Some time ago the developers decided to move from Roary to the new tool Panaroo (https://github.com/gtonkinhill/panaroo), which is why you will find this message in the README: "PLEASE NOTE: we currently do not have the resources to provide support for Roary, so please do not expect a reply if you flag any issue".
Hi,
Quick query on how Roary assigns names to each cluster. I notice that in my analysis (default settings), I get a number of clusters named after the same gene with slight variations - e.g. 'mecA', 'mecA_1', 'group628' (mecA), 'group629' (mecA_1). I understand the paralog-splitting function (which is a very useful tool!), but I don't quite understand the difference between, say, mecA_1 and group628 (mecA), or why one has been assigned the underscore while the other gets a 'group' name plus a non-unique gene name.
Any advice/information anyone can give would be much appreciated. Many thanks.