sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Unique genes on query_pan_genome #504

Open reedus-123 opened 4 years ago

reedus-123 commented 4 years ago

Hi,

I'm trying to separate the genes in my genomes into those shared between two groups and those unique to each group. Running query_pan_genome with the different parameters seems to work: among other files, I get three spreadsheets, set_difference_common plus a set_difference_unique file for set one and for set two. I was under the impression that each set_difference_unique file contains only genes found exclusively in that set. However, when I searched for these genes in the opposite set's file and in set_difference_common, I found the majority of them there as well, which makes me think that even though they were classified as unique, they probably aren't.

Are these genes errors that should be eliminated from the analysis, or is there a proper way of handling them?

Additionally, what is the right way to handle genes with multiple copies? For example, one file may contain gene abc, which appears as abc_1 in group two and as abc_2 in the shared folder. If these differently suffixed annotations of the same gene turn up in all of the files, would it be correct to assume that gene abc isn't unique to one group?
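To make the question concrete, here is a rough Python sketch of the kind of cross-check described above: it takes the gene names from one "unique" list and reports which of them also occur in another list once numeric copy suffixes such as `_1` or `_2` are ignored. Everything in it is an assumption for illustration: `base_name` and `overlapping_genes` are hypothetical helpers rather than part of Roary, the gene names are made up, and it assumes unnamed clusters look like `group_123` and must not have their number stripped. It does not parse Roary's output files, so the names would have to be extracted from the spreadsheets first.

```python
import re

# Matches a trailing numeric copy suffix such as "_1" or "_2".
_COPY_SUFFIX = re.compile(r"_\d+$")


def base_name(gene: str) -> str:
    """Strip a trailing copy suffix (abc_1 -> abc).

    Assumption: names starting with "group_" are cluster identifiers,
    so their number is part of the name and is left untouched.
    """
    if gene.startswith("group_"):
        return gene
    return _COPY_SUFFIX.sub("", gene)


def overlapping_genes(unique_names, other_names, ignore_copy_suffix=True):
    """Return the names in unique_names that also occur in other_names."""
    normalise = base_name if ignore_copy_suffix else (lambda g: g)
    other = {normalise(g) for g in other_names}
    return sorted(g for g in set(unique_names) if normalise(g) in other)


# Made-up example names, not real Roary output.
unique_set_one = ["abc", "xyz_2", "group_101"]
set_two_and_common = ["abc_1", "abc_2", "def", "group_7"]

# Ignoring copy suffixes, "abc" also shows up outside set one.
print(overlapping_genes(unique_set_one, set_two_and_common))

# With exact name matching, nothing overlaps.
print(overlapping_genes(unique_set_one, set_two_and_common,
                        ignore_copy_suffix=False))
```

A name that matches only after the suffix is stripped is a hint that the copies may be related, not proof that they belong to the same gene cluster, so any hits would still need to be checked against the sequences themselves.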

Sirbius commented 4 years ago

I've run into the same situation as described in the first post, but I don't have any clue how to handle it either.