Closed mgalardini closed 7 years ago
Hi Marco,
Thanks for the interest. I have updated the documentation in the README. Hopefully this explains everything, but if not, just let me know. Briefly, the codes at the end of the string describe the gene orientation information, and the _+_+_ is just a delimiter (a bit crude).
Thanks,
Harry
Great, thanks for the quick reply, it is much clearer now!
Hi,
first of all, thanks a lot for this great tool. I am testing it on a set of ~700 E. coli genomes. Producing the IGR_presence_absence.csv file took approximately 14 hours using 20 CPUs.
I know that the presence_absence file mimics Roary's format, but I was wondering whether you could shed light on what the information stored in each cell means.
Example:
genome_+_+_gene1_+_+_gene2_+_+_CO_R
I figured that genome, gene1 and gene2 represent the target genome and the genes flanking the IGR region of interest, but so far I could not figure out the meaning of the +_+ bit.
Also, I noticed that at the end of the string any of the following strings can be present:
CO_F, CO_R, DP, DT, NA
Is any documentation available to understand what those notations mean?
Thanks a lot for your help.
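
For anyone who wants to process these cells programmatically, here is a minimal Python sketch based only on what is stated in this thread: the fields are separated by the _+_+_ delimiter, and the last field is the orientation code. The function name parse_igr_cell and the dictionary keys are hypothetical and not part of the tool. The assumption that a cell holds exactly four fields comes from the single example above; real cells may be empty or laid out differently, so check the README before relying on this.

```python
# Delimiter described in the reply above.
DELIMITER = "_+_+_"


def parse_igr_cell(cell):
    """Split one cell of the presence/absence file into its fields.

    Assumes (from the single example in this thread) that a cell has
    exactly four fields: genome, first flanking gene, second flanking
    gene, and an orientation code such as CO_F, CO_R, DP, DT or NA.
    """
    parts = cell.split(DELIMITER)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {cell!r}")
    genome, gene1, gene2, orientation = parts
    return {
        "genome": genome,
        "gene1": gene1,
        "gene2": gene2,
        "orientation": orientation,
    }


if __name__ == "__main__":
    print(parse_igr_cell("genome_+_+_gene1_+_+_gene2_+_+_CO_R"))
```

Splitting on the full delimiter rather than on a single underscore keeps codes such as CO_R intact, since they contain underscores themselves.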