Closed gavinmdouglas closed 3 years ago
Hi Gavin,
The "~~~" delimiter separates the gene names (found in the corresponding GFF files) of any sequences included in that cluster. If the resulting 'name' is unique, it is kept; otherwise the cluster is assigned a 'group_#' label.
The 3rd column is the set of unique annotations found in the GFF for these sequences. These are often longer and more descriptive than the gene names and are separated by ";".
Unless you have enabled the --merge_paralogs option, each paralogous cluster will be given a separate row.
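As a rough illustration of the two delimiters described above, here is a minimal Python sketch that splits a combined cluster name on "~~~" and an annotation string on ";". The function name split_cluster_fields and the example values are hypothetical and are not taken from Panaroo or from a real output file; it only assumes the name and annotation fields have already been read as strings.

```python
def split_cluster_fields(name, annotation):
    """Split a combined cluster name and an annotation string into their parts.

    Assumes gene names are joined by "~~~" and annotations by ";",
    as described in this thread.
    """
    # Gene names contributed by the sequences in the cluster.
    gene_names = [g for g in name.split("~~~") if g]
    # Unique annotations, with surrounding whitespace and empty entries dropped.
    annotations = [a.strip() for a in annotation.split(";") if a.strip()]
    return gene_names, annotations


# Hypothetical values for illustration only.
names, annots = split_cluster_fields(
    "geneA~~~geneB",
    "hypothetical protein;putative transporter",
)
print(names)
print(annots)
```

A cluster that was assigned a 'group_#' label contains no "~~~", so it comes back as a single-element list.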
I will try to improve the documentation in the next release to make this clearer, as I agree it's a bit confusing at the moment.
Hi @gtonkinhill,
Thanks for clarifying, that's much clearer.
All the best,
Gavin
Hi there,
I'm not sure how to interpret lines in the gene presence/absence tables with gene names separated by "~~~".
For example:
Does this denote paralogs?
Sorry if I missed the description in the documentation. I saw in the description of this file that gene annotations that have been merged are separated by a semicolon, which I think refers to fragmented genes that could have been misassembled in genomes (and is indicated by semicolons in the actual gene IDs per genome). I think the case I've highlighted is something different, correct?
Thanks,
Gavin