mgalardini closed this issue 7 years ago
Hi Marco,
Thanks again for the feedback. This is a bit complicated unfortunately. I use the ID as this is what Roary relies on (I think).
If Roary encounters duplicate IDs or locus_tags between isolates, it modifies the GFF files by adding a number to the end of the ID (but not the locus_tag). These files are found in 'fixed_input_files' in the Roary output directory. Because Piggy integrates the information from Roary, the gene names must be identical to the ones used by Roary; otherwise it doesn't work. So I have to use Roary's ID information.
If there are duplicates between genomes (and no ID), then Roary takes the locus_tag for the first genome, and then adds an ID (with an appended number to make it unique) to the other genomes (but not the first). I have just tested this on your genome by renaming it genome1, 2, 3, etc. (Roary needs a minimum of 3 genomes to run). So in this example genome1 has no ID, but the others have unique IDs.
I think this means if the genome has no ID then it is safe to use the locus_tag (as long as files in the fixed_input_files folder get priority - these should all have IDs).
Does this make sense?
Thanks,
Harry
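The fallback Harry describes above (use the ID when a feature has one, otherwise the locus_tag) could be sketched roughly as follows. This is only an illustration in Python, not Piggy's actual code; the function names parse_gff_attributes and gene_identifier are hypothetical, and it assumes the standard GFF3 attribute column of semicolon-separated key=value pairs.

```python
def parse_gff_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attributes = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def gene_identifier(column9):
    """Return the ID if present, else the locus_tag, else None.

    Hypothetical helper: mirrors the rule discussed in this thread,
    not the logic of any released Piggy version.
    """
    attributes = parse_gff_attributes(column9)
    return attributes.get("ID") or attributes.get("locus_tag")


if __name__ == "__main__":
    print(gene_identifier("ID=genome2_00001;locus_tag=b0001"))
    print(gene_identifier("locus_tag=b0001;gene=thrL"))
```

As noted above, this is only safe if the files in fixed_input_files are read in preference to the original inputs, since those carry the IDs that Roary actually used.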
Hi Harry,
I understand; I have modified my GFF file to include an ID field equal to the locus_tag and it worked fine. Thanks a lot for your feedback.
Best, Marco
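For anyone hitting the same problem, the workaround Marco describes (adding an ID equal to the locus_tag) could be sketched along these lines. This is a minimal, hedged example rather than the exact edit he made; the function name add_id_from_locus_tag is hypothetical, and it assumes tab-separated GFF3 feature lines with nine columns.

```python
def add_id_from_locus_tag(gff_line):
    """Prepend ID=<locus_tag> to a feature line that has a locus_tag but no ID.

    Hypothetical helper. Comments, directives, FASTA lines and features
    that already carry an ID are returned unchanged.
    """
    if not gff_line.strip() or gff_line.startswith("#"):
        return gff_line
    ending = "\n" if gff_line.endswith("\n") else ""
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        # Not a feature line (e.g. sequence data after ##FASTA)
        return gff_line
    fields = [f for f in columns[8].split(";") if f]
    attributes = dict(f.split("=", 1) for f in fields if "=" in f)
    if "ID" in attributes or "locus_tag" not in attributes:
        return gff_line
    columns[8] = ";".join([f"ID={attributes['locus_tag']}"] + fields)
    return "\t".join(columns) + ending


def add_ids_to_file(in_path, out_path):
    """Apply add_id_from_locus_tag to every line of a GFF file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(add_id_from_locus_tag(line))
```

One caveat: GFF3 expects IDs to be unique within a file, and GenBank-derived files often give a gene and its CDS the same locus_tag, so it may be necessary to restrict this to one feature type or to add a suffix.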
Hi,
As reported in the previous issue I opened (#14), I'm running piggy on ~700 E. coli genomes. All but one have been annotated by Prokka. The only exception is the reference strain, E. coli K-12, for which I'm using the GenBank file available from NCBI, converted to GFF3 format using this Python library.
It would appear that piggy is not picking up the gene names for the GFF file obtained this way (downloadable here). If I look into the IGR_sequences.fasta file generated by piggy, I see the following FASTA headers:
genome_+_+__+_+__+_+_DP
Adding an ID attribute to each sequence feature, with a value equal to the locus_tag, seems to solve the issue, but I was wondering whether using the more commonly used locus_tag attribute would make more sense.
Marco