janka2012 / digIS

Pipeline detecting distant and putative novel insertion sequences in prokaryotic genomes
MIT License

I have a question about the gff files #6

Closed. Leytoncito closed this issue 2 years ago

Leytoncito commented 2 years ago

Hello janka2012,

  1. Does the gff file contain only the IS elements?
  2. Can I use these gff files later with roary or panaroo?
  3. Can I search for IS elements in a panaroo or roary output file, for example pangenome.fa (a FASTA file)?
  4. prokka annotates the transposases (and many other things) and generates a gff file that I later group with another tool (for example roary). Is digIS better for this?

Sorry for so many questions; I am very interested in this tool. Thanks in advance!

Leytoncito commented 2 years ago

I have been testing digIS, and I can use a GenBank file from prokka as input. I then compared prokka's gff3 with the one produced by digIS and they are different, but I think I will be able to work around this. Anyway, it would be very interesting to implement a prokka-style gff3 output (one that includes the sequences).
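As a rough illustration of what such a prokka-style output could look like, here is a minimal Python sketch that appends the sequences to a gff file after a `##FASTA` line. It is not part of digIS: the function name `to_prokka_style` and the example record (source `digIS`, feature type `transposable_element`, `ID=is_1`) are assumptions made for illustration, and whether roary or panaroo accept the result has not been verified here.

```python
def to_prokka_style(gff_text: str, fasta_text: str) -> str:
    """Return GFF3 text with the sequences embedded after a ##FASTA line."""
    lines = [ln for ln in gff_text.splitlines() if ln.strip()]
    # GFF3 expects the version pragma on the first line; add it if missing.
    if not lines or not lines[0].startswith("##gff-version"):
        lines.insert(0, "##gff-version 3")
    # Everything after ##FASTA is read as FASTA records, not as features.
    lines.append("##FASTA")
    lines.extend(ln for ln in fasta_text.splitlines() if ln.strip())
    return "\n".join(lines) + "\n"


# Hypothetical one-feature annotation and its contig sequence.
example_gff = "contig1\tdigIS\ttransposable_element\t3\t8\t.\t+\t.\tID=is_1\n"
example_fasta = ">contig1\nACGTACGTAC\n"

combined = to_prokka_style(example_gff, example_fasta)
print(combined)
```

The sequence IDs in the FASTA part must match the first column of the feature lines, otherwise downstream tools cannot pair features with sequences.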

janka2012 commented 2 years ago

Hi @Leytoncito. Thank you for your interest! I will try to answer your questions as best I can right now. Please see below:

  1. Yes, the gff file contains the IS elements that were found. These may be known IS elements or putative novel IS elements detected by digIS.
  2. Unfortunately, I am not familiar with roary and panaroo, so I will need to look into this more deeply.
  3. I am not sure I understood this question properly. If you need the FASTA sequences of the detected IS elements, this GitHub repo provides a command for extracting them from the gff file using BEDtools: https://github.com/janka2012/digIS#getting-fasta-file-using-gff-file
  4. digIS focuses solely on IS elements and starts by searching for conserved domains of transposases. We compared digIS only with tools that search for IS elements, whereas prokka, if I understand correctly, only annotates the transposases. However, I think digIS and prokka can complement each other.
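
For readers who want to see what the extraction in point 3 amounts to, the idea can be sketched in plain Python: read the genome FASTA, then slice out the interval given by columns 4 and 5 of each gff line. This is only a sketch under stated assumptions, not the command from the repository: the helper names `read_fasta` and `extract_features`, the record naming scheme, and the example data are hypothetical, and strand is ignored.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


def extract_features(gff_text, genome):
    """Return (name, sequence) pairs for every feature line in gff_text."""
    records = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete nine-column feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        # GFF coordinates are 1-based and inclusive on both ends.
        records.append((f"{seqid}:{start}-{end}", genome[seqid][start - 1:end]))
    return records


# Hypothetical data: one contig and one annotated element on it.
genome = read_fasta(">contig1 example\nACGTA\nCGTAC\n")
gff = "contig1\tdigIS\ttransposable_element\t3\t8\t.\t+\t.\tID=is_1\n"
records = extract_features(gff, genome)
for name, seq in records:
    print(f">{name}\n{seq}")
```

Elements on the minus strand would additionally need reverse-complementing if strand-specific sequences are wanted.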

Could you also share an example of a gff3 file generated by prokka? I would like to take a look at it. Thank you.

Leytoncito commented 2 years ago

Thanks for your answers. It seems that the prokka file contains the sequences at the end of the file (under ##FASTA). At the beginning of the file, below ##gff-version 3, there are "##sequence-region" lines; I guess these are the genes that have more than one annotation. Here is a prokka gff3 file: GCA_000159135.1.zip
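
For reference, the GFF3 specification defines a `##sequence-region seqid start end` line as declaring the coordinate bounds of one sequence (for example a contig), and everything after `##FASTA` as FASTA records for those sequences. A minimal Python sketch that splits such a file into these three parts could look like the following; the function name `split_gff3` and the example record are hypothetical and only illustrate the layout described above.

```python
def split_gff3(text):
    """Split GFF3 text into sequence regions, feature rows and FASTA text."""
    regions, features, fasta_lines = {}, [], []
    in_fasta = False
    for line in text.splitlines():
        if in_fasta:
            fasta_lines.append(line)  # everything after ##FASTA is sequence data
        elif line.startswith("##FASTA"):
            in_fasta = True
        elif line.startswith("##sequence-region"):
            # Pragma layout: ##sequence-region <seqid> <start> <end>
            _, seqid, start, end = line.split()
            regions[seqid] = (int(start), int(end))
        elif line.startswith("#") or not line.strip():
            continue  # other pragmas, comments and blank lines
        else:
            features.append(line.split("\t"))  # nine tab-separated columns
    return regions, features, "\n".join(fasta_lines)


# Hypothetical miniature file in the same layout.
example = (
    "##gff-version 3\n"
    "##sequence-region contig1 1 10\n"
    "contig1\tprokka\tCDS\t3\t8\t.\t+\t0\tID=gene_1\n"
    "##FASTA\n"
    ">contig1\n"
    "ACGTACGTAC\n"
)
regions, features, fasta = split_gff3(example)
print(regions, len(features))
print(fasta)
```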

Leytoncito commented 2 years ago

Hello, what do you think of the prokka gff file? A gff file in that format can be passed directly to other programs.