nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Transcription factor identification #924

Closed: skylerhar1 closed this issue 1 year ago

skylerhar1 commented 1 year ago

I was looking at the file tf_interpro.txt because I was trying to generate the annotation table manually, and I noticed that IPR004827 appears twice in the list. This could be inflating the number of transcription factors reported by the compare function. I am using version 1.8.14.
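For anyone who wants to check a list like this for repeated accessions, here is a minimal sketch. It assumes each non-empty line starts with an InterPro accession of the form IPR followed by six digits; the actual layout of tf_interpro.txt is not described in this thread, so that parsing rule, the helper name find_duplicate_ids, and the sample accessions are all hypothetical.

```python
import re
from collections import Counter

# Assumption: an InterPro accession is "IPR" followed by six digits
# and sits at the start of each line.
IPR_PATTERN = re.compile(r"IPR\d{6}")


def find_duplicate_ids(lines):
    """Return {accession: count} for accessions found on more than one line."""
    ids = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (comment syntax is an assumption).
        if not line or line.startswith("#"):
            continue
        match = IPR_PATTERN.match(line)
        if match:
            ids.append(match.group(0))
    counts = Counter(ids)
    return {acc: n for acc, n in counts.items() if n > 1}


# Illustrative input only; this is not the real content of tf_interpro.txt.
sample = ["IPR004827", "IPR001138", "IPR004827", "IPR007219"]
duplicates = find_duplicate_ids(sample)
print(duplicates)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `with open("tf_interpro.txt") as fh: print(find_duplicate_ids(fh))`, adjusting the path to wherever the file lives in your installation.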

hyphaltip commented 1 year ago

I'm not sure how @nextgenusfs developed this file, but it does look like the ID is listed twice. I don't know whether that redundancy affects the result counts, though, unless the same gene shows up twice in the heatmap or table results?
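Whether a repeated ID changes the totals depends on how the counting is done. The sketch below is not funannotate's code; it is a hypothetical illustration with made-up gene names and function names (count_tf_genes, count_tf_hits) that contrasts two counting styles: a membership test per gene, which ignores the duplicate, and a loop over every list entry, which counts the duplicated ID twice.

```python
def count_tf_genes(gene_domains, tf_ids):
    """Count genes that carry at least one listed domain (duplicates are harmless)."""
    tf_set = set(tf_ids)
    return sum(1 for domains in gene_domains.values() if tf_set & set(domains))


def count_tf_hits(gene_domains, tf_ids):
    """Count one hit per (list entry, gene) match, so a repeated entry counts twice."""
    return sum(
        1
        for tf in tf_ids
        for domains in gene_domains.values()
        if tf in domains
    )


# Hypothetical annotations: gene name -> InterPro accessions found on that gene.
gene_domains = {
    "gene1": ["IPR004827"],
    "gene2": ["IPR001138"],
    "gene3": ["IPR000719"],
}
# Hypothetical TF list containing the duplicated accession from this issue.
tf_ids = ["IPR004827", "IPR001138", "IPR004827"]

genes_with_tf = count_tf_genes(gene_domains, tf_ids)
total_hits = count_tf_hits(gene_domains, tf_ids)
print(genes_with_tf, total_hits)  # prints: 2 3
```

In this toy data the per-gene count stays at 2 while the per-entry count rises to 3, so the duplicate only matters if the totals are built the second way.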

nextgenusfs commented 1 year ago

I see it is repeated. I honestly don't remember how that file was created; most likely it was manual curation from the InterPro domains I had in the particular project I was working on (I highly doubt I went through the entirety of InterPro). I think it was ~6 years ago, when I was working on some comparative genomics. I can just delete one of the duplicated entries. The list is certainly not complete and hasn't been updated in a long time, so I wouldn't rely on funannotate compare to generate a comprehensive list.

skylerhar1 commented 1 year ago

Makes sense to me. Thank you for the help, I'll close the issue now.