nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License
300 stars 82 forks

About repeat protein database! #342

Closed sunnycqcn closed 4 years ago

sunnycqcn commented 4 years ago

Hello Jon, I can run funannotate well now. I still have a question about the three repeat databases: funannotate.repeats.reformat.fa, funannotate.repeat.proteins.fa.tar.gz, and repeats.dmnd. I want to use funannotate to annotate a plant genome. Are these three repeat databases only for fungi, or do they also cover plants? As I remember, the plant repeat protein database is about 40M, which is much bigger than funannotate.repeat.proteins.fa.tar.gz. Thanks, Fuyou

nextgenusfs commented 4 years ago

Those are a semi-curated set of transposable elements from an older tool, TransposonPSI, and from MAKER's repeat protein database, so they are not fungal-specific. We (as a group) need to do better with repeat detection, curation, etc., especially with RepBase no longer being free for academic use.

sunnycqcn commented 4 years ago

Jon, I got it. Thanks, Fuyou