Open kfuku52 opened 3 years ago
I definitely have this bug too. I have not worked on a workaround, but if you want to provide the option to run v1 or v2, that would be better. I am just running RM outside of funannotate for the time being, but I think having this better integrated would be helpful.
I'm not sure I want to support RepeatModeler/RepeatMasker -- I've not upgraded the related code for several years because the install is not straightforward and the RepBase database is no longer available, making masking (at least for fungi and other non-model organisms) difficult. I sort of view it as out of scope for funannotate.
I don't have the RepBase subscription but RepeatMasker is still useful because the latest version comes with the out-of-the-box Dfam repeat database (see here), so Repbase is no longer necessary.
bioconda provides recipes for both, so currently it's pretty easy to install if you don't have to manually compile from source: `conda install -c bioconda repeatmasker repeatmodeler`. bioconda's latest RepeatModeler is 2.x.
Right, DFAM contains repeats of only 5 species.... so for my usage (fungi) it isn't helpful.
The current release (Dfam 3.2) contains 6,900 TE families spanning five organisms: human, mouse, zebrafish, fruit fly, nematode, and a growing number of additional species. To supplement this databases we recommend obtaining the RepeatMasker edition of RepBase.
RepeatMasker seems to automatically download the latest Dfam release, which is currently 3.3 and contains 347 species, although I don't know how many fungal species there are.
(funannotate) [kfuku@at137 gfe_data]$ find ~/.pyenv/versions/miniconda3-4.3.30/envs/funannotate -name *Dfam*
/home/kfuku/.pyenv/versions/miniconda3-4.3.30/envs/funannotate/include/H5FDfamily.h
/home/kfuku/.pyenv/versions/miniconda3-4.3.30/envs/funannotate/share/RepeatMasker/Libraries/Dfam.h5
/home/kfuku/.pyenv/versions/miniconda3-4.3.30/envs/funannotate/share/RepeatMasker/Libraries/CONS-Dfam_3.3
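For anyone who wants to script the same lookup, here is a minimal Python sketch. `find_dfam_files` is a hypothetical helper name, and reading `CONDA_PREFIX` assumes an activated conda environment. As a side note, quoting the pattern in the shell (`-name '*Dfam*'`) keeps the shell from expanding it before `find` sees it.

```python
import os
from pathlib import Path


def find_dfam_files(env_prefix):
    """Return sorted paths under env_prefix whose name contains 'Dfam'."""
    root = Path(env_prefix)
    # rglob walks the tree recursively, much like `find <prefix> -name '*Dfam*'`.
    return sorted(str(p) for p in root.rglob("*Dfam*"))


if __name__ == "__main__":
    # Assumption: an activated conda environment exports CONDA_PREFIX.
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        for path in find_dfam_files(prefix):
            print(path)
```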
Yeah, I'm not saying it isn't useful if you are annotating human, mouse, zebrafish, etc --> but most of us are trying to annotate non-model organisms so the DFAM repeat library isn't going to be very useful. RepeatModeler used to also require RepBase library in order to do the de novo predictions, I don't know if that is still the case or not.
Edit: I didn't read your message closely -- 347 species -- I thought I looked at this a while ago and there were very few if any fungi, but perhaps it's worthwhile looking again.
Looks like you can browse species here: https://dfam.org/browse?clade=4890&clade_descendants=true&include_raw=true
I'm going to go ahead and try to get RepeatModeler 2.x + RepeatMasker 4.1 series working, so I'll refer to the fixes in that branch for this bug.
Are you using the latest release? yes, 1.8.8.
Describe the bug: At the moment, `funannotate mask` does not seem to support RepeatModeler 2.x. This is because the `RepeatModeler -e` option in 1.x has been replaced by `BuildDatabase -engine`. This can be easily fixed, as in this branch: https://github.com/kfuku52/funannotate/commit/9bad1296fc337049b87a94127f92bdd0fdeea186 (This branch also fixes a bug where --repeatmasker_species is not passed correctly to RepeatMasker.)
I'll be happy to create a PR, but the change in my branch loses compatibility with RepeatModeler 1.x. It seems possible to get version information from RepeatModeler's help message to support both 1.x and 2.x. So if 1.x support is still needed, I can update my branch accordingly and create a PR. Please let me know your thoughts.
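To illustrate the version-detection idea, here is a rough Python sketch. It is not the code from the linked commit. The help-text format (`RepeatModeler - 2.0.1`) is an assumption, the helper names `parse_major_version` and `build_commands` are hypothetical, and the placement of the engine option simply follows the description above (spelled out as `-engine` for 1.x, which is also an assumption).

```python
import re


def parse_major_version(help_text):
    """Return the major version found in RepeatModeler help text, or None."""
    # Assumption: the help text contains something like "RepeatModeler - 2.0.1".
    match = re.search(r"RepeatModeler\D*?(\d+)\.\d+", help_text)
    return int(match.group(1)) if match else None


def build_commands(major, fasta, db_name, engine="ncbi"):
    """Return [BuildDatabase argv, RepeatModeler argv] for the given major version."""
    if major is None:
        raise ValueError("could not determine the RepeatModeler version")
    build = ["BuildDatabase", "-name", db_name]
    model = ["RepeatModeler", "-database", db_name]
    if major >= 2:
        # 2.x: the engine is passed to BuildDatabase (per the issue description).
        build += ["-engine", engine]
    else:
        # 1.x: the engine is passed to RepeatModeler itself (assumed spelling).
        model += ["-engine", engine]
    build.append(fasta)
    return [build, model]
```

The returned lists could be handed to `subprocess.run` one after the other; the help text itself would have to be captured from the installed RepeatModeler first, and how to do that reliably is left open here.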