nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

soft-masking completes, but with some suspicious info in mask.log #941

Open igwill opened 11 months ago

igwill commented 11 months ago

Are you using the latest release? 1.8.15, installed via mamba

Describe the bug

In stdout, masking seems fine:

OS: Ubuntu 20.04, 64 cores, ~ 528 GB RAM. Python: 3.8.15
Running funanotate v1.8.15
Soft-masking: running RepeatMasker with custom library
Repeat soft-masking finished: 
Masked genome: [...]/funannotate/N.Cor/RpMod_Masked_N.Cor.fasta
num scaffolds: 29
assembly size: 38,214,125 bp
masked repeats: 8,352,435 bp (21.86%)

However, mask.log shows a couple of potential issues: this version of numpy (1.24) doesn't have typeDict (I think the alias was deprecated earlier and removed in 1.24), so famdb.py fails when it imports h5py, and the general.working/ dir is not found. Yet the 'analyzing file' output looks fine, and I'm getting masking levels similar to those reported when RepeatModeler initially built the custom database for this genome.

RepeatMasker version=NA path=/[...]/envs/mamba/envs/funannotate/bin/RepeatMasker
Soft-masking: running RepeatMasker with custom library
RepeatMasker version 4.1.5
Search Engine: NCBI/RMBLAST [ 2.14.0+ ]
Using Custom Repeat Library: /[...]/funannotate/N.Cor/N.Cor-families.fa

Building general libraries in: /[...]/envs/mamba/envs/funannotate/share/RepeatMasker/Libraries//general
Traceback (most recent call last):
  File "/[...]/envs/mamba/envs/funannotate/share/RepeatMasker/famdb.py", line 55, in <module>
    import h5py
  File "/[...]/envs/mamba/envs/funannotate/lib/python3.8/site-packages/h5py/__init__.py", line 46, in <module>
    from ._conv import register_converters as _register_converters
  File "h5py/h5t.pxd", line 14, in init h5py._conv
  File "h5py/h5t.pyx", line 293, in init h5py.h5t
  File "/[...]/envs/mamba/envs/funannotate/lib/python3.8/site-packages/numpy/__init__.py", line 320, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'typeDict'
cp: cannot stat '/[...]/envs/mamba/envs/funannotate/share/RepeatMasker/Libraries//general.working/*': No such file or directory

analyzing file /[...]/funannotate/N.Cor/clean_N.Cor_genome.fasta
identifying Simple Repeats in batch 8 of 665
identifying Simple Repeats in batch 14 of 665
identifying Simple Repeats in batch 1 of 665
identifying Simple Repeats in batch 2 of 665
identifying Simple Repeats in batch 4 of 665
identifying Simple Repeats in batch 6 of 665
identifying Simple Repeats in batch 7 of 665
identifying Simple Repeats in batch 5 of 665
identifying Simple Repeats in batch 10 of 665
identifying Simple Repeats in batch 11 of 665
identifying Simple Repeats in batch 12 of 665
identifying Simple Repeats in batch 9 of 665
identifying Simple Repeats in batch 3 of 665
identifying Simple Repeats in batch 16 of 665
identifying Simple Repeats in batch 15 of 665
identifying Simple Repeats in batch 13 of 665
identifying matches to N.Cor-families.fa sequences in batch 5 of 665
identifying matches to N.Cor-families.fa sequences in batch 4 of 665
identifying matches to N.Cor-families.fa sequences in batch 10 of 665
identifying matches to N.Cor-families.fa sequences in batch 14 of 665
# .... #
processing output: 
cycle 1 ......................
cycle 2 ......................
cycle 3 ......................
cycle 4 ......................
cycle 5 
cycle 6 ......................
cycle 7 ......................
cycle 8 ....................
cycle 9 ....................
cycle 10 ....................
Generating output... ....................
masking
done
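
The traceback in the log above can be reproduced without running the whole masking step. The following is a minimal sketch, assuming it is run with the Python interpreter of the same funannotate environment; the helper names `try_import` and `has_attribute` are hypothetical and not part of funannotate or RepeatMasker.

```python
import importlib


def try_import(module_name):
    """Return None if the module imports cleanly, else a short error string."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # in the log above h5py fails with AttributeError
        return f"{type(exc).__name__}: {exc}"
    return None


def has_attribute(module_name, attribute):
    """Return 'import-failed', 'no-attr', or 'ok' for module_name.attribute."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return "import-failed"
    return "ok" if hasattr(module, attribute) else "no-attr"


if __name__ == "__main__":
    # numpy 1.24 no longer exposes typeDict, so this should print 'no-attr' there.
    print("numpy.typeDict:", has_attribute("numpy", "typeDict"))
    # famdb.py needs h5py; this shows whether that import works in this env.
    print("h5py import:", try_import("h5py") or "ok")
```

If the h5py import reports an error here, the famdb.py step that builds the general libraries would be expected to fail the same way, independent of the custom-library search that follows.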

Does this seem like the masking worked OK in the end?
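
One way to sanity-check the reported figure is to count the lowercase (soft-masked) bases in the output FASTA directly. This is only a sketch: it assumes the reported "masked repeats" value counts lowercase bases, and the function name `soft_masked_stats` is hypothetical.

```python
import sys


def soft_masked_stats(lines):
    """Count total and lowercase (soft-masked) bases in FASTA-formatted lines."""
    total = 0
    masked = 0
    for line in lines:
        if line.startswith(">"):  # skip header lines
            continue
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return total, masked


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_masked.py RpMod_Masked_N.Cor.fasta
    with open(sys.argv[1]) as handle:
        total, masked = soft_masked_stats(handle)
    percent = 100 * masked / total if total else 0.0
    print(f"assembly size: {total:,} bp")
    print(f"masked repeats: {masked:,} bp ({percent:.2f}%)")
```

For the numbers in stdout, 8,352,435 / 38,214,125 does round to 21.86%, so the summary is at least internally consistent; the script checks whether the file itself agrees.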

What command did you issue?

BuildDatabase -name "$sp" "clean_${sp}_genome.fasta"  # make repeatmodeler db
RepeatModeler -threads 16 -database "$sp"
funannotate mask -i "clean_${sp}_genome.fasta" -o "RpMod_Masked_${sp}.fasta" -m repeatmasker -l "${sp}-families.fa" --cpus 16

Thanks!

hyphaltip commented 11 months ago

You could just run RepeatMasker outside of funannotate, give the masked file to predict, and compare. I have generally stopped running masking within funannotate, as I end up combining the custom RepeatModeler library with a set of known repeats for the kingdom. This way I run RepeatMasker in its own conda env, where the library versions match what it expects anyway.
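
The workflow described above might be sketched roughly as follows. This is an illustration under assumptions, not the commenter's actual script: the helper names and the file name `known_repeats.fa` are hypothetical, and the RepeatMasker options used (`-lib`, `-pa`, `-xsmall`, `-dir`) should be checked against `RepeatMasker -help` for the installed version.

```python
from pathlib import Path


def combine_libraries(library_paths, combined_path):
    """Concatenate FASTA repeat libraries into one file, in the order given."""
    with open(combined_path, "w") as out:
        for path in library_paths:
            text = Path(path).read_text()
            if not text:
                continue  # skip empty libraries
            # Make sure each library ends with a newline so records do not fuse.
            out.write(text if text.endswith("\n") else text + "\n")
    return combined_path


def repeatmasker_command(genome, library, cpus, out_dir):
    """Build a RepeatMasker soft-masking command as an argument list."""
    return [
        "RepeatMasker",
        "-lib", str(library),   # custom library instead of the bundled database
        "-pa", str(cpus),       # parallel jobs
        "-xsmall",              # soft-mask (lowercase) rather than N-mask
        "-dir", str(out_dir),   # output directory
        str(genome),
    ]


# Example (run inside a dedicated RepeatMasker environment; names are hypothetical):
#   import subprocess
#   lib = combine_libraries(["N.Cor-families.fa", "known_repeats.fa"], "combined.fa")
#   subprocess.run(repeatmasker_command("clean_N.Cor_genome.fasta", lib, 16, "rm_out"),
#                  check=True)
```

RepeatMasker writes the masked sequence as `<input name>.masked` in the output directory; that file can then be passed to funannotate predict as the input genome.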