rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Error message + results all "unclassified" using RepBase 28.03 #215

Open MarionPerrier opened 1 year ago

MarionPerrier commented 1 year ago

Describe the issue

Hi! When running the software with the latest version of RepBase (28.03), I get this error message:

RepeatMasker version 4.1.5
Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Using Custom Repeat Library: RepBase28.03.fasta/fngrep.ref

Building general libraries in: /home/user/mambaforge/envs/funannote_repeatmod/share/RepeatMasker/Libraries//general
Traceback (most recent call last):
  File "/home/user/mambaforge/envs/funannote_repeatmod/share/RepeatMasker/famdb.py", line 55, in <module>
    import h5py
  File "/home/user/mambaforge/envs/funannote_repeatmod/lib/python3.8/site-packages/h5py/__init__.py", line 46, in <module>
    from ._conv import register_converters as _register_converters
  File "h5py/h5t.pxd", line 14, in init h5py._conv
  File "h5py/h5t.pyx", line 293, in init h5py.h5t
  File "/home/user/mambaforge/envs/funannote_repeatmod/lib/python3.8/site-packages/numpy/__init__.py", line 320, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'typeDict'
cp: cannot stat '/home/user/mambaforge/envs/funannote_repeatmod/share/RepeatMasker/Libraries//general.working/*': No such file or directory

analyzing file sort/sample1_sort.fasta
identifying Simple Repeats in batch 1 of 378
identifying Simple Repeats in batch 2 of 378
identifying Simple Repeats in batch 5 of 378
identifying Simple Repeats in batch 3 of 378
identifying Simple Repeats in batch 4 of 378
identifying Simple Repeats in batch 6 of 378
[...]

The error does not stop RepeatMasker from running. However, in the .tbl file, every hit against the RepBase library is reported as "Unclassified".

I have no idea how to fix this. Any help would be greatly appreciated!
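For anyone hitting the same traceback: the failure happens while h5py is being imported, when it looks up `numpy.typeDict`. A minimal diagnostic sketch follows, assuming the cause is a mismatch between the installed h5py and a newer NumPy that no longer provides that alias (to my knowledge it was dropped in NumPy 1.24, but treat that as an assumption). The helper names `installed_version`, `has_typedict_alias` and `report` are hypothetical and only for illustration; the script reports versions and changes nothing.

```python
import importlib
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_typedict_alias(numpy_module):
    """True if the given module object still exposes the old `typeDict` alias."""
    return hasattr(numpy_module, "typeDict")


def report():
    """Collect a few lines describing the numpy/h5py installation."""
    lines = []
    for package in ("numpy", "h5py"):
        version = installed_version(package)
        lines.append(f"{package}: {version or 'not installed'}")
    try:
        numpy = importlib.import_module("numpy")
    except ImportError:
        lines.append("numpy could not be imported")
    else:
        if has_typedict_alias(numpy):
            lines.append("numpy.typeDict is available")
        else:
            # Assumption: an h5py build that still uses the alias will fail here.
            lines.append("numpy.typeDict is missing; an older h5py may fail to import")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Run it with the Python interpreter of the same conda environment that RepeatMasker uses; otherwise the versions it prints may not be the ones famdb.py sees.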

Reproduction steps

This is the command line I used:

RepeatMasker -s -lib RepBase28.03.fasta/fngrep.ref -xsmall -pa 20 -dir mask/ sort/sample1_sort.fasta

Log output

The full output is attached as output.log.

TBL file:

==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements            0            0 bp    0.00 %
   SINEs:                0            0 bp    0.00 %
   Penelope:             0            0 bp    0.00 %
   LINEs:                0            0 bp    0.00 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex          0            0 bp    0.00 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B           0            0 bp    0.00 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:         0            0 bp    0.00 %
     BEL/Pao             0            0 bp    0.00 %
     Ty1/Copia           0            0 bp    0.00 %
     Gypsy/DIRS1         0            0 bp    0.00 %
       Retroviral        0            0 bp    0.00 %

DNA transposons          0            0 bp    0.00 %
   hobo-Activator        0            0 bp    0.00 %
   Tc1-IS630-Pogo        0            0 bp    0.00 %
   En-Spm                0            0 bp    0.00 %
   MULE-MuDR             0            0 bp    0.00 %
   PiggyBac              0            0 bp    0.00 %
   Tourist/Harbinger     0            0 bp    0.00 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:         1076       254964 bp    1.40 %

Total interspersed repeats:      254964 bp    1.40 %

Small RNA:               0            0 bp    0.00 %

Satellites:              0            0 bp    0.00 %
Simple repeats:       3812       161438 bp    0.88 %
Low complexity:          0            0 bp    0.00 %
==================================================

Environment

RepeatMasker was installed with bioconda.

I have version 4.1.5.

I am using the latest version of RepBase (28.03), in FASTA format, passed with the -lib option.

(funannote_repeatmod) user$ uname -a
Linux user 5.19.0-40-generic #41~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 31 16:00:14 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Additional context

This is my first time using RepeatMasker, so I cannot compare against older versions of RepeatMasker or RepBase.

MarionPerrier commented 1 year ago

Update: The error message was due to an outdated version of h5py. I updated it with pip install h5py --upgrade, and the error no longer appears. However, with the latest version of RepBase, all results are still assigned to the "Unclassified" category. Am I doing something wrong on the command line?
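One thing worth checking, offered as a possible cause rather than a confirmed one: the RepeatMasker documentation recommends that sequence IDs in a custom library passed with -lib follow the form `>repeatname#class/subclass`, and entries without a `#class` part are not assigned to a class in the .tbl summary. The sketch below counts how many FASTA headers in a library carry such an annotation. It assumes the RepBase FASTA headers use a different layout (for example tab-separated name, class and species); that layout is an assumption, not something shown in this thread. The names `HEADER_CLASS` and `classify_headers` are hypothetical.

```python
import io
import re
import sys
from collections import Counter

# Matches headers such as ">Gypsy-1_XX#LTR/Gypsy" or ">MyRepeat#DNA".
HEADER_CLASS = re.compile(r"^>(\S+?)#([^\s/]+)(?:/(\S+))?")


def classify_headers(handle):
    """Count FASTA headers per '#class' annotation; unannotated ones are pooled."""
    counts = Counter()
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = HEADER_CLASS.match(line)
        counts[match.group(2) if match else "(no #class)"] += 1
    return counts


if __name__ == "__main__":
    # Usage: python check_headers.py path/to/library.fasta
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as library:
            for label, number in classify_headers(library).most_common():
                print(f"{label}\t{number}")
    else:
        # Small in-memory demonstration when no file is given.
        demo = io.StringIO(">A#LTR/Gypsy\nACGT\n>B\tGypsy\tHomo sapiens\nACGT\n")
        print(dict(classify_headers(demo)))
```

If nearly every header lands in "(no #class)", rewriting the headers into the `name#class/subclass` form before running RepeatMasker would be the next thing to try.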