rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

TypeError: 'NoneType' object is not iterable error occurs when integrating dfam3.7 database and Repbase2018 database #199

Closed panpandzf closed 1 year ago

panpandzf commented 1 year ago

Hello, an error occurs when I try to configure RepeatMasker 4.1.4 with Dfam 3.7 (h5) and RepbaseV20181026. I have already updated famdb.py, and I don't know why the error occurs. Any help would be greatly appreciated.


rmhubley commented 1 year ago

RepeatMasker 4.1.4 ships with Dfam 3.6. The file format changed in Dfam 3.7, so until we get a new release of RepeatMasker out (coming shortly) you will need to update the famdb.py script to a version that supports 3.7. The instructions can be found here: https://www.repeatmasker.org/RepeatMasker/ but to summarize:

# For RepeatMasker 4.1.4 and previous versions this will require an extra step to upgrade the distributed famdb.py tool bundled with RepeatMasker:
% cd RepeatMasker
% mv famdb.py famdb.py.bak
% wget https://github.com/Dfam-consortium/FamDB/raw/master/famdb.py
% chmod 755 famdb.py

panpandzf commented 1 year ago

Thank you for your reply!

panpandzf commented 1 year ago

Sorry, I am sure that I updated famdb.py when I ran it the first time. I then downloaded famdb.py again from the link you sent (note that it is displayed as v0.4.2). After re-running, the same error still occurred. In the end, I chose Dfam 3.6 + Repbase.

rmhubley commented 1 year ago

That's strange. When I download from the link I specified above I get:

% wget https://github.com/Dfam-consortium/FamDB/raw/master/famdb.py
--2023-03-01 17:07:36--  https://github.com/Dfam-consortium/FamDB/raw/master/famdb.py
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Dfam-consortium/FamDB/master/famdb.py [following]
--2023-03-01 17:07:36--  https://raw.githubusercontent.com/Dfam-consortium/FamDB/master/famdb.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73352 (72K) [text/plain]
Saving to: ‘famdb.py’

100%[====================================================================>] 73,352      --.-K/s   in 0s      

2023-03-01 17:07:36 (197 MB/s) - ‘famdb.py’ saved [73352/73352]

% chmod 755 famdb.py
%  ./famdb.py -h
usage: famdb.py [-h] [-l LOG_LEVEL] [-i FILE]
                {info,names,lineage,families,family,append} ...

This is famdb.py version 0.4.3.
...

panpandzf commented 1 year ago

I am also confused: the famdb.py version is 0.4.3, but when configuring RepeatMasker it shows 0.4.2.
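One way to compare what each copy of the script reports is to parse the "This is famdb.py version ..." line from its `-h` output, as in the transcript above. The following is only a sketch: the helper names `parse_version` and `reported_version` are hypothetical, and it assumes the help text keeps that exact wording.

```python
import re
import subprocess
import sys

# Matches the banner line shown in the help output, e.g.
# "This is famdb.py version 0.4.3."
VERSION_RE = re.compile(r"This is famdb\.py version (\d+(?:\.\d+)*)")


def parse_version(help_text):
    """Return the version string found in help_text, or None if absent."""
    match = VERSION_RE.search(help_text)
    return match.group(1) if match else None


def reported_version(script_path):
    """Run `<python> script_path -h` and return the version it reports.

    Assumption: the script prints its banner to stdout or stderr when
    called with -h, as in the transcript in this thread.
    """
    result = subprocess.run(
        [sys.executable, script_path, "-h"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    # Usage: python check_version.py /path/to/famdb.py [/another/famdb.py ...]
    for path in sys.argv[1:]:
        print(path, reported_version(path))
```

Running it against both `famdb.py` and `famdb.py.bak` would show whether the configure step could be picking up the older file.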

rmhubley commented 1 year ago

Check that you placed the script in the RepeatMasker directory and that there are no other copies lying around in your path.
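A rough way to carry out this check is to list every `famdb.py` that exists in the RepeatMasker directory and in the directories on `PATH`. This is a minimal sketch, not part of RepeatMasker: `find_copies` is a hypothetical helper, and the `REPEATMASKER_DIR` environment variable is an assumption used only to avoid hard-coding an install location.

```python
import os
from pathlib import Path


def find_copies(script_name, search_dirs):
    """Return each existing file named script_name found in search_dirs.

    Results keep the search order and are de-duplicated by resolved path,
    so a directory listed twice (or reached via a symlink) appears once.
    """
    seen = set()
    hits = []
    for directory in search_dirs:
        if not directory:
            # Skip empty PATH entries.
            continue
        candidate = Path(directory) / script_name
        if candidate.is_file():
            resolved = candidate.resolve()
            if resolved not in seen:
                seen.add(resolved)
                hits.append(resolved)
    return hits


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # Assumption: REPEATMASKER_DIR is a variable you set yourself; it
    # falls back to the current directory if unset.
    repeatmasker_dir = os.environ.get("REPEATMASKER_DIR", ".")
    for copy in find_copies("famdb.py", [repeatmasker_dir, *path_dirs]):
        print(copy)
```

If more than one path is printed, the older copy may be the one being invoked.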

panpandzf commented 1 year ago

I am very sure that the latest version of famdb.py is in the RepeatMasker directory. Before I downloaded the latest version of famdb.py, I executed "mv famdb.py famdb.py.bak". Could this be a problem?

lilinzhou commented 1 year ago

I have the same problem. I think the log line "FamDB Generator: famdb.py v0.4.2" from RepeatMasker simply wasn't refreshed.

But even if I extract the command line and run it with famdb.py v0.4.3, it shows the same error. Is there any update for Dfam 3.7?

../famdb.py -i RepeatMaskerLib.h5 families --descendants 1 --curated --format fasta_name --include-class-in-name > RepeatMasker.lib
Traceback (most recent call last):
  File "/usr/RepeatMasker-4.1.4/Libraries/../famdb.py", line 1914, in <module>
    main()
  File "/usr/RepeatMasker-4.1.4/Libraries/../famdb.py", line 1907, in main
    args.func(args)
  File "/usr/RepeatMasker-4.1.4/Libraries/../famdb.py", line 1692, in command_families
    print_families(args, families, True, target_id)
  File "/usr/RepeatMasker-4.1.4/Libraries/../famdb.py", line 1630, in print_families
    entry += family.to_fasta(
  File "/usr/RepeatMasker-4.1.4/Libraries/../famdb.py", line 402, in to_fasta
    for clade_id in self.clades:
TypeError: 'NoneType' object is not iterable
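The last frame of the traceback iterates over `self.clades`, so the error suggests that at least one family in the merged library has no clade list at all (the attribute is `None`). The sketch below reproduces that failure mode in isolation. The `Family` class here is a hypothetical stand-in, not the real class from famdb.py, and the tolerant variant is only an illustration, not a fix confirmed by the maintainers.

```python
class Family:
    """Hypothetical stand-in for a library entry; NOT the famdb.py class."""

    def __init__(self, name, clades=None):
        self.name = name
        # None models an entry whose clade metadata is missing.
        self.clades = clades

    def clade_ids_strict(self):
        # Same pattern as the failing line: iterating None raises TypeError.
        return [clade_id for clade_id in self.clades]

    def clade_ids_tolerant(self):
        # Treat missing clade metadata as an empty list instead of failing.
        return [clade_id for clade_id in (self.clades or [])]


if __name__ == "__main__":
    entry = Family("example_family")  # hypothetical name, no clades set
    try:
        entry.clade_ids_strict()
    except TypeError as err:
        print("strict:", err)
    print("tolerant:", entry.clade_ids_tolerant())
```

A guard like this would only hide the symptom; entries without clades would still lack the taxonomy information that `--descendants` filtering relies on.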

lilinzhou commented 1 year ago

If I build RepeatMasker.lib from Dfam 3.7 alone, it works well. The problem only happens after merging Repbase with Dfam 3.7.

This command works well:
../famdb.py -i Dfam_curatedonly.h5 families --descendants 1 --curated --format fasta_name --include-class-in-name > RepeatMasker_excludeRepBase.lib

minhasbushra commented 1 year ago

I encountered the same error, even after updating famdb.py to version 0.4.3. Is there any solution? I would like to use the combined libraries (Dfam 3.7 + Repbase).

rmhubley commented 1 year ago

I just released RepeatMasker 4.1.5 which should fix this problem. It was caused by outdated metadata in the Repbase files. In addition, this release comes with the curated Dfam 3.7 by default.

Wenwen012345 commented 1 year ago

> I just released RepeatMasker 4.1.5 which should fix this problem.

I have encountered the same error message (RepeatMasker 4.1.5, following the procedure at http://www.repeatmasker.org/RepeatMasker/). Does this mean Repbase will eventually need to be dropped? Or can the error message safely be ignored?