oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Some errors in using version 1.9.7 #168

Closed biozhangzhou closed 3 years ago

biozhangzhou commented 3 years ago

Hello. I have updated EDTA to 1.9.7 as you suggested, and several errors have come up. First: I reran the analysis on top of the old results, and apparently it cannot combine the new and old results.

Then I tried rerunning the job without any old results, and a second error came up: "what(): Resource temporarily unavailable terminate called after throwing an instance of 'std::system_error' what(): Resource temporarily unavailable terminate called after throwing an instance of 'std::system_error'" BTW, I gave EDTA 40 threads.

Third: I tried to install EDTA on an HPC, which failed with an RMblast installation problem. Does that mean I can't install EDTA on an HPC through conda?

oushujun commented 3 years ago

No, you can't use old results with 1.9+ directly, but there is a conversion script to update results from 1.8.x to 1.9. Please check out the release notes.

The second issue suggests that you have requested more CPUs than the system can handle. Maybe you are running multiple jobs?
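"Resource temporarily unavailable" is the message for EAGAIN, which a program typically gets when the system refuses to start more threads or processes. A minimal sketch for comparing a requested thread count with what the host offers is shown here. It assumes a Linux host; `cap_threads` is a hypothetical helper and not part of EDTA, and 40 is simply the thread count from the report above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EDTA): never ask for more threads
# than the number of CPUs that is actually available.
cap_threads() {
  local requested=$1 available=$2
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# CPUs usable by this process; fall back to getconf where nproc is missing.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Per-user limit on processes/threads; may print "unlimited".
max_procs=$(ulimit -u 2>/dev/null || echo unknown)

threads=$(cap_threads 40 "$cores")
echo "cores=$cores max_user_processes=$max_procs threads=$threads"
```

If other jobs from the same user are already running, the per-user limit can be reached well before the CPU count is, so both numbers are worth a look.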

You can install on an HPC with conda. Please specify the command you were using, ideally along with your platform and conda version.

Best, Shujun

biozhangzhou commented 3 years ago

Thank you very much for the fast reply. Here are some details:

1#:
(EDTA) zhangzhou@mu01:EDTA$ cat /proc/version
Linux version 3.10.0-957.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Nov 8 23:39:32 UTC 2018

2#:
(EDTA) zhangzhou@mu01:EDTA$ conda -V
conda 4.9.2

3#: Install commands
1, conda create -n EDTA
2, conda install -c conda-forge -c bioconda edta python=3.6 tensorflow=1.14 'h5py<3'
3, mamba install -c conda-forge -c bioconda edta python=3.6 tensorflow=1.14 'h5py<3'

4#: Run EDTA command
(EDTA) zhangzhou@mu01:test$ perl ../EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib ../database/rice6.9.5.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 1

########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.7
Shujun Ou (shujun.ou.1@gmail.com)

########################################################

Sat Feb 27 14:30:54 CST 2021 Dependency checking: Error: The RMblast engine is not installed in RepeatMasker!

I have spent a few hours debugging this error, but it is still there.

oushujun commented 3 years ago

Please install RepeatMasker separately in this env. conda has a recipe.

Shujun

biozhangzhou commented 3 years ago

Sorry, I think I was unclear. By HPC I actually meant a computer cluster.

When I installed RepeatMasker separately again, I still got "Error: The RMblast engine is not installed in RepeatMasker!" Then I tried to re-configure RepeatMasker, and another error occurred: "Building FASTA version of RepeatMasker.lib ...ERROR:main:Error reading file: Unable to open file (file locking disabled on this file system (use HDF5_USE_FILE_LOCKING environment variable to override), errno = 38, error message = 'Function not implemented') "

I usually use RepeatMasker with HMMER, which works well on the cluster. Is RMblast the only choice for EDTA? Otherwise, is there a way to fix the error above?

oushujun commented 3 years ago

EDTA works on a single node only, if you are asking about MPI. You may assign EDTA as many CPUs as a single node provides, but you cannot spread a single job across multiple nodes. However, you may split the EDTA_raw job into multiple subjobs and run them in parallel on separate nodes, if that ends up saving some time.
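A rough sketch of what such a split could look like is given here. It assumes that EDTA_raw.pl accepts `--genome`, `--type` (with the values `ltr`, `tir` and `helitron`) and `--threads`; please confirm this against the help output of your EDTA version. `raw_commands` is a hypothetical helper that only prints the commands, so that each line can be pasted into a separate job script for your scheduler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one EDTA_raw.pl command per TE type.
# Nothing is executed here; the flags are assumptions to be verified.
raw_commands() {
  local genome=$1 threads=$2 type
  for type in ltr tir helitron; do
    echo "perl EDTA_raw.pl --genome $genome --type $type --threads $threads"
  done
}

# Example: three independent subjobs with 10 threads each.
raw_commands genome.fa 10
```

Each printed command would run on one node by itself, which stays within the single-node restriction described above.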

EDTA uses RepeatMasker with RMblast. You need to double-check your installation. I just tried installing RepeatMasker from https://anaconda.org/bioconda/repeatmasker on my end, and it went smoothly and works well.

Best, Shujun

biozhangzhou commented 3 years ago

Thank you for everything. I think I have fixed the errors mentioned above. I wanted to wait for the jobs to complete before giving you feedback.

1, For the "Resource temporarily unavailable" error, I rebooted my computer but that didn't help. Then I downgraded EDTA from 1.9.7 to 1.9.6, and it works.

2, For running on the HPC (a computer cluster with a Lustre file system), I think RMblast 2.10 cannot be used there because it uses an HDF5 file (I'm not really clear about this) on the Lustre file system. I then ran the job on a tmpdisk disk of the system (which has a separate file system), and it works.
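The workaround in point 2 can be sketched roughly as follows. This assumes a Linux host with GNU `stat` and a node-local `/tmp`; `fs_type` and `make_scratch` are hypothetical helper names, and the copy step is left as a comment because the input paths differ per setup.

```shell
#!/usr/bin/env bash
# Report the file system type behind a directory (GNU stat, Linux only).
fs_type() {
  stat -f -c %T "$1"
}

# Create a private working directory under a base path (default: /tmp).
make_scratch() {
  mktemp -d "${1:-/tmp}/edta_run.XXXXXX"
}

scratch=$(make_scratch)
echo "scratch=$scratch type=$(fs_type "$scratch")"

# Assumed next steps: copy the inputs over and launch EDTA from there.
# cp genome.fa "$scratch"/ && cd "$scratch"
```

Running `fs_type` on the original working directory as well shows whether the two locations really sit on different file systems.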

oushujun commented 3 years ago

The first issue is interesting. It might not be what I thought it was. I will further investigate and let you know.

For the second issue, I have not encountered it myself. How did you find out the cause? There seems to be a discussion here; have you tried it? https://forum.hdfgroup.org/t/turn-off-file-locking/3809

biozhangzhou commented 3 years ago

Yes, I have tried setting "HDF5_USE_FILE_LOCKING=FALSE" in many ways, but none of them worked. I'm just not sure whether I put the command in the right place. Interestingly, setting this environment variable does overcome the "can't open lock file" error, but then it runs into another error (sorry, I have forgotten what it was).
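On the question of where the setting has to go: an environment variable only reaches a program if it is exported in the shell (or job script) that launches that program. A minimal sketch, assuming a POSIX shell, is shown here; `child_view` is a hypothetical helper that prints what a child process sees. As reported above, setting the variable did not fully resolve the problem in this case, so this only illustrates placement.

```shell
#!/usr/bin/env bash
# Export before launching anything, so that every child process
# started from this shell inherits the setting.
export HDF5_USE_FILE_LOCKING=FALSE

# Hypothetical helper: print the value as seen by a child process.
child_view() {
  sh -c 'printf "%s\n" "${HDF5_USE_FILE_LOCKING:-unset}"'
}

child_view
```

In a cluster job script, the `export` line would go before the line that starts the tool, not into a different shell session.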

oushujun commented 3 years ago

You may open an issue on the RepeatMasker site: https://github.com/rmhubley/RepeatMasker for the HDF5 issue.

oushujun commented 3 years ago

One more note on the RMBlast issue: It seems that RepeatMasker can break without any sign. It works at first and later breaks. This happens to me periodically. If you see something like:

Search engine ( ) is unknown to RepeatMasker. Please check the RepeatMaskerConfig.pm or rerun the configure script!. RepeatMasker version 4.1.0

Or like:

Sat Feb 27 14:30:54 CST 2021 Dependency checking: Error: The RMblast engine is not installed in RepeatMasker!

That means RepeatMasker broke on you too. You may try to reinstall RepeatMasker with --force-reinstall or other fancy debugging methods. My frank suggestion is to remove the entire conda env and rebuild a fresh one. This can save so much time instead of hours of debugging. So:

conda env remove -n EDTA
conda env create -f your_path_to/EDTA/EDTA.yml

sunnycqcn commented 2 years ago

Hi Shujun, I think this issue is very interesting. I ran into the same issue. When I rebuild a fresh environment, it works well, but it stops working again when I run the next job, and I have to rebuild again. I do not know whether the issue is in your pipeline or in RepeatMasker. Anyway, we can get it working by rebuilding a fresh conda environment. Best, Fuyou

sunnycqcn commented 2 years ago

Hi Shujun, Maybe your pipeline has a small bug in the RepeatMasker test. I only removed the code that tests RepeatMasker, and the pipeline then works well. Some people said the problem is related to the Perl environment. I am not sure. Best, Fuyou

oushujun commented 2 years ago

Hi Fuyou @sunnycqcn,

If you were referring to the RepeatMasker issue, I would need a reproducible case to debug. For example, what part of the RepeatMasker test code was causing the problem?

Thanks, Shujun

sunnycqcn commented 2 years ago

Hello Shujun, I only deleted these lines:

# RepeatMasker
my $rand=int(rand(1000000));
chomp ($repeatmasker=`which RepeatMasker 2>/dev/null`) if $repeatmasker eq '';
$repeatmasker =~ s/\s+$//;
$repeatmasker = dirname($repeatmasker) unless -d $repeatmasker;
$repeatmasker="$repeatmasker/" if $repeatmasker ne '' and $repeatmasker !~ /\/$/;
die "Error: RepeatMasker is not found in the RepeatMasker path $repeatmasker!\n" unless -X "${repeatmasker}RepeatMasker";
`cp $script_path/database/dummy060817.fa ./dummy060817.fa.$rand`;
my $RM_test=`${repeatmasker}RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa.$rand -lib dummy060817.fa.$rand 2>/dev/null`;
die "Error: The RMblast engine is not installed in RepeatMasker!\n" unless $RM_test=~s/done//gi;
`rm dummy060817.fa.$rand* 2>/dev/null`;

After that, the pipeline worked well. I cannot find where the error is. Best, Fuyou
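For anyone who wants to reproduce the check by hand instead of deleting it, the logic of the quoted Perl can be sketched in shell roughly like this. The check passes only if the RepeatMasker output contains the word "done". `rm_output_ok` is a hypothetical helper, and the test run is skipped unless RepeatMasker is on the PATH and a copy of dummy060817.fa is in the current directory. Unlike the Perl code, stderr is not discarded here, so any message from RepeatMasker stays visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the captured output contains "done"
# (case-insensitive), mirroring the `$RM_test =~ s/done//gi` test.
rm_output_ok() {
  printf '%s\n' "$1" | grep -qi 'done'
}

# Only attempt the run when the tool and the test file are present.
if command -v RepeatMasker >/dev/null 2>&1 && [ -f dummy060817.fa ]; then
  out=$(RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow \
        dummy060817.fa -lib dummy060817.fa)
  if rm_output_ok "$out"; then
    echo "RepeatMasker test passed"
  else
    echo "RepeatMasker test failed; output was:"
    printf '%s\n' "$out"
  fi
else
  echo "RepeatMasker or dummy060817.fa not found; skipping test run"
fi
```

Because the Perl version sends stderr to /dev/null, any failure of RepeatMasker, not only a missing RMblast engine, ends in the same "RMblast engine is not installed" message.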

oushujun commented 2 years ago

Hi Fuyou @sunnycqcn,

Can you check what version of RepeatMasker you are using?

Thanks, Shujun

sunnycqcn commented 2 years ago

My version is RepeatMasker version 4.1.1 based on the EDTA.yml file. Best, Fuyou

-- Fuyou Fu, Ph.D., Saskatoon Research Center, Agriculture and Agri-Food Canada, Canada

oushujun commented 2 years ago

@sunnycqcn sorry, can you find out their versions through:

RepeatMasker -v

and

~/path-to-your-working-edta/EDTA.pl -v

Thanks, Shujun

sunnycqcn commented 2 years ago

EDTA.pl -v

########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.9
Shujun Ou @.***)

########################################################

I just updated it.

RepeatMasker -v
RepeatMasker version 4.1.2-p1

Thanks, Fuyou

oushujun commented 2 years ago

Hi Fuyou,

I updated RepeatMasker to v4.1.2-p1 and tested EDTA. It worked normally. Can you test the following for me:

conda activate EDTA
cp ..../EDTA/database/dummy060817.fa ./
RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa -lib dummy060817.fa

Please paste the RepeatMasker STDOUT in this thread, thanks!

Shujun