sestaton / tephra

A tool for discovering transposable elements and describing patterns of genome evolution
MIT License
30 stars, 3 forks

Singularity yml parsing error #43

Open eganko opened 5 years ago

eganko commented 5 years ago

I've downloaded tephra_latest.sif and I'm running it with Singularity 3.3.0. When I try a small test, I get an immediate error:

> tephra all -c tephra_config.test1.yml
[ERROR]: 'debug' under 'all' is not defined after parsing configuration file.
         This indicates there may be a blank line in your configuration file.
         Please check your configuration file and try again. Exiting.

If I re-run the same command with no change to the config file I get a similar but different error:

> tephra all -c tephra_config.test1.yml
[ERROR]: 'trnadb' under 'all' is not defined after parsing configuration file.
         This indicates there may be a blank line in your configuration file.
         Please check your configuration file and try again. Exiting.

Each time I run the command with the same config, I may get a different error. I've made only a few changes to the default YAML file, and I've attached it (with .txt appended to the name) in case I've done something wrong there. Any ideas on what is going wrong?

tephra_config.test1.yml.txt

sestaton commented 5 years ago

This is odd indeed. I have found that line endings can sometimes get messed up when editing these files, and that is hard to spot, but starting over is usually the solution. As a quick check, try downloading the sample config file again with wget (or any other method) and change only the file names to see if that helps.
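Before re-downloading, a quick scan of the file can rule out the line-ending and blank-line problems mentioned here. The sketch below is a hypothetical helper, not part of Tephra: it flags Windows (CRLF) line endings, blank lines (which Tephra's error message points to), and tabs in indentation (which YAML does not allow). Which of these Tephra itself actually rejects is an assumption.

```python
from pathlib import Path
from typing import List


def check_config_text(text: str) -> List[str]:
    """Return suspected formatting problems in a YAML config.

    Hypothetical heuristic checks, not Tephra's own validation.
    """
    problems = []
    # CRLF endings are a common side effect of editing on Windows.
    if "\r\n" in text:
        problems.append("file uses Windows (CRLF) line endings")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # Tephra's error message suggests blank lines break parsing.
            problems.append(f"line {lineno}: blank line")
            continue
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            # YAML forbids tab characters in indentation.
            problems.append(f"line {lineno}: tab in indentation")
    return problems


def check_config_file(path: str) -> List[str]:
    # Decode the raw bytes so CRLF endings are not normalized away on read.
    text = Path(path).read_bytes().decode("utf-8", errors="replace")
    return check_config_text(text)
```

For example, check_config_file("tephra_config.test1.yml") returns an empty list when none of these heuristics fire.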

Let me know if not. There may be something else going on.

Thanks.

eganko commented 5 years ago

I just grabbed a fresh copy (curl -sL -o tephra_config.yml https://git.io/v5HFq) and tried running tephra without changes (so expecting it to complain about missing files). Instead, I got the same parsing error:

> tephra all -c tephra_config.yml
[ERROR]: 'trnadb' under 'all' is not defined after parsing configuration file.
         This indicates there may be a blank line in your configuration file.
         Please check your configuration file and try again. Exiting.

sestaton commented 5 years ago

This is what I get when I do the same:

tephra all -c tephra_config.yml 

[ERROR]: genome file was not defined in configuration or does not exist. Check input. Exiting.

Can you show what commands you ran to pull the image and start a container?

For reference, I added a new entry to the config file in the last version, and I think that is the issue. The YAML file is parsed in a way that expects entries in a certain order, so this config file would cause problems with the previous Tephra version.
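To illustrate why entry order matters, here is a minimal sketch contrasting an order-dependent reader with one that looks entries up by name. It assumes the 'all' section has already been loaded into a list of one-key mappings, mirroring config lines such as "- trnadb: TephraDB". The key list EXPECTED_ORDER and the inserted 'outfile' entry are hypothetical, and this is not Tephra's actual parsing code.

```python
from typing import Dict, List

# Parsed form of a config section: a list of one-key mappings.
Section = List[Dict[str, str]]

# Hypothetical key order that an older, order-dependent reader expects.
EXPECTED_ORDER = ["logfile", "genome", "repeatdb", "trnadb", "hmmdb"]


def parse_by_position(section: Section) -> Dict[str, str]:
    """Order-dependent: give the i-th value to the i-th expected key."""
    return {
        key: next(iter(entry.values()))
        for key, entry in zip(EXPECTED_ORDER, section)
    }


def parse_by_name(section: Section) -> Dict[str, str]:
    """Order-independent: trust each entry's own key."""
    merged: Dict[str, str] = {}
    for entry in section:
        merged.update(entry)
    return merged


# An "old" section in the expected order, and a "new" one with an extra
# (hypothetical) 'outfile' entry inserted in the middle.
OLD: Section = [
    {"logfile": "test1.log"},
    {"genome": "Chr2.fa"},
    {"repeatdb": "athrep.ref.fa"},
    {"trnadb": "TephraDB"},
    {"hmmdb": "TephraDB"},
]
NEW: Section = OLD[:2] + [{"outfile": "Chr2_tephra.gff3"}] + OLD[2:]
```

With NEW, parse_by_position assigns "Chr2_tephra.gff3" to repeatdb and shifts every later value by one key, while parse_by_name still returns the intended values.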

eganko commented 5 years ago

I used two commands to get the .sif file:

module load singularity/3.3.0
singularity pull library://sestaton/default/tephra

and then start it with:

LC_ALL=C singularity run --bind /home/bioinf/tephra/test1:/test1 /home/bioinf/tephra/tephra_latest.sif 

File size and date: 776201147 Sep 13 15:17 tephra_latest.sif

And the version of tephra: tephra (Tephra) version 0.12.4 (/usr/local/bin/tephra).

I pulled in a new copy today: 776201147 Sep 17 11:58 tephra_latest.sif

And I'm seeing the same config error.

sestaton commented 5 years ago

That is the previous version. The latest is v0.12.5, so that is the issue. Let me make sure the latest version was pushed to Singularity Hub because that would be one explanation.
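Since the problem turned out to be an outdated image, one simple guard is to check the reported version before running the pipeline. This sketch parses the two version formats that appear in this thread (the "tephra (Tephra) version 0.12.4 (...)" line and the "Tephra version: 0.12.5" log header) and compares them as integer tuples. The helper names are hypothetical.

```python
import re
from typing import Tuple

# Matches "version 0.12.4" and "version: 0.12.5" as seen in this thread.
VERSION_RE = re.compile(r"version:?\s+(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Tuple[int, int, int]:
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def is_at_least(text: str, minimum: Tuple[int, int, int]) -> bool:
    # Tuple comparison orders versions numerically, so 0.12.10 > 0.12.5.
    return parse_version(text) >= minimum
```

Comparing tuples instead of strings avoids treating "0.12.10" as older than "0.12.5".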

sestaton commented 5 years ago

Okay, running the commands should work now. It was my mistake. The latest image was pushed, but I didn't realize that, unlike Docker, you have to manually tag images as latest/default on the website. That is why the 'pull' command was fetching the previous image.

eganko commented 5 years ago

Thanks, that fixed the config issue.

However, now I've got a different error (feel free to mark it as a separate issue):

INFO - ======== Tephra version: 0.12.5 (started at: 19-09-2019 10:12:12) ========
INFO - Configuration - Log file for monitoring progress and errors: test1.log
INFO - Configuration - Genome file:                                 Chr2.fa
INFO - Configuration - Repeat database:                             athrep.ref.fa
INFO - Configuration - Number of threads:                           10
INFO - Command - 'tephra findltrs' started at: 19-09-2019 10:12:12.
DEBUG:  tephra findltrs -g Chr2.fa -o /home/bioinf/tephra/test1/Chr2_tephra_ltrs.gff3 -c tephra_config.test1.yml --logfile test1.log --debug
Copy failed: No such file or directory at /usr/local/share/perl/5.26.1/Tephra/Command/findltrs.pm line 142.
INFO - Command - 'tephra findltrs' completed at: 19-09-2019 10:12:13.

I see three tephra_transposons_hmmdb* files, all size zero.
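To confirm this symptom (database files that exist but are empty after the failed copy), a small helper can list matching files of zero size. The glob pattern and the assumption that the files sit in the current working directory are illustrative; the thread does not say where they were found.

```python
import glob
import os
from typing import List


def empty_files(pattern: str) -> List[str]:
    """Return regular files matching a glob pattern that are zero bytes."""
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.isfile(path) and os.path.getsize(path) == 0
    )
```

For example, empty_files("tephra_transposons_hmmdb*") lists the empty database files in the current directory.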

In the YAML file, I have left trnadb and hmmdb at their defaults:

  - trnadb:           TephraDB
  - hmmdb:            TephraDB

Edit: this seems to be the same issue as https://github.com/sestaton/tephra/issues/40

sestaton commented 5 years ago

Okay, please share the output of singularity version and the OS you are using. It looks like you might be on a compute cluster of some kind, and I'm guessing there might be a network-mounted file system. I'm not sure what to do about this specifically, but I'll start with that information and go from there.

eganko commented 5 years ago

Correct, I am running on a cluster. I use qrsh to create an interactive session (qrsh -pe smp 12), then load Singularity, then run the LC... command to start Tephra.

> singularity version
3.3.0

And from hostnamectl:

  Operating System: CentOS Linux 7 (Core)
      Architecture: x86-64

sestaton commented 5 years ago

I'm not sure what to do about the environment because I can't reproduce the issue. I'm using the same Singularity and OS versions, so the cluster environment seems to be the difference.

I recommend trying the updated approach I added to the README file for starting a container and loading the environment variables. If that does not work, I suppose we can debug the environment variable list.

eganko commented 5 years ago

I tried a couple of combinations of singularity shell with -C, but I'm still getting the same 'findltrs.pm line 142' error. These are the environment variables once inside the container:

> env

LD_LIBRARY_PATH=/.singularity.d/libs
LANG=C
SINGULARITY_APPNAME=
SINGULARITY_CONTAINER=/home/bioinf/tephra/tephra_latest.sif
PWD=/home/bioinf
HOME=/home/bioinf/
SHELL=/bin/bash
TERM=linux
SINGULARITY_NAME=tephra_latest.sif
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PS1=Singularity tephra_latest.sif:\w> 
_=/usr/bin/env
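If it comes to debugging the environment variable list, a dump like this one can be turned into a dictionary and diffed against a dump from a setup where Tephra works. This is a hypothetical helper; it assumes one NAME=value pair per line, which holds for this listing but not for values that span several lines (such as exported shell functions).

```python
from typing import Dict, Optional, Tuple


def parse_env(dump: str) -> Dict[str, str]:
    """Turn `env` output (one NAME=value per line) into a dict."""
    env: Dict[str, str] = {}
    for line in dump.splitlines():
        # Split on the first '=' only; values may themselves contain '='.
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def diff_env(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each variable that differs to its (a, b) values; None if unset."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Saving env output from inside the container on the cluster and on a working machine, then diffing the two, narrows the search to the variables that actually differ.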

Perhaps I could specify the full path to the 'TephraDB' file that needs to be copied (though I'm not sure where that is)?

sestaton commented 4 years ago

Could you try the latest version (v0.13.0)? I would like to round up some old issues, particularly the container issues.

sestaton commented 4 years ago

Specifically, the same command you ran previously is working fine for me:

singularity run --bind $PWD/test1:/test1 $PWD/tephra_latest.sif 

If this is no longer relevant, that is okay, but please let me know and I'll close this issue.

eganko commented 4 years ago

Sure, with the 0.13.0 version:

INFO - ======== Tephra version: 0.13.0 (started at: 04-07-2020 11:20:25) ========
DEBUG:  tephra findltrs -g Chr2.fa -o /home/bioinf/tephra/test1/Chr2_tephra_ltrs.gff3 -c tephra_config.test1.yml --logfile test1.log --debug
Copy failed: No such file or directory at /usr/local/share/perl/5.30.0/Tephra/Command/findltrs.pm line 142.
INFO - Command - 'tephra findltrs' completed at: 04-07-2020 11:20:26.
INFO - Command - 'tephra findtrims' started at: 04-07-2020 11:20:26.

My focus is on other things currently, but I do hope to return to repeat finding in the fall, so perhaps I can do some more in-depth Singularity checking then. Feel free to close this for housekeeping purposes.

sestaton commented 4 years ago

Thanks for the response. I'll leave this issue open as a reminder. The Docker version and the basic install are fine, but I do intend to understand and resolve these Singularity issues.