mobilomics / TEcandidates

Could not execute mv trinity_out_dir.Trinity.fasta trinity_assemblies/SRR851837_filtered.trinity_assembly.fasta #10

NickPanyushev closed this issue 3 years ago

NickPanyushev commented 3 years ago

Hi! I was trying to launch TEcandidates on test data, but got the following error:

Could not execute mv trinity_out_dir.Trinity.fasta trinity_assemblies/SRR851837_filtered.trinity_assembly.fasta'
/home/common/adonin.ls/.programs/TEcandidates_v2.0.2/TEcandidates.sh: line 313: Could not execute mv trinity_out_dir.Trinity.fasta trinity_assemblies/SRR851837_filtered.trinity_assembly.fasta: No such file or directory

I suspect there is an inconsistency in how the trinityOutput or trinity_out_dir variable is substituted in the script.

Thanks in advance for your help!

The full log is available at this link: https://pastebin.com/QNqzwC0Y

bvaldebenitom commented 3 years ago

Hi @NickPanyushev !

Can you share the output of "ls -lRht" on the output folder used during the pipeline run? Also, which version of Trinity are you using?

NickPanyushev commented 3 years ago

Hi, @bvaldebenitom!

I use Trinity-v2.11.0.

Here is the full "ls -lRht" output. Besides the TEcandidates files, there are files whose names start with slurm-: those are Slurm cluster log files. The TEcandidates.log* files are the logs from the TEcandidates runs.

total 28G
-rw-r--r--. 1 nikolai calccommon 9,1K Dec 25 19:51 TEcandidates.log
-rw-r--r--. 1 nikolai calccommon  57M Dec 25 19:51 dm3_BT2.rev.1.bt2
-rw-r--r--. 1 nikolai calccommon  39M Dec 25 19:51 dm3_BT2.rev.2.bt2
-rw-r--r--. 1 nikolai calccommon  57M Dec 25 19:43 dm3_BT2.1.bt2
-rw-r--r--. 1 nikolai calccommon  39M Dec 25 19:43 dm3_BT2.2.bt2
-rw-r--r--. 1 nikolai calccommon 332K Dec 25 19:36 dm3_BT2.3.bt2
-rw-r--r--. 1 nikolai calccommon  39M Dec 25 19:36 dm3_BT2.4.bt2
-rw-r--r--. 1 nikolai calccommon  215 Dec 25 19:36 slurm-1079175.out
-rw-r--r--. 1 nikolai calccommon  215 Dec 25 19:11 slurm-1079174.out
-rw-r--r--. 1 nikolai calccommon  222 Dec 25 15:58 slurm-1079160.out
-rw-r--r--. 1 nikolai calccommon  12K Dec 25 15:27 TEcandidates.log3
-rw-r--r--. 1 nikolai calccommon  225 Dec 25 15:22 slurm-1079118.out
-rw-r--r--. 1 nikolai calccommon  11K Dec 14 15:39 TEcandidates.log2
-rw-r--r--. 1 nikolai calccommon  215 Dec 14 15:34 slurm-977270.out
-rw-r--r--. 1 nikolai calccommon 9,1K Dec 14 01:08 TEcandidates.log1
drwxr-xr-x. 3 nikolai calccommon    5 Dec 14 01:08 candidateTE_analysis_coverage-0.3_length-900_N-1
-rw-r--r--. 1 nikolai calccommon  215 Dec 14 01:03 slurm-977028.out
-rw-r--r--. 1 nikolai calccommon  327 Dec 14 01:03 TEcandidates.sh
-rw-r--r--. 1 nikolai calccommon  15G Dec 12 23:40 SRR851838.fastq
-rw-r--r--. 1 nikolai calccommon  13G Dec 12 23:16 SRR851837.fastq
-rw-r--r--. 1 nikolai calccommon 165M Apr 30  2020 dm3.fasta
-rw-r--r--. 1 nikolai calccommon 5,2M Apr 30  2020 dm3_rmsk_TE.gff3

./candidateTE_analysis_coverage-0.3_length-900_N-1:
total 1,0K
drwxr-xr-x. 2 nikolai calccommon   0 Dec 14 01:08 trinity_assemblies
-rw-r--r--. 1 nikolai calccommon   0 Dec 13 22:51 SRR851838_filtered.fastq
-rw-r--r--. 1 nikolai calccommon 223 Dec 13 22:51 SRR851838.bt2_summary
-rw-r--r--. 1 nikolai calccommon   0 Dec 13 20:45 SRR851837_filtered.fastq
-rw-r--r--. 1 nikolai calccommon 222 Dec 13 20:45 SRR851837.bt2_summary

./candidateTE_analysis_coverage-0.3_length-900_N-1/trinity_assemblies:
total 0

Hope this helps!

bvaldebenitom commented 3 years ago

Hi @NickPanyushev

There were indeed some bugs. Thanks for reporting this.

Please try with v2.0.3, available here

Let me know if anything else comes up regarding this issue.

NickPanyushev commented 3 years ago

Hi @bvaldebenitom

Thank you very much for your help! Yesterday I launched the new version and it ran to the end. But when I reviewed the log, I saw these errors:

Error: Unable to open file trinity_assemblies/*.trinity_assembly.bed. Exiting.
Error: Unable to open file allcandidates_coverage-0.3_length-900_N-1.gff3. Exiting.

Also, I couldn't find the results folder :(

total 28G
-rw-r--r--. 1 user calccommon  331 Jan  4 00:12 TEcandidates.sh
-rw-r--r--. 1 user calccommon 450K Jan  4 00:28 TEcandidates_203.log
-rw-r--r--. 1 user calccommon  15G Dec 12 23:40 SRR851838.fastq
-rw-r--r--. 1 user calccommon  13G Dec 12 23:16 SRR851837.fastq
-rw-r--r--. 1 user calccommon    0 Jan  4 00:21 repeatsToMask_coverage-0.3_length-900.gff3
-rw-r--r--. 1 user calccommon 5,2M Apr 30  2020 dm3_rmsk_TE.gff3
-rw-r--r--. 1 user calccommon  39M Jan  4 00:28 dm3.fasta.masked_BT2.rev.2.bt2
-rw-r--r--. 1 user calccommon  57M Jan  4 00:28 dm3.fasta.masked_BT2.rev.1.bt2
-rw-r--r--. 1 user calccommon  39M Jan  4 00:22 dm3.fasta.masked_BT2.4.bt2
-rw-r--r--. 1 user calccommon 332K Jan  4 00:22 dm3.fasta.masked_BT2.3.bt2
-rw-r--r--. 1 user calccommon  39M Jan  4 00:24 dm3.fasta.masked_BT2.2.bt2
-rw-r--r--. 1 user calccommon  57M Jan  4 00:24 dm3.fasta.masked_BT2.1.bt2
-rw-r--r--. 1 user calccommon 165M Jan  4 00:21 dm3.fasta.masked
-rw-r--r--. 1 user calccommon 165M Apr 30  2020 dm3.fasta

The full log is accessible here

bvaldebenitom commented 3 years ago

Hi @NickPanyushev

I checked the log file, and the errors are reported as:

/home/common/adonin.ls/.programs/TEcandidates_v2.0.2/TEcandidates.sh: line 200: SRR851837.bt2_summary: No such file or directory

I assume you are in the /home/common/adonin.ls/.programs/TEcandidates_v2.0.2/ directory, still executing the older version. What is the exact command you are using now? Where is your TEcandidates_v2.0.3.sh file?

NickPanyushev commented 3 years ago

@bvaldebenitom, yes, I've obtained the newest version (2.0.3) of TEcandidates.sh. I just replaced the old executable with the new one in the same folder, so the folder name has not changed.

TEcandidates executables are located at the

The exact command is still the same as in my first message (except for the name of the log file):

TEcandidates.sh -t=64 -r=32 -c=0.3 -l=900 -te=dm3_rmsk_TE.gff3 -g=dm3.fasta -fq=. -m=SE -N=1 &> TEcandidates_203.log

javibio-git commented 3 years ago

Hi @NickPanyushev and @bvaldebenitom

I was having issues running TEcandidates, so I posted them here, but then I figured out the problem (so I deleted my post).

Initially, I thought there was a compatibility issue with Trinity v2.11, so I spent a lot of time trying to compile v2.4, the version used here. Once I managed to compile it, I still had exactly the same issue with Trinity. Then I looked at the log file carefully and noticed that bowtie2 was not reporting any number of processed reads for one of the samples. I inspected that FASTQ file and found it was corrupted: it did not contain readable reads, even though its size (in GB) looked normal. I deleted the file, re-ran fastq-dump to download it again, and made sure the new file was OK. After that, I was able to run TEcandidates_v2.0.3.sh, even with Trinity v2.11.
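A quick way to catch a corrupted download like this before running the pipeline is a structural check of the FASTQ file. The sketch below is a hypothetical helper (check_fastq is not part of TEcandidates) and assumes an uncompressed FASTQ with exactly four lines per record. It only detects gross problems (empty file, truncated last record, malformed header or separator lines, sequence/quality length mismatch) and is not a full validator.

```shell
#!/usr/bin/env bash
# check_fastq FILE: basic structural sanity check for an uncompressed FASTQ
# with 4 lines per record. Prints the read count on success, fails otherwise.
check_fastq() {
  local fq="$1"
  if [ ! -s "$fq" ]; then
    echo "ERROR: $fq is missing or empty" >&2
    return 1
  fi
  # Single awk pass: '@' header, '+' separator, and quality length equal to
  # sequence length in every record; line count must be a multiple of 4.
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = NR; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = NR; exit }
    NR % 4 == 0 && length($0) != seqlen    { bad = NR; exit }
    END {
      if (bad) { printf "malformed record near line %d\n", bad > "/dev/stderr"; exit 1 }
      if (NR % 4 != 0) { print "line count is not a multiple of 4 (truncated file?)" > "/dev/stderr"; exit 1 }
      printf "%d reads\n", NR / 4
    }' "$fq"
}
```

For example, check_fastq SRR851837.fastq should print a read count that can be compared with the number of processed reads bowtie2 reports in the log.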

I should mention that I installed the specified samtools, bowtie2 and bedtools versions using conda. For Trinity, I compiled it myself and added its directory to the PATH manually.

Hope this helps. If you need more details, let me know.

Javier

bvaldebenitom commented 3 years ago

@NickPanyushev can you please double-check your version update? The log file reports an error on line 200, but that line is blank in the current version.
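One way to check which copy of the script actually runs, and what sits on the line number quoted in an error, is the sketch below. show_script_line is a hypothetical helper; it relies only on the bash builtins type and command plus sed.

```shell
#!/usr/bin/env bash
# show_script_line NAME LINE
# Lists every copy of NAME found on PATH (to stderr), then prints line LINE of
# the copy bash will actually execute (the first one found).
show_script_line() {
  local name="$1" line="$2" script
  type -a "$name" >&2 || return 1
  script="$(command -v "$name")"
  echo "==> $script, line $line:" >&2
  sed -n "${line}p" "$script"
}
```

For example, show_script_line TEcandidates.sh 200 prints line 200 of the TEcandidates.sh that bash resolves first on PATH; according to the comment above, that line is blank in v2.0.3.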

I'm re-running the pipeline now, following as closely as possible how you executed it last time, and inspecting everything. That said, I had already tested the new update before publishing it. This brings me to my next point: it may be a good idea to test it in a new directory (or simply delete candidateTE_analysis_coverage-0.3_length-900_N-1 with rm -Rf). I suspect that some of the conditions in the script do not work well when the output of a previous run is already present.
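A clean rerun in the same folder can be sketched as follows. The helper name is hypothetical, and the directory pattern is taken from the folder seen in this thread (candidateTE_analysis_coverage-0.3_length-900_N-1); it is an assumption that other -c, -l and -N values produce names of the same form.

```shell
#!/usr/bin/env bash
# clean_previous_run COVERAGE LENGTH N
# Removes the output directory of an earlier TEcandidates run (if any) so the
# next run starts from scratch. Name pattern assumed from this thread.
clean_previous_run() {
  local cov="$1" len="$2" n="$3"
  local outdir="candidateTE_analysis_coverage-${cov}_length-${len}_N-${n}"
  if [ -d "$outdir" ]; then
    echo "Removing previous output: $outdir" >&2
    rm -Rf -- "$outdir"
  fi
}
```

For the run in this thread, that would be clean_previous_run 0.3 900 1, executed in the run folder before launching TEcandidates.sh again.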

@javibio-git I got a notification from GitHub, but assumed you had solved it, as I could not find the post here. I have stumbled on many problems (not just with TEcandidates) caused by improper fastq-dump downloads, so it is always a good idea to recheck that too. Sometimes I simply prefer to use SRA explorer (https://sra-explorer.info/) to download the data more carefully. Thanks for using TEcandidates! If anything else comes up, please let us know.

FWIW, the latest TEcandidates has been tested with Trinity v2.11. I will update that on the main page, because, as you saw, v2.4 can be a bit complicated to compile.

NickPanyushev commented 3 years ago

@bvaldebenitom, I reinstalled TEcandidates from scratch and the error persisted. After relaunching with stderr and stdout separated, I could see the likely source of the error:

/usr/bin/time -f '%E real\n%U user\n%S sys\n%K memory' -o trinity_assemblies/SRR851837_filtered.time Trinity --seqType fq --max_memory 32G --CPU 16 --bflyHeapSpaceMax 2G --bflyCPU 16 --single /home/common/adonin.ls/TEcandidates_test/candidateTE_analysis_coverage-0.3_length-900_N-1/SRR851837_filtered.fastq --full_cleanup
/home/common/adonin.ls/.programs/TEcandidates_v2.0.3/TEcandidates.sh: line 289: /usr/bin/time: No such file or directory

I think TEcandidates fails when it tries to move the Trinity-generated files: they are absent because Trinity fails on startup.
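A guarded version of that step could look like the sketch below: it only moves the assembly when Trinity exits successfully and the expected output is non-empty, so a startup failure is reported as such instead of surfacing as a failed mv. run_trinity_se is a hypothetical function, not the code in TEcandidates.sh; the output name trinity_out_dir.Trinity.fasta is the one shown in the error message at the top of this thread (produced with --full_cleanup).

```shell
#!/usr/bin/env bash
# run_trinity_se READS DEST [extra Trinity options...]
# Runs a single-end Trinity assembly and moves the result to DEST only if
# Trinity succeeded and actually wrote a non-empty FASTA.
run_trinity_se() {
  local reads="$1" dest="$2"; shift 2
  local out="trinity_out_dir.Trinity.fasta"
  if ! Trinity --seqType fq --single "$reads" --full_cleanup "$@"; then
    echo "ERROR: Trinity failed on $reads; not moving any output" >&2
    return 1
  fi
  if [ ! -s "$out" ]; then
    echo "ERROR: Trinity exited 0 but $out is missing or empty" >&2
    return 1
  fi
  mv -- "$out" "$dest"
}
```

Usage: run_trinity_se SRR851837_filtered.fastq trinity_assemblies/SRR851837_filtered.trinity_assembly.fasta --max_memory 32G --CPU 16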

bvaldebenitom commented 3 years ago

@NickPanyushev it seems Trinity is not even starting.

Can you share the output of each of the following commands?

echo ${BASH_VERSION}
/usr/bin/time --version
Trinity --version

I know you are using the latest version of Trinity, but I need to check whether it is correctly recognized from the command line.

By the way, I finished the test run on my computer, and it works OK. I found a small detail in your previous command:

TEcandidates.sh -t=64 -r=32 -c=0.3 -l=900 -te=dm3_rmsk_TE.gff3 -g=dm3.fasta -fq=. -m=SE -N=1

The number of threads (-t=64) cannot be higher than the RAM value (-r=32). This was causing an error, but it was not the issue in your previous reply. From your latest reply, it seems you had already noticed this and corrected it.
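A pre-flight check for this constraint might look like the sketch below. It is hypothetical (not the script's actual argument parsing) and assumes -t and -r are given as plain integers, as in the command above.

```shell
#!/usr/bin/env bash
# check_threads_vs_ram THREADS RAM
# Fails if either value is not an integer, or if THREADS > RAM.
check_threads_vs_ram() {
  local threads="$1" ram="$2" v
  for v in "$threads" "$ram"; do
    case "$v" in
      ''|*[!0-9]*) echo "ERROR: -t and -r must be integers (got '$v')" >&2; return 1 ;;
    esac
  done
  if [ "$threads" -gt "$ram" ]; then
    echo "ERROR: -t=$threads is higher than -r=$ram; lower -t or raise -r" >&2
    return 1
  fi
}
```

With the values above, check_threads_vs_ram 64 32 fails, while check_threads_vs_ram 16 32 passes.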

NickPanyushev commented 3 years ago

@bvaldebenitom, thanks for your help!

$ echo ${BASH_VERSION}
4.2.46(2)-release

$ /usr/bin/time --version
GNU time 1.7

$ Trinity --version
Trinity version: Trinity-v2.11.0
** NOTE: Latest version of Trinity is v2.11.0, and can be obtained at:
    https://github.com/trinityrnaseq/trinityrnaseq/releases

When I found out that /usr/bin/time was causing the error in the script, I commented out its invocation and relaunched TEcandidates.

I modified this if block so that Trinity would launch:

if [ "$mode" == "SE" ]; 
  then
    #cmd="/usr/bin/time -f \"%E real\n%U user\n%S sys\n%K memory\" -o $timeOutput Trinity --seqType fq --max_memory $RAM --CPU $CPU --bflyHeapSpaceMax $bflyHeapMax --bflyCPU $CPU --single $readFile --full_cleanup"
    cmd="Trinity --seqType fq --max_memory $RAM --CPU $CPU --bflyHeapSpaceMax $bflyHeapMax --bflyCPU $CPU --single $readFile --full_cleanup"
    echo "CMD: $cmd"
    echo -e "\n"
    Trinity --seqType fq --single $readFile --max_memory $RAM --CPU $CPU --bflyHeapSpaceMax $RAM --bflyCPU $CPU --full_cleanup
    #/usr/bin/time -f "%E real\n%U user\n%S sys\n%K memory" -o $timeOutput Trinity --seqType fq --max_memory $RAM --CPU $CPU --bflyHeapSpaceMax $bflyHeapMax --bflyCPU $CPU --single $readFile --full_cleanup
  else

And I got this new error message in log:

which: no java in (/home/common/adonin.ls/.programs/trinityrnaseq-v2.11.0/trinity-plugins/BIN:/home/common/adonin.ls/.programs/perl5/perlbrew//bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/home/common/adonin.ls/.local/bin:/home/common/adonin.ls/.programs/bedtools2/bin:/home/common/adonin.ls/.programs/kallisto:/home/common/adonin.ls/.programs/sratoolkit.2.9.6-1-ubuntu64/bin:/home/common/adonin.ls/.programs/jellyfish/bin:/home/common/adonin.ls/.programs/samtools-1.11:/home/common/adonin.ls/.programs/hmmer-3.3.1/src:/home/common/adonin.ls/.programs/RepeatMasker:/home/common/adonin.ls/.programs/hmmer-3.3.2/src:/home/common/adonin.ls/.programs/cellranger-5.0.0:/home/common/adonin.ls/.programs/perl5/bin:/home/common/adonin.ls/.programs/salmon-1.4.0/bin:/home/common/adonin.ls/.programs/bowtie2-2.4.2:/home/common/adonin.ls/.programs/trinityrnaseq-v2.11.0:/home/common/adonin.ls/.programs/TEcandidates_v2.0.3:/home/common/adonin.ls/.programs/xclip:/home/common/adonin.ls/.local/bin:/home/common/adonin.ls/bin)
Error, cannot find 'java'.  Please be sure it is available within your ${PATH} setting and then try again. at /home/common/adonin.ls/.programs/trinityrnaseq-v2.11.0/Trinity line 2905.

But of course, I have java in my PATH:

$ java -showversion
openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-b01)
OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)
...
$ which java
/usr/bin/java

bvaldebenitom commented 3 years ago

@NickPanyushev no worries, and thanks for making the /usr/bin/time change. I suspected it might be a problem in your setup.

Just so we are clear, do you have the *_filtered FASTQ files in the TEcandidates run folder?

You previously mentioned that you are using Slurm. Can you post the Slurm err and out files produced by submitting, with sbatch, a simple script containing the following?

echo ${PATH}
java -showversion
which java

In the past, I have needed to put an explicit export PATH declaration at the beginning of the script for it to run properly on a Slurm cluster.
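For reference, a diagnostic job script along those lines might look like this. The install directories are placeholders modeled on the paths in this thread; adjust them to your cluster. The #SBATCH options used (--job-name, --output, --error, where %j expands to the job ID) are standard sbatch options; the list of tools checked is an assumption about what the pipeline calls.

```shell
#!/bin/bash
#SBATCH --job-name=path_test
#SBATCH --output=path_test_%j.out
#SBATCH --error=path_test_%j.err

# Placeholder install locations: replace with the directories on your cluster.
TRINITY_DIR="$HOME/.programs/trinityrnaseq-v2.11.0"
TECAND_DIR="$HOME/.programs/TEcandidates_v2.0.3"

# Declare PATH explicitly so the job does not depend on whatever the
# submitting shell happened to export.
export PATH="$TRINITY_DIR:$TECAND_DIR:$PATH"

echo "PATH=$PATH"
# Report where each tool resolves, without aborting if one is missing.
for tool in java Trinity bowtie2 samtools bedtools; do
  printf '%-10s %s\n' "$tool" "$(command -v "$tool" || echo 'NOT FOUND')"
done
java -version 2>&1 || true
```

Submitting it with sbatch and comparing the .out file with the same commands run interactively shows whether the compute node sees a different environment.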

If possible, can you also try to run the Trinity command directly in the terminal?

Trinity --seqType fq --max_memory 32G --CPU 32 --bflyHeapSpaceMax 1G --bflyCPU 32 --single /home/common/adonin.ls/TEcandidates_test/candidateTE_analysis_coverage-0.3_length-900_N-1/SRR851837_filtered.fastq --full_cleanup

You only need to see whether it throws the same error; after that, you can stop the process.

NickPanyushev commented 3 years ago

Hi, @bvaldebenitom.

It seems I have solved the Java problem: I ran the module load java/13 command before running the script, and Slurm can now locate java correctly. I submitted this sbatch script:

echo ${PATH} 1> paths_test.log 2> paths_test.err
echo >> paths_test.log 
echo >> paths_test.err

java -showversion 1>> paths_test.log 2>> paths_test.err
echo >> paths_test.log 
echo >> paths_test.err

which java 1>> paths_test.log 2>> paths_test.err   

and got the stderr and stdout files. As you can see, java is located correctly.

Next, I ran this command:

Trinity --seqType fq --max_memory 32G --CPU 32 --bflyHeapSpaceMax 1G --bflyCPU 32 --single /home/common/adonin.ls/TEcandidates_test/candidateTE_analysis_coverage-0.3_length-900_N-1/SRR851837_filtered.fastq --full_cleanup

It produced this stdout, so Trinity can start now.

However, when I started the whole script, it bailed out with an error I had not seen before:

TEcandidates_203.log TEcandidates_203.err

I suppose it is some internal Trinity error...

NickPanyushev commented 3 years ago

I've finally overcome all the errors, and TEcandidates finished successfully on the test data! @bvaldebenitom, many thanks for your support!

It was indeed an internal Trinity error caused by an improper installation. So, if you see this error in the log:

~/.programs/trinityrnaseq-v2.11.0/util/..//Inchworm/bin/fastaToKmerCoverageStats: No such file or directory

you have to recompile Trinity.

NB! Although TEcandidates checks whether Trinity is installed, it cannot detect this kind of error by itself! So check your Trinity installation with the Trinity-supplied test data.
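One way to catch this particular failure before launching the pipeline is to check that the Inchworm helper named in the error above exists. The sketch assumes Trinity is run from its compiled source tree (as in this thread, where the launcher is .../trinityrnaseq-v2.11.0/Trinity); packaged installs such as conda may lay files out differently, and check_trinity_build is a hypothetical name. It does not replace running Trinity on its own test data.

```shell
#!/usr/bin/env bash
# check_trinity_build: find the Trinity launcher on PATH and verify that one of
# its compiled helpers (Inchworm's fastaToKmerCoverageStats) is present.
check_trinity_build() {
  local launcher home helper
  launcher="$(command -v Trinity)" || { echo "ERROR: Trinity not on PATH" >&2; return 1; }
  # Resolve symlinks (GNU readlink -f) to reach the real source tree.
  home="$(dirname "$(readlink -f "$launcher")")"
  helper="$home/Inchworm/bin/fastaToKmerCoverageStats"
  if [ ! -x "$helper" ]; then
    echo "ERROR: $helper is missing; Trinity does not appear to be fully compiled" >&2
    return 1
  fi
  echo "OK: $helper"
}
```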

If you are trying to run TEcandidates on a cluster, make sure you have loaded the java and python modules, as Trinity needs them.
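A sketch of that setup step for a job script follows. java/13 is the module used in this thread; python/3 is a placeholder name (check module avail on your cluster), and load_trinity_runtimes is hypothetical. It only confirms that java and python3 are on PATH afterwards; it does not check their versions.

```shell
#!/usr/bin/env bash
# load_trinity_runtimes: load the modules Trinity needs on a cluster, then
# confirm that java and python3 are reachable on PATH.
load_trinity_runtimes() {
  # 'module' is normally a shell function provided by Environment Modules or
  # Lmod; it may be missing in a plain non-interactive shell.
  if type module >/dev/null 2>&1; then
    module load java/13 || return 1   # module name used in this thread
    module load python/3 || return 1  # placeholder: check 'module avail'
  else
    echo "WARNING: 'module' is not available; relying on the current PATH" >&2
  fi
  local tool missing=0
  for tool in java python3; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "ERROR: $tool not found on PATH" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a job script, load_trinity_runtimes || exit 1 would go before the TEcandidates.sh command line.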

Also, Trinity requires the latest samtools. As an improvement, I would suggest adding a samtools version check at the beginning of the script. It may also be worth adding checks for java and numpy.
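The suggested checks could be sketched like this. The helper names are hypothetical, version_ge relies on GNU sort -V, and the minimum samtools version is left as a parameter because this thread does not state a specific minimum (1.11 is only the version that was on PATH here).

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds if version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_samtools_min MIN: fails unless the samtools on PATH reports a version >= MIN.
check_samtools_min() {
  local min="$1" have
  command -v samtools >/dev/null 2>&1 || { echo "ERROR: samtools not found on PATH" >&2; return 1; }
  # The first line of 'samtools --version' has the form "samtools 1.11".
  have="$(samtools --version 2>/dev/null | awk 'NR == 1 { print $2 }')"
  if ! version_ge "$have" "$min"; then
    echo "ERROR: samtools '${have:-unknown}' found, but >= $min is required" >&2
    return 1
  fi
}

# check_java / check_numpy: fail with a message if the dependency is unusable.
check_java() {
  java -version >/dev/null 2>&1 || { echo "ERROR: java is not available" >&2; return 1; }
}
check_numpy() {
  python3 -c 'import numpy' >/dev/null 2>&1 || { echo "ERROR: python3 cannot import numpy" >&2; return 1; }
}
```

At the top of a script, these could be combined as: check_samtools_min 1.11 && check_java && check_numpy || exit 1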

Unfortunately, I couldn't solve the problem with /usr/bin/time, so I had to comment it out.
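If /usr/bin/time is the only problem, one option is to make timing optional instead of commenting it out. In the sketch below, run_timed is a hypothetical wrapper: it uses GNU time with the same format string as the original command when /usr/bin/time is executable, and otherwise runs the command untimed with a warning. Passing the command as separate arguments ("$@") also avoids building it as a single string.

```shell
#!/usr/bin/env bash
# run_timed TIME_OUTPUT_FILE COMMAND [ARGS...]
# Times COMMAND with GNU time when /usr/bin/time exists; otherwise runs it
# untimed instead of failing. Returns COMMAND's exit status in both cases.
run_timed() {
  local time_out="$1"; shift
  if [ -x /usr/bin/time ]; then
    /usr/bin/time -f '%E real\n%U user\n%S sys\n%K memory' -o "$time_out" "$@"
  else
    echo "WARNING: /usr/bin/time not found; running without timing" >&2
    "$@"
  fi
}
```

For example: run_timed "$timeOutput" Trinity --seqType fq --max_memory $RAM --CPU $CPU --bflyHeapSpaceMax $bflyHeapMax --bflyCPU $CPU --single $readFile --full_cleanup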

The last improvement I would suggest is fixing the broken links in Readme.md, because the test files are currently not available for download.

Thank you very much for this program and for your assistance!

bvaldebenitom commented 3 years ago

Hi @NickPanyushev ,

Apologies for the delay in replying; I'm glad it is working now.

I have taken note of all the issues you mention. I'm working on a major update of the tool, so all of your comments are really appreciated. These additional checks will help it run smoothly.

If anything else comes up, please do not hesitate to post here or to write directly to the email!