sebhtml closed this issue 10 years ago
./scripts/add-comp.pl -t /space2/seb/tmp -d /space2/seb/bin jgi_rqc
[seb@sal assembly]# ./scripts/add-comp.pl -t /space2/seb/tmp -d /space2/seb/bin jgi_rqc
Installing jgi_rqc...
Cloning into 'jgi-rqc-pipeline'...
remote: Counting objects: 3033, done.
remote: Compressing objects: 100% (1510/1510), done.
remote: Total 3033 (delta 1480), reused 3033 (delta 1480)
Receiving objects: 100% (3033/3033), 160.73 MiB | 5.10 MiB/s, done.
Resolving deltas: 100% (1480/1480), done.
next: identify the entry point for rqc
mkdir -p destination ; rm -rf tmp; ./scripts/add-comp.pl -t tmp -d destination jgi_rqc
mkdir -p destination ; rm -rf tmp; ./scripts/add-comp.pl -t tmp -d $(pwd)/destination jgi_rqc
[seb@sal readqc]# ls
lib readqc.py readqc_report.py tools
[seb@sal readqc]# ./readqc.py
Traceback (most recent call last):
File "./readqc.py", line 70, in <module>
./destination/jgi_rqc/readqc/readqc.py --fastq ~/dropbox/GPIC.1424-1.1371.fastq --output-path output-1 --kmer 63
`-bash---python---sh---perl---cat
it is running now...
[seb@sal assembly]# ls output-1/
readqc.log readqc_stats.tmp readqc_status.log subsample uniqueness
[seb@sal assembly]# grep -i fail output-1/readqc.log
os_utility.py :31732 2014-05-30 16:33:15,734 INFO: cmd: set -e; cat /home/seb/dropbox/GPIC.1424-1.1371.fastq | Failed to find 'cplusmersampler' installation.
os_utility.py :31732 2014-05-30 16:33:15,742 INFO: Return values: exitCode=127, stdOut=, stdErr=/bin/sh: 1: Failed: not found
readqc_utils.py:31732 2014-05-30 16:33:15,742 ERROR: - fail to sample unique 20 mers.
readqc.py :31732 2014-05-30 16:33:15,742 INFO: 2_unique_mers_sampling failed.
readqc.py :31732 2014-05-30 16:33:15,743 INFO: Status 2_unique_mers_sampling failed
What is cplusmersampler?
The asset (private) is now available here (executable, no source code):
To install:
seb@bigmem:~/kbase-stuff/assembly$ rm -rf destination; mkdir -p destination ; rm -rf tmp; ./scripts/add-comp.pl -t tmp -d $(pwd)/destination jgi_rqc
To use:
./destination/jgi_rqc/readqc/readqc.py --fastq ~/dropbox/GPIC.1424-1.1371.fastq --output-path output-1 --kmer 63
Success:
seb@bigmem:~/kbase-stuff/assembly$ ./destination/jgi_rqc/readqc/readqc.py --fastq ~/dropbox/GPIC.1424-1.1371.fastq --output-path output-1 --kmer 63
Started readqc pipeline, writing log to: output-1/readqc.log
seb@bigmem:~/kbase-stuff/assembly$ find output-1/
output-1/
output-1/uniqueness
output-1/readqc_status.log
output-1/readqc_stats.tmp
output-1/readqc.log
output-1/subsample
output-1/subsample/GPIC.1424-1.1371.s0.01.fastq
output-1/subsample/first_subsampled.txt
output-1/subsample/GPIC.1424-1.1371.stats
The pipeline still does not find its own bundled executable:
seb@bigmem:~/kbase-stuff/assembly$ grep cplusmersampler output-1/readqc.log
os_utility.py :887 2014-06-11 17:46:59,609 INFO: cmd: set -e; cat /home/seb/dropbox/GPIC.1424-1.1371.fastq | Failed to find 'cplusmersampler' installation.
seb@bigmem:~/kbase-stuff/assembly$ find destination/|grep cplusmersampler
destination/jgi_rqc/readqc/tools/cplusmersampler
seb@bigmem:~/kbase-stuff/assembly$ file destination/jgi_rqc/readqc/tools/cplusmersampler
destination/jgi_rqc/readqc/tools/cplusmersampler: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
Will continue this tomorrow.
OK...
Setting the PATH does not work.
seb@bigmem:~/kbase-stuff/assembly$ export PATH=$(pwd)/destination/jgi_rqc/assets/:$PATH
seb@bigmem:~/kbase-stuff/assembly$ which cplusmersampler
/home/seb/kbase-stuff/assembly/destination/jgi_rqc/assets//cplusmersampler
seb@bigmem:~/kbase-stuff/assembly$ ./destination/jgi_rqc/readqc/readqc.py --fastq ~/dropbox/GPIC.1424-1.1371.fastq --output-path output-1 --kmer 63
os_utility.py :9059 2014-06-13 16:01:13,766 INFO: cmd: set -e; cat /home/seb/dropbox/GPIC.1424-1.1371.fastq | Failed to find 'cplusmersampler' installation.
There are some tests that ship with the product:
seb@bigmem:~/kbase-stuff/assembly$ destination/jgi_rqc/lib/os_utility.py &> log
Ran 2 tests in 0.015s
OK
Failed to find 'blast' installation.
Failed to find 'blast' installation.
Failed to find 'blast' installation.
Failed to find 'blast' installation.
Failed to find 'blast+' installation.
Failed to find 'agrep' installation.
Failed to find 'tagdust' installation.
Failed to find 'gnuplot' installation.
Failed to find 'bwa' installation.
Failed to find 'fastq_to_fasta_qual' installation.
Failed to find 'cat' installation.
Failed to find 'bzcat' installation.
Failed to find 'zcat' installation.
Failed to find 'head' installation.
Failed to find 'tail' installation.
Failed to find 'perl' installation.
Failed to find 'grep' installation.
Failed to find 'nawk' installation.
Failed to find 'xxx' installation.
Failed to find 'fastqTrimmer' installation.
Failed to find 'duk' installation.
Failed to find 'fastqQhist' installation.
Failed to find 'histo_parse.pl' installation.
Failed to find 'rm' installation. -f
Failed to find 'mkdir' installation. -p
Failed to find 'cplusmersampler' installation.
Failed to find 'fq2fa.pl' installation.
Failed to find 'GCcontent.pl' installation.
Failed to find 'histogram2.pl' installation.
Failed to find 'histo_parse.pl' installation.
Failed to find 'checkIllQualLANLfq.sh' installation.
I think it is broken.
It seems getToolPath() in lib/os_utility2.py is used to resolve each tool's path, and the function looks for the executable in many hardcoded locations.
Looking at readqc/lib/readqc_constants.py, I feel we will have many more dependencies in the form of hardcoded reference databases (/global/dna/shared/rqc/ref_databases/*).
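For the getToolPath() point above, here is a minimal sketch of a lookup that does not rely on site-specific paths. This is not the upstream function: find_tool and its extra_dirs parameter are hypothetical names, and it is written for Python 3 while the pipeline runs on Python 2.7. It checks caller-supplied directories first (for example the bundled readqc/tools directory) and then falls back to PATH.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Return the path of an executable, or None if it cannot be found.

    Hypothetical helper, not the upstream getToolPath().
    """
    # Look in caller-supplied directories first, e.g. a bundled tools/ directory.
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.abspath(candidate)
    # Fall back to the user's PATH instead of hardcoded site paths.
    return shutil.which(name)
```

With something like this, exporting PATH (as tried earlier in this thread) would be enough for the tool to be picked up.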
This should be fixed upstream in my opinion.
I agree. It needs to be deployable.
The product needs this dependency:
This Python code is not compatible with Ubuntu because:
Python's Popen has an option to run the command through the shell. The default shell is '/bin/sh'. On Fedora / CentOS / RHEL, /bin/sh is bash (/bin/sh -> bash). On Ubuntu, it is dash (/bin/sh -> dash).
The problem is that Environment Modules (the module command) is not compatible with dash.
Workaround:
try to use this:
Popen(['/bin/bash', '-c', args[0], args[1], ...])
command = ['/bin/bash', '-c', 'module load cplusmersampler && which cplusmersampler']
But to have access to module, it is required to have access to /packages/modules/3.2.9-1/Modules/3.2.9/init/bash
I'll patch the code so that module is not a requirement...
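A minimal sketch of the bash workaround above, assuming Python 3 and that bash lives at /bin/bash. run_in_bash is a hypothetical helper, not part of the upstream code. Instead of building the ['/bin/bash', '-c', ...] list by hand, it uses Popen's executable argument, which replaces the default /bin/sh when shell=True.

```python
import subprocess


def run_in_bash(command):
    """Run a command string with bash rather than the default /bin/sh.

    Hypothetical helper; assumes bash is installed at /bin/bash.
    """
    # With shell=True, Popen normally runs /bin/sh. The 'executable'
    # argument swaps in bash, so the behavior no longer depends on
    # whether /bin/sh points to bash or dash.
    process = subprocess.Popen(
        command,
        shell=True,
        executable="/bin/bash",
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr
```

For example, run_in_bash('echo "$BASH_VERSION"') should return a non-empty version string on both Fedora and Ubuntu. This does not remove the dependency on the module init script, it only makes the shell choice explicit.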
The code is looking for 'command not found', but dash just says 'not found'.
seb@bigmem:~/kbase-stuff/assembly$ dash -c command-12345678
dash: 1: command-12345678: not found
seb@bigmem:~/kbase-stuff/assembly$ bash -c command-12345678
bash: command-12345678: command not found
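Since the message text differs between shells, one option is to check the exit status instead of parsing stderr. Both shells exit with 127 when a command is not found, which matches the exitCode=127 already seen in readqc.log. A minimal Python 3 sketch, where command_is_missing is a hypothetical name and not something in os_utility.py:

```python
import subprocess

# Exit status POSIX shells use when a command cannot be found.
COMMAND_NOT_FOUND = 127


def command_is_missing(command):
    """Return True if the shell reported that the command does not exist."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Compare the exit status rather than the shell-specific error text.
    return result.returncode == COMMAND_NOT_FOUND
```

One caveat: a program that itself exits with 127 would be reported as missing, so this is a heuristic, not a guarantee.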
running test.
Still does not work after patching...
It is in my PATH:
seb@bigmem:~/kbase-stuff/assembly$ which cplusmersampler
/home/seb/kbase-stuff/assembly/destination/jgi_rqc/assets//cplusmersampler
The unit test finds the executable:
seb@bigmem:~/kbase-stuff/assembly$ destination/jgi_rqc/lib/os_utility.py | grep cplus
Ran 2 tests in 0.016s
OK
/home/seb/kbase-stuff/assembly/destination/jgi_rqc/readqc/tools/cplusmersampler
seb@bigmem:~/kbase-stuff/assembly$ grep cplus output-1/readqc.log
os_utility.py :21452 2014-06-13 17:19:09,884 INFO: cmd: set -e; cat /home/seb/dropbox/GPIC.1424-1.1371.fastq | Failed to find 'cplusmersampler' installation.
Oh, there are 2 copies of the os_utility Python code:
seb@bigmem:~/kbase-stuff/assembly$ find destination/|grep os_utility|grep py$
destination/jgi_rqc/lib/os_utility2.py
destination/jgi_rqc/lib/os_utility.py
seb@bigmem:~/kbase-stuff/assembly$ sha1sum destination/jgi_rqc/lib/os_utility.py destination/jgi_rqc/lib/os_utility2.py
b62fb21da0f2f1e0589b920cc238b8e7fd9f34ad destination/jgi_rqc/lib/os_utility.py
8e557f60a306577e7369c7252ba1ca7e8d0538e6 destination/jgi_rqc/lib/os_utility2.py
I suppose that readqc uses os_utility2.py while the unit test uses os_utility.py.
Yup ;-).
I need to patch os_utility2 too.
seb@bigmem:~/kbase-stuff/assembly$ grep os_utility2 destination/* -R|grep -v Binary
destination/jgi_rqc/readqc/readqc.py: 20130415 5.0.8: Cleanup; os_utility2.py;
destination/jgi_rqc/readqc/lib/readqc_constants.py:from os_utility2 import getToolPath
destination/jgi_rqc/lib/rqc_constants.py:from os_utility2 import getToolPath
destination/jgi_rqc/lib/rqc_constants.py:from os_utility2 import getToolPath
seb@bigmem:~/kbase-stuff/assembly$ destination/jgi_rqc/lib/os_utility2.py | grep cplus
Ran 0 tests in 0.000s
OK
Failed to find 'cplusmersampler' installation.
cplusmersampler now works.
New error: Failed to find 'gnuplot' installation
15 steps:
## 1. fast_subsample_fastq_sequences
## 2. write_unique_20_mers
## 3. generate read GC histograms: illumina_read_gc
## 4. read_quality_stats
## 5. write_base_quality_stats
## 6. illumina_count_q_score
## 7. illumina_calculate_average_quality
## 8. illumina_find_common_motifs
## 9. illumina_run_bwa
## 10. illumina_run_tagdust
## 11. illumina_detect_read_contam
## 12. illumina_sciclone_analysis
## 13. illumina_read_megablast
## 14. multiplex_statistics
## 15. end_of_read_illumina_adapter_check
It fails at step 9:
Traceback (most recent call last):
File "/usr/lib/python2.7/logging/__init__.py", line 846, in emit
msg = self.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 723, in format
return fmt.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 464, in format
record.message = record.getMessage()
File "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Logged from file readqc.py, line 625
chmod: cannot access `output-2/log': No such file or directory
Traceback (most recent call last):
File "./destination/jgi_rqc/readqc/readqc.py", line 1495, in <module>
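About the first traceback above (the TypeError raised inside logging): this usually means a logging call received more arguments than its message has % placeholders. I have not checked what line 625 of readqc.py actually does, so the following is only a hypothetical reproduction in Python 3. The render helper and the example messages are made up for illustration.

```python
import logging


def render(msg, *args):
    """Format a message the way the logging module does for a log record."""
    record = logging.LogRecord(
        name="demo",
        level=logging.INFO,
        pathname="demo.py",
        lineno=0,
        msg=msg,
        args=args,
        exc_info=None,
    )
    # getMessage() applies 'msg % args', which is where the TypeError comes from.
    return record.getMessage()


# One placeholder, one argument: formats cleanly.
print(render("bwa exit code: %s", 1))

# No placeholder but one argument: raises the TypeError from the traceback.
try:
    render("bwa exit code:", 1)
except TypeError as error:
    print("TypeError:", error)
```

If this is the cause, the fix is to add the missing %s to the message or drop the extra argument.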
The tool creates the directory output-2/output-2/log/, but then uses output-2/log/
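I don't know yet where the doubled output-2/output-2 prefix comes from. One common cause is joining a relative output path again after the working directory has changed, but that is a guess. A minimal Python 3 sketch of the usual guard, with make_log_dir as a hypothetical helper: resolve the output path to an absolute path once, then build every other path from it.

```python
import os


def make_log_dir(output_path):
    """Create <output_path>/log and return its absolute path.

    Hypothetical helper, not the upstream code.
    """
    # Resolve once, before any chdir, so later joins cannot double the prefix.
    base = os.path.abspath(output_path)
    log_dir = os.path.join(base, "log")
    os.makedirs(log_dir, exist_ok=True)
    return log_dir
```

The returned path stays valid even if the pipeline later changes its working directory, so a chmod on it would find the directory.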
Test result after adding 3rd patch:
Crashing at 12_illumina_sciclone_analysis
It is trying to connect to a MySQL server:
^CTraceback (most recent call last):
File "./destination/jgi_rqc/readqc/readqc.py", line 1544, in <module>
The upstream code has too many dependencies and is not properly documented.
rolling QC (RQC) pipeline