markziemann / dee2

Digital Expression Explorer 2 (DEE2): a repository of uniformly processed RNA-seq data
http://dee2.io
GNU General Public License v3.0

Singularity image #11

Closed markziemann closed 6 years ago

markziemann commented 7 years ago

Need a singularity image for use on HPC systems

markziemann commented 6 years ago

For Ubuntu 16.04, install Singularity like this:

sudo wget -O- http://neuro.debian.net/lists/xenial.us-ca.full | sudo tee /etc/apt/sources.list.d/neurodebian.sources.list
sudo apt-key adv --recv-keys --keyserver hkp://pool.sks-keyservers.net:80 0xA5D32F012649A5A9
sudo apt-get update
sudo apt-get install -y singularity-container

Then check that it works:

singularity --version

Now convert the Docker image to a Singularity image according to this script (link here):


docker run \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /scratch/mziemann/dee2/image/singularity:/output \
--privileged -t --rm \
singularityware/docker2singularity \
mziemann/tallyup

You'll see that there's a new .img file in the directory mounted as /output (/scratch/mziemann/dee2/image/singularity here).

Now check that the image works:

singularity run mziemann_tallyup-2017-10-17-9d3621e9da95.img /root/code/volunteer_pipeline.sh ecoli

It launched the app, which is great, but it encountered some errors:

+ MY_ORG=ecoli
+ '[' '!=' -f ']'
**/root/code/volunteer_pipeline.sh: line 8: [: !=: unary operator expected**
+ MEM_FACTOR=2
+ export -f main
+ cd /home/mziemann
+ echo Dumping star genomes from memory
Dumping star genomes from memory
++ find /home/mziemann/ref/
++ grep '/ensembl/star$'
++ sed 's#\/code\/\.\.##'
**find: '/home/mziemann/ref/': No such file or directory**
++ free
++ awk '$1 ~ /Mem:/  {print $2-$3}'
+ MEM=114331784
++ nproc
+ NUM_CPUS=32
++ lscpu
++ grep MHz
++ sort -k2gr
++ awk '{print $NF}'
+ CPU_SPEED=2299.998
+ ACC_URL=https://vm-118-138-241-34.erc.monash.edu.au/acc.html
+ ACC_REQUEST=https://vm-118-138-241-34.erc.monash.edu.au/cgi-bin/acc.sh
+ SFTP_URL=118.138.241.34
+ '[' '!' -z ecoli ']'
++ echo 'athaliana celegans dmelanogaster drerio ecoli hsapiens mmusculus rnorvegicus scerevisiae'
++ tr ' ' '\n'
++ grep -wc ecoli
+ ORG_CHECK=1
+ '[' 1 -ne 1 ']'
++ echo 'athaliana        2853904
celegans        2652204
dmelanogaster   3403644
drerio  14616592
ecoli   1576132
hsapiens        28968508
mmusculus       26069664
rnorvegicus     26913880
scerevisiae     1644684'
++ grep -w ecoli
++ awk -v f=2 '{print $2*f}'
+ MEM_REQD=3152264
+ '[' 3152264 -gt 114331784 ']'
+ '[' -z ecoli ']'
+ echo ecoli
ecoli
+ export -f myfunc
+ export -f key_setup
+ TESTFILE=test_pass
+ '[' '!' -r test_pass ']'
+ echo Initial pipeline test with E. coli dataset
Initial pipeline test with E. coli dataset
+ '[' -d /home/mziemann/data/ecoli/SRR057750 ']'
+ main ecoli SRR057750
+ set -x
+ export -f exit1
+ ORG=ecoli
+ '[' SRR057750 '!=' -f ']'
+ SRR_FILE=SRR057750
++ basename SRR057750 .sra
+ SRR=SRR057750
+ echo SRR057750
SRR057750
+ wget -O tmp.html https://www.ncbi.nlm.nih.gov/sra/SRR057750
--2017-12-01 03:17:27--  https://www.ncbi.nlm.nih.gov/sra/SRR057750
Resolving www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)... 130.14.29.110, 2607:f220:41e:4290::110
Connecting to www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)|130.14.29.110|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'tmp.html'

tmp.html    [    <=>    ]  60.38K  92.0KB/s    in 0.7s

2017-12-01 03:17:29 (92.0 KB/s) - 'tmp.html' saved [61825]

++ echo ecoli
++ cut -c2-
+ ORG2=coli
++ sed 's/class=/\n/g' tmp.html
++ grep Organism:
++ grep -c coli
+ ORG_OK=1
+ '[' 1 -ne 1 ']'
+ echo User input species and SRA metadata match. OK.
User input species and SRA metadata match. OK.
+ cd /home/mziemann
+ DEE_DIR=/home/mziemann
+ CODE_DIR=/home/mziemann/code
+ PIPELINE=/root/code/volunteer_pipeline.sh
++ md5sum /root/code/volunteer_pipeline.sh
++ cut -d ' ' -f1
+ PIPELINE_MD5=2bd405470c8024bdd37cf5a1c428c6d8
+ SW_DIR=/home/mziemann/sw
+ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/mziemann/sw
+ DATA_DIR=/home/mziemann/data/ecoli
+ REF_DIR=/home/mziemann/ref
+ QC_DIR=/home/mziemann/qc
+ DISKLIM=32000000
+ DLLIM=1
+ ALNLIM=2
+ MEMALNLIM=4
++ nproc
+ THREADS=32
++ df .
++ awk 'END{print$4}'
+ DISK=9360
++ free
++ awk '$1 ~ /Mem:/  {print $2-$3}'
+ MEM=114331680
+ '[' 9360 -lt 32000000 ']'
**+ echo Error low disk space 9360 available 32000000 limit**
Error low disk space 9360 available 32000000 limit
+ exit1
+ rm '*fastq' '*.sra' '*tsv'
rm: cannot remove '*fastq': No such file or directory
rm: cannot remove '*.sra': No such file or directory
rm: cannot remove '*tsv': No such file or directory
+ return 1
+ TEST_CHECKSUM=a739998e33947c0a60edbde92e8f0218
+ cd /home/mziemann/data/ecoli/
/root/code/volunteer_pipeline.sh: line 1327: cd: /home/mziemann/data/ecoli/: No such file or directory
++ cat 'SRR057750/SRR057750*tsv'
++ md5sum
++ awk '{print $1}'
cat: 'SRR057750/SRR057750*tsv': No such file or directory
+ TEST_DATASET_USER_CHECKSUM=d41d8cd98f00b204e9800998ecf8427e
+ '[' d41d8cd98f00b204e9800998ecf8427e '!=' a739998e33947c0a60edbde92e8f0218 ']'
+ echo 'Test dataset did not complete properly. Md5sums do not match those provided!'
Test dataset did not complete properly. Md5sums do not match those provided!
+ echo 'Contact the author for help or flag this issue on the GitHub repo'
Contact the author for help or flag this issue on the GitHub repo
+ exit 1

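So there are a few problems I can see: the `[` test on line 8 of the script fails with "unary operator expected", /home/mziemann/ref/ does not exist, and the run aborted on low disk space (9360 available against a limit of 32000000), so the test dataset was never processed and the checksums did not match.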
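The "unary operator expected" message is what bash's `[` prints when an unquoted variable expands to nothing, which matches the trace line `'[' '!=' -f ']'`. A minimal sketch of the quoted form, assuming line 8 compares an optional second argument against `-f` (the function name is hypothetical and not taken from volunteer_pipeline.sh):

```shell
#!/bin/sh
# Hypothetical stand-in for the check on line 8 of the pipeline script.
# Unquoted, `[ $2 != -f ]` collapses to `[ != -f ]` when $2 is empty.
# Quoting keeps the empty string as a real (empty) operand.
quoted_check() {
  if [ "${2:-}" != "-f" ]; then
    echo "not a file list"
  else
    echo "file list"
  fi
}

quoted_check ecoli        # second argument missing
quoted_check ecoli -f     # second argument present
```

With the quotes in place the test is well formed whether or not a second argument is supplied.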

markziemann commented 6 years ago

Tried the above on a computer with a larger HDD and it worked:


+ TEST_DATASET_USER_CHECKSUM=a739998e33947c0a60edbde92e8f0218
+ '[' a739998e33947c0a60edbde92e8f0218 '!=' a739998e33947c0a60edbde92e8f0218 ']'
+ echo 'Test dataset completed successfully'
Test dataset completed successfully
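Since the first attempt only failed on free disk space, it may be worth checking that before launching the image. The following is a sketch, not part of the pipeline: the limit is copied from the DISKLIM value in the earlier trace, the function names are made up for illustration, and the unit is assumed to be the 1K blocks that `df -k` reports.

```shell
#!/bin/sh
# Limit copied from DISKLIM in the trace; assumed to be in 1K blocks.
DISKLIM=32000000

# has_enough_space AVAILABLE LIMIT: succeeds when AVAILABLE >= LIMIT.
has_enough_space() {
  [ "$1" -ge "$2" ]
}

# Print the free space, in 1K blocks, of the filesystem holding directory $1.
available_kb() {
  df -Pk "$1" | awk 'END{print $4}'
}

workdir=.
avail=$(available_kb "$workdir")
if has_enough_space "$avail" "$DISKLIM"; then
  echo "OK: $avail available in $workdir"
else
  echo "Low disk space: $avail available, $DISKLIM required in $workdir"
fi
```

Run it from the directory that will be used as the working directory inside the container.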

There was one minor error:


++ cut -f2 SRR057750.se.tsv
++ tail -n +5
++ numsum
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = "en_AU:en",
    LC_ALL = (unset),
    LANG = "en_AU.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
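
The warning says that the `en_AU.UTF-8` locale named in `LANG` is not available where perl runs. One way to avoid it, sketched here under the assumption that nothing in the pipeline depends on locale-specific sorting or number formatting, is to force the always-available `C` locale for the command. `run_in_c_locale` is a hypothetical helper name.

```shell
#!/bin/sh
# Run a command with the C locale forced, overriding the caller's
# LANG and LANGUAGE settings for that command only.
run_in_c_locale() {
  env LC_ALL=C LANGUAGE= "$@"
}

# Show the locale variable the child process actually sees.
run_in_c_locale sh -c 'echo "LC_ALL=$LC_ALL"'
```

The `singularity run ...` command could be prefixed the same way, assuming the container inherits the caller's environment.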

markziemann commented 6 years ago

It seems the perl message was just a warning, and the fallback to the standard locale works OK. On follow-up I found that an error occurred after the data were processed, just before upload. The pipeline could not set up the guestuser SSH key, probably because it is stored under /root and files there are not allowed to be modified in Singularity. The solution is to relocate those files outside the /root directory.

+ rm -rf '*fastq'
+ cd ..
+ COMPLETE=1
+ '[' 1 -eq 1 ']'
+ key_setup
+ mkdir -p /root/.ssh
+ cat
/root/code/volunteer_pipeline.sh: line 1078: /root/.ssh/guestuser: Permission denied
+ cat
/root/code/volunteer_pipeline.sh: line 1108: /root/.ssh/guestuser.pub: Permission denied
+ chmod -R 700 /root/.ssh
chmod: changing permissions of '/root/.ssh': Operation not permitted
chmod: changing permissions of '/root/.ssh/known_hosts': Operation not permitted
+ cd /root/data/ecoli
+ zip -r SRR1051510.ecoli.zip SRR1051510
    zip warning: name not matched: SRR1051510
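
A sketch of what relocating the key files could look like: choose the key directory at run time and fall back to a temporary directory when the preferred one cannot be created or written. The function name and the use of `$HOME/.ssh` are assumptions for illustration, not the change that was made to volunteer_pipeline.sh.

```shell
#!/bin/sh
# Pick a directory for the SSH key pair. Use the candidate passed as $1 if it
# can be created and is writable, otherwise create a fresh temporary directory.
pick_key_dir() {
  candidate=$1
  if mkdir -p "$candidate" 2>/dev/null && [ -w "$candidate" ]; then
    echo "$candidate"
  else
    mktemp -d
  fi
}

# Prefer the invoking user's home over /root, which is read-only in the image.
KEY_DIR=$(pick_key_dir "${HOME:-/tmp}/.ssh")
echo "keys would be written to $KEY_DIR"
```

The key files and the later `chmod` would then use `$KEY_DIR` in place of the hard-coded /root/.ssh.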

markziemann commented 6 years ago

Got this to work by binding the tmp and home directories like this:

singularity run -B /scratch/mziemann/:/tmp -H /scratch/mziemann/:/home/mziemann/ mziemann_tallyup-2017-12-03-95a2303d5deb.img
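
For use on other systems, the same invocation can be generated from a few variables so the scratch path is set in one place. This is only a sketch: `build_run_cmd` and the SCRATCH, RUN_USER and IMAGE variables are placeholders, and the function prints the command instead of running it so it can be inspected first.

```shell
#!/bin/sh
# Placeholders: adjust to the local scratch area, user name and image file.
SCRATCH=${SCRATCH:-/scratch/mziemann}
RUN_USER=${RUN_USER:-mziemann}
IMAGE=${IMAGE:-mziemann_tallyup-2017-12-03-95a2303d5deb.img}

# build_run_cmd SCRATCH USER IMAGE
# Print a singularity command that binds SCRATCH over /tmp (-B) and uses it
# as the home directory inside the container (-H).
build_run_cmd() {
  printf 'singularity run -B %s/:/tmp -H %s/:/home/%s/ %s\n' "$1" "$1" "$2" "$3"
}

build_run_cmd "$SCRATCH" "$RUN_USER" "$IMAGE"
```

Paths containing spaces would need quoting before the printed command is executed.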