markziemann / dee2

Digital Expression Explorer 2 (DEE2): a repository of uniformly processed RNA-seq data
http://dee2.io
GNU General Public License v3.0

Enable users to request missing projects or samples #66

Open markziemann opened 4 years ago

markziemann commented 3 years ago

First step: find the difference between the complete metadata and the metadata loaded on the webserver.

/mnt/md0/dee2/sradb$ comm -23 <(cut -f1 ecoli_metadata.complete.tsv.cut | sort) <(cut -f1 ecoli_metadata.tsv.cut | sort) | grep -wFf - ecoli_metadata.complete.tsv.cut | head

Adding cut -f5 at the end of the pipeline would give a list of the SRP (project) accessions.
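The two steps above could be wrapped into one helper. This is a minimal sketch, assuming (as the commands imply) that both `.tsv.cut` files are tab-separated with the run accession in column 1 and the SRP accession in column 5; the function name `missing_projects` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list SRA projects that have runs in the complete metadata but not
# in the metadata loaded on the webserver.
# Assumption: tab-separated files, run accession in column 1, SRP in column 5.
# missing_projects is a hypothetical helper name.
missing_projects() {
  local complete=$1 loaded=$2
  # comm -23 keeps lines found only in the first sorted input: runs not yet loaded
  comm -23 <(cut -f1 "$complete" | sort -u) <(cut -f1 "$loaded" | sort -u) \
    | grep -wFf - "$complete" \
    | cut -f5 \
    | sort -u   # one line per project, however many of its runs are missing
}
```

For example, `missing_projects ecoli_metadata.complete.tsv.cut ecoli_metadata.tsv.cut` would print each project with at least one unloaded run exactly once.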

We also need to collect the user's email address so we can alert them when their jobs are complete.

markziemann commented 3 years ago

Working on the CGI script newsearch.sh.
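A minimal sketch of how a bash CGI script such as newsearch.sh might read and validate a request, assuming GET parameters named `accession` and `email` (both names are hypothetical, as are all function names). The accession check accepts the SRP, ERP and DRP prefixes used for SRA project accessions; the email check is deliberately loose.

```shell
#!/usr/bin/env bash
# Sketch of request parsing for a CGI script like newsearch.sh.
# Parameter names "accession" and "email" are hypothetical.

# Decode form-urlencoded text: '+' becomes a space, %XX becomes that byte
urldecode() {
  local s=${1//+/ }
  printf '%b' "${s//%/\\x}"
}

# Print the decoded value of query parameter $1; return 1 if it is absent
get_param() {
  local name=$1 pair
  local -a pairs
  IFS='&' read -r -a pairs <<< "${QUERY_STRING:-}"
  for pair in "${pairs[@]}"; do
    if [[ ${pair%%=*} == "$name" ]]; then
      urldecode "${pair#*=}"
      return 0
    fi
  done
  return 1
}

# SRA project accessions: SRP, ERP or DRP followed by digits
valid_accession() { local re='^[SED]RP[0-9]+$'; [[ $1 =~ $re ]]; }

# Loose check: text@text.text with no spaces and a single @
valid_email() { local re='^[^@[:space:]]+@[^@[:space:]]+\.[^@[:space:]]+$'; [[ $1 =~ $re ]]; }

main() {
  local acc email
  acc=$(get_param accession) || acc=
  email=$(get_param email) || email=
  printf 'Content-Type: text/plain\r\n\r\n'
  if valid_accession "$acc" && valid_email "$email"; then
    printf 'Request for %s received; we will email %s when it is ready.\n' "$acc" "$email"
  else
    printf 'Please supply a valid SRA project accession and email address.\n'
  fi
}

# Only respond when run by a web server (CGI sets GATEWAY_INTERFACE)
if [[ -n ${GATEWAY_INTERFACE:-} ]]; then
  main
fi
```

Validating both values before they reach any file path, grep pattern or mail command keeps user input out of the shell's word splitting and globbing.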

markziemann commented 3 years ago

This is the email sent when a request is received:

Hello DEE2 user,

SRA project SRP_ACCESSION has been added to the express queue for immediate processing. For most SRA projects this will be completed within 24 hours, although larger datasets will take longer. We will let you know by email when the requested data is available.

Best wishes,

The DEE2 team

Send it like this:

mail -s "mysubject" -r "mdz@dee2.io" "mark.ziemann@gmail.com" < mail1_template.txt

It cannot be sent from behind a firewall, so this only works from the webserver.
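Since the template contains the placeholder SRP_ACCESSION, one way to personalise it is to substitute the accession with sed and pipe the result into the `mail` command above. This is a sketch only: `render_request_email` and `send_request_email` are hypothetical helpers, and the subject line is an illustrative placeholder.

```shell
#!/usr/bin/env bash
# Sketch: fill the SRP_ACCESSION placeholder in the template and send it.
# render_request_email / send_request_email and the subject are hypothetical.

# Print the template with the placeholder replaced by a validated accession
render_request_email() {
  local template=$1 srp=$2 re='^[SED]RP[0-9]+$'
  # Validate first so the accession cannot inject sed syntax into the script
  [[ $srp =~ $re ]] || { printf 'invalid accession: %s\n' "$srp" >&2; return 1; }
  sed "s/SRP_ACCESSION/$srp/g" "$template"
}

# Render and send; this has to run on the webserver (see the firewall note)
send_request_email() {
  local template=$1 srp=$2 to=$3 body
  body=$(render_request_email "$template" "$srp") || return 1
  printf '%s\n' "$body" | mail -s "DEE2 request: $srp" -r "mdz@dee2.io" "$to"
}
```

Usage would look like `send_request_email mail1_template.txt SRP218270 "$email"`, with `$email` taken from the validated request.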

markziemann commented 3 years ago

Errors need to be handled more gracefully:

request.txt 100% 216 4.7KB/s 00:00
athaliana SRP218270
grep: SRP218293: No such file or directory
grep: SRP218494: No such file or directory
grep: SRP252639: No such file or directory
grep: SRP268778: No such file or directory
grep: SRP218682: No such file or directory
grep: SRP223804: No such file or directory
grep: ../sradb/athaliana: No such file or directory
grep: athaliana: No such file or directory
grep: athaliana: No such file or directory
grep: athaliana: No such file or directory
grep: celegans: No such file or directory
grep: ecoli: No such file or directory
grep: ecoli.csv: No such file or directory

Run 1 of 1000
wget: missing URL
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
Starting pipeline with species athaliana and accession ERR1665206
ERR1665206
Annotated species name from NCBI SRA does not match user input! Quitting.
rm: cannot remove 'fastq': No such file or directory
rm: cannot remove '.sra': No such file or directory
rm: cannot remove '*tsv': No such file or directory
"docker cp" requires exactly 2 arguments.
See 'docker cp --help'.

Usage: docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH

Copy files/folders between a container and the local filesystem
grep: SRP218293: No such file or directory
grep: SRP218494: No such file or directory
grep: SRP252639: No such file or directory
grep: SRP268778: No such file or directory
grep: SRP218682: No such file or directory
grep: SRP223804: No such file or directory
grep: ../sradb/athaliana: No such file or directory
grep: athaliana: No such file or directory
grep: athaliana: No such file or directory
grep: athaliana: No such file or directory
grep: celegans: No such file or directory
grep: ecoli: No such file or directory
grep: ecoli.csv: No such file or directory
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
./newrequest.sh: line 48: $SRP/genecounts.tsv: ambiguous redirect
./newrequest.sh: line 49: $SRP/genecounts.tsv: ambiguous redirect
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
./newrequest.sh: line 58: $SRP/txcounts.tsv: ambiguous redirect
cut: option requires an argument -- 'f'
Try 'cut --help' for more information.
./newrequest.sh: line 59: $SRP/txcounts.tsv: ambiguous redirect
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
./newrequest.sh: line 67: $SRP/qc.tsv: ambiguous redirect
./newrequest.sh: line 68: $SRP/qc.tsv: ambiguous redirect
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
cp: -r not specified; omitting directory 'SRP218270'
cp: -r not specified; omitting directory 'SRP218293'
cp: -r not specified; omitting directory 'SRP218494'
cp: -r not specified; omitting directory 'SRP252639'
cp: -r not specified; omitting directory 'SRP268778'
cp: -r not specified; omitting directory 'SRP218682'
zip warning: name not matched: SRP223804.zip
adding: SRP218293/ (stored 0%)
adding: SRP218494/ (stored 0%)
adding: SRP252639/ (stored 0%)
adding: SRP268778/ (stored 0%)
adding: SRP218682/ (stored 0%)
adding: SRP218270/ (stored 0%)
adding: SRP223804/ (stored 0%)
adding: SRP223804/log/ (stored 0%)
SRP218270: not a regular file
SRP218293: not a regular file
SRP218494: not a regular file
SRP252639: not a regular file
SRP268778: not a regular file
SRP218682: not a regular file
SRP223804.zip: No such file or directory
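One plausible reading of this log is that `$SRP` held several accessions at once: grep took the first as its pattern and the rest as file names, and the unquoted `> $SRP/genecounts.tsv` redirections expanded to more than one word, which bash reports as "ambiguous redirect". A sketch of a more defensive structure, under that assumption, handles one accession per loop iteration, quotes every expansion and reports failures without aborting the other accessions. `collect_results` and `log_err` are hypothetical names, and the directory layout is assumed.

```shell
#!/usr/bin/env bash
# Sketch: process each requested accession separately so one bad accession
# cannot derail the rest. collect_results and log_err are hypothetical names.

log_err() { printf 'ERROR: %s\n' "$*" >&2; }

# collect_results SRC_DIR DEST_DIR ACCESSION...
# Copies each accession's results directory from SRC_DIR into DEST_DIR,
# prints the accessions copied, and returns 1 if any accession failed.
collect_results() {
  local src=$1 dest=$2 status=0 srp re='^[SED]RP[0-9]+$'
  shift 2
  mkdir -p -- "$dest" || return 1
  for srp in "$@"; do              # one accession per iteration, never a list
    if [[ ! $srp =~ $re ]]; then
      log_err "invalid accession: '$srp'"
      status=1
      continue
    fi
    if [[ ! -d $src/$srp ]]; then
      log_err "no results directory for $srp"
      status=1
      continue
    fi
    # -r is needed to copy a directory (cf. "cp: -r not specified")
    if ! cp -r -- "$src/$srp" "$dest/"; then
      log_err "copy failed for $srp"
      status=1
      continue
    fi
    printf '%s\n' "$srp"
  done
  return "$status"
}
```

If the accessions arrive as one whitespace-separated string, they can be split into an array first, e.g. `read -r -a accs <<< "$SRP"` and then `collect_results "$src" "$dest" "${accs[@]}"`, instead of passing `$SRP` unquoted.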

markziemann commented 3 years ago

So the process now works from request through to the email notification, but there are some problems with the data provided.

For SRP216580, SRA lists 17 runs, but the QC data contain only 3 and the genecounts and txcounts files only 4, so there is a problem with retrieving the results from the Docker image.
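A check like this could be automated before the email goes out, by comparing the expected runs against each result file. This sketch assumes a plain list of expected run accessions, one per line (for example extracted from the SRA metadata), and assumes the run accession is the first tab-separated column of qc.tsv, genecounts.tsv and txcounts.tsv; that column layout is an assumption. `missing_runs` and `check_project` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: report how many expected runs are present in each result file.
# Assumptions: EXPECTED_RUNS has one run accession per line; result files
# carry the run accession in their first tab-separated column.

# missing_runs EXPECTED_RUNS RESULT_FILE: print expected runs absent from the file
missing_runs() {
  comm -23 <(sort -u "$1") <(cut -f1 "$2" | sort -u)
}

# check_project EXPECTED_RUNS DIR: summarise each result file in DIR
check_project() {
  local expected=$1 dir=$2 f n_exp n_missing
  n_exp=$(( $(sort -u "$expected" | wc -l) ))
  for f in qc.tsv genecounts.tsv txcounts.tsv; do
    if [[ ! -f $dir/$f ]]; then
      printf '%s: missing file\n' "$f"
      continue
    fi
    n_missing=$(( $(missing_runs "$expected" "$dir/$f" | wc -l) ))
    printf '%s: %d of %d runs present\n' "$f" "$((n_exp - n_missing))" "$n_exp"
  done
}
```

Because `comm -23` only reports lines unique to the expected list, a header row in a result file does not affect the count; `missing_runs` can also be called directly to list exactly which runs were not retrieved.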