hyunhwan-jeong / SalmonTE

SalmonTE is an ultra-fast and scalable quantification pipeline of transposable element (TE) abundances
GNU General Public License v3.0

SalmonTE quant outputs empty EXPR.csv file using AWS Batch and docker image #61

Closed Xiaofei-git closed 2 years ago

Xiaofei-git commented 3 years ago

Dear @hyunhwan-jeong ,

I ran SalmonTE.py quant on AWS Batch using the Docker image wwliao/salmonte:latest and got an empty EXPR.csv, the same issue as in https://github.com/hyunhwan-jeong/SalmonTE/issues/18 . I also tried changing the name, but that did not resolve the problem.

Could you please let me know what other information you need to fix the issue? Do you think it is related to the SalmonTE or snakemake version?

The issue only happens with paired-end data. The job status is SUCCEEDED and no error is reported.

When both CTRL_R1.fastq and CTRL_R2.fastq are present, EXPR.csv is empty:

$ rclone ls s3:s3bucketPath/example_fq_PE
   648892 CTRL_R1.fastq
   648892 CTRL_R2.fastq
$ rclone ls s3:s3bucketPath/salmonTE_quantOut/results_Folder
        3 EXPR.csv
    22756 clades.csv
        0 log/2021-09-24T0833.example_fq_PE.out
      478 log/2021-09-24T083328.024149.snakemake.log
      183 log/2021-09-24T083328.227077.snakemake.log
       20 phenotype.csv

When only CTRL_1_R1.fastq is present, it works:

$ rclone ls s3:s3bucketPath/example_fq_SR
   648892 CTRL_1_R1.fastq

$ rclone ls s3:s3bucketPath/salmonTE_quantOut/results_Folder_SR_OK
     7681 EXPR.csv
    22756 clades.csv
        0 log/2021-09-24T0731.example_fq_SR.out
      737 log/2021-09-24T073106.579400.snakemake.log
      183 log/2021-09-24T073107.372563.snakemake.log
     2771 nxf/aux_info/ambig_info.tsv
       89 nxf/aux_info/expected_bias.gz
      307 nxf/aux_info/fld.gz
      855 nxf/aux_info/meta_info.json
       54 nxf/aux_info/observed_bias.gz
       54 nxf/aux_info/observed_bias_3p.gz
      239 nxf/cmd_info.json
    10158 nxf/libParams/flenDist.txt
      468 nxf/lib_format_counts.json
     1758 nxf/logs/salmon_quant.log
    13954 nxf/quant.sf
       26 phenotype.csv  
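
Because the Batch job reports SUCCEEDED even when EXPR.csv is effectively empty (3 bytes in the paired-end run versus 7681 bytes in the single-end run), a small post-run check could make the failure visible. The following is only a sketch and not part of SalmonTE; the function name `expr_has_data` is hypothetical, and it assumes that a valid EXPR.csv contains a header row plus at least one data row.

```python
import csv
from pathlib import Path


def expr_has_data(path):
    """Return True if the CSV at `path` has a header row and at least one data row.

    Assumption (not taken from SalmonTE itself): a usable EXPR.csv always has
    a header line followed by one or more non-empty rows.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open(newline="") as handle:
        # Ignore rows whose cells are all empty, e.g. a lone '""' line.
        rows = [row for row in csv.reader(handle)
                if any(cell.strip() for cell in row)]
    return len(rows) >= 2
```

Calling `expr_has_data("results_paired/EXPR.csv")` at the end of the job and exiting non-zero when it returns False would let the scheduler mark the run as failed instead of SUCCEEDED.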

hyunhwan-jeong commented 3 years ago

Hi @Xiaofei-git, can you share a log?

Xiaofei-git commented 3 years ago

2021-09-22 08:44:36,076 Starting quantification mode
2021-09-22 08:44:36,077 Collecting FASTQ files...
2021-09-22 08:44:36,078 The input dataset is considered as a paired-ends dataset.
2021-09-22 08:44:36,078 Collected 1 FASTQ files.
2021-09-22 08:44:36,078 Quantification has been finished.
2021-09-22 08:44:36,078 Running Salmon using Snakemake
Building DAG of jobs...
2021-09-22 08:44:36,199 Building DAG of jobs...
Using shell: /bin/bash
2021-09-22 08:44:36,215 Using shell: /bin/bash
Provided cores: 1
2021-09-22 08:44:36,215 Provided cores: 1
Rules claiming more threads will be scaled down.
2021-09-22 08:44:36,215 Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       collect_abundance
        2
2021-09-22 08:44:36,216 Job counts:
        count   jobs
        1       all
        1       collect_abundance
        2

2021-09-22 08:44:36,216 
rule collect_abundance:
    output: results_paired/EXPR.csv
    jobid: 1
2021-09-22 08:44:36,216 rule collect_abundance:
    output: results_paired/EXPR.csv
    jobid: 1

2021-09-22 08:44:36,216 
ESC[33mBuilding DAG of jobs...ESC[0m
ESC[33mUsing shell: /bin/bashESC[0m
ESC[33mJob counts:
        count   jobs
        1       collect_abundance
        1ESC[0m
ESC[33mComplete log: /tmp/nxf.hXy35XhRYm/.snakemake/log/2021-09-22T084436.378517.snakemake.logESC[0m
Finished job 1.
2021-09-22 08:44:36,672 Finished job 1.
1 of 2 steps (50%) done
2021-09-22 08:44:36,672 1 of 2 steps (50%) done

2021-09-22 08:44:36,673 
localrule all:
    input: results_paired/EXPR.csv
    jobid: 0
2021-09-22 08:44:36,673 localrule all:
    input: results_paired/EXPR.csv
    jobid: 0

2021-09-22 08:44:36,673 
Finished job 0.
2021-09-22 08:44:36,674 Finished job 0.
2 of 2 steps (100%) done
2021-09-22 08:44:36,674 2 of 2 steps (100%) done
Complete log: /tmp/nxf.hXy35XhRYm/.snakemake/log/2021-09-22T084436.177434.snakemake.log
2021-09-22 08:44:36,674 Complete log: /tmp/nxf.hXy35XhRYm/.snakemake/log/2021-09-22T084436.177434.snakemake.log

Xiaofei-git commented 3 years ago

$ more 2021-09-22T084436.177434.snakemake.log

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       collect_abundance
        2

rule collect_abundance:
    output: results_paired/EXPR.csv
    jobid: 1

Finished job 1.
1 of 2 steps (50%) done

localrule all:
    input: results_paired/EXPR.csv
    jobid: 0

Finished job 0.
2 of 2 steps (100%) done
Complete log: /tmp/nxf.hXy35XhRYm/.snakemake/log/2021-09-22T084436.177434.snakemake.log

Xiaofei-git commented 3 years ago

$ more 2021-09-22T084436.378517.snakemake.log
Building DAG of jobs...
Using shell: /bin/bash
Job counts:
        count   jobs
        1       collect_abundance
        1
Complete log: /tmp/nxf.hXy35XhRYm/.snakemake/log/2021-09-22T084436.378517.snakemake.log

hyunhwan-jeong commented 3 years ago

@Xiaofei-git, sorry for the late response. It seems that quantification has finished, but it did not create the EXPR.csv file. Can you let me know which command line you used and where the files are stored (locally or in S3)?

Thank you,

Hyun-Hwan Jeong

Xiaofei-git commented 3 years ago

Hi @hyunhwan-jeong, we used Nextflow to submit the AWS Batch jobs. Here is the part of the code for the paired-end process. The files are stored/published in S3.

Thanks a lot for your help!

process TEdiscoveryPaired {
    tag "${samplePaired}"
    publishDir path: params.outputDir, saveAs: { dirname -> "$samplePaired"+'_paired_results' }, mode: 'copy', overwrite: true

    time '1h'
    memory '16 GB'
    disk '100 GB'
    cpus params.cpus
    echo false
    errorStrategy 'ignore'
    stageInMode 'symlink'
    stageOutMode 'rsync'

    input:
    val(reference) from params.reference
    tuple val(samplePaired), path(read1), path(read2) from paired_fastq_ch
    val(threads) from params.salmonTE.threads

    output:
    path('results_paired') optional true into paired_output_ch

    when:
    read1.toString().startsWith('OK_') && read2.toString().startsWith('OK_')

    script:
    """
    echo "Running SalmonTE: Paired-end reads"
    output_file="\$(date +"%Y-%m-%dT%H%M")."$samplePaired".paired.out"
    mkdir FASTQ
    mv $read1 FASTQ/
    mv $read2 FASTQ/
    SalmonTE.py quant --reference=$reference --num_threads=$threads --outpath=results_paired FASTQ/ 2> "\$output_file"
    mv .snakemake/log/ results_paired/
    mv "\$output_file" results_paired/log
    """
}

Xiaofei-git commented 3 years ago

@hyunhwan-jeong Do you know why there is no "run_salmon_fq" job? The log below is what I expected, but there is no "run_salmon_fq" in my log file above.

Job counts:
    count   jobs
    1   all
    1   collect_abundance
    2   run_salmon_fq
    4
2021-09-22 08:28:32,737 Job counts:
    count   jobs
    1   all
    1   collect_abundance
    2   run_salmon_fq
    4
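
The difference between the two logs can also be checked automatically. The sketch below parses the "Job counts:" table from a Snakemake log like the ones pasted in this thread and reports whether any `run_salmon_fq` job was scheduled. The function names `parse_job_counts` and `quantification_scheduled` are hypothetical, and the parser only assumes the table layout visible above (a `count jobs` header, one `count name` pair per line, then a bare total).

```python
def parse_job_counts(log_text):
    """Return {rule_name: count} from the first 'Job counts:' table in a log.

    Assumes the layout shown in this thread; other Snakemake versions may
    print the table differently.
    """
    counts = {}
    in_table = False
    for line in log_text.splitlines():
        if not in_table:
            # Also matches lines that carry a timestamp prefix.
            in_table = line.rstrip().endswith("Job counts:")
            continue
        fields = line.split()
        if fields == ["count", "jobs"]:
            continue  # table header
        if len(fields) == 2 and fields[0].isdigit():
            counts[fields[1]] = int(fields[0])
        else:
            break  # the bare total line (or anything else) ends the table
    return counts


def quantification_scheduled(log_text, rule="run_salmon_fq"):
    """True if at least one job of `rule` appears in the job-count table."""
    return parse_job_counts(log_text).get(rule, 0) > 0
```

Applied to the failing paired-end log, `quantification_scheduled` returns False because only `all` and `collect_abundance` are listed, which is consistent with "Collected 1 FASTQ files" and no Salmon run.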

Xiaofei-git commented 3 years ago

Here is the ".command.sh" for one of the samples from the AWS Batch:

#!/bin/bash -ue
echo "Running SalmonTE: Paired-end reads"
output_file="$(date +"%Y-%m-%dT%H%M")."tcga-bam-4".paired.out"
mkdir FASTQ
mv OK_tcga-bam-4_R1.fastq.gz FASTQ/
mv OK_tcga-bam-4_R2.fastq.gz FASTQ/
SalmonTE.py quant --reference=hs --num_threads=2 --exprtype=count --outpath=results_paired FASTQ/ 2> "$output_file"
mv .snakemake/log/ results_paired/
mv "$output_file" results_paired/log

hyunhwan-jeong commented 3 years ago

@Xiaofei-git, sorry for the late response again. I had a personal matter to attend to, so I was not able to respond. Do you still have the problem? If so, would you mind creating a user for me under your AWS account?

Thank you,

Hyun-Hwan Jeong

Xiaofei-git commented 3 years ago

Thank you so much for your reply!

I think we have fixed this issue. We built a new Docker image with the updated SalmonTE version 0.4 and an upgraded snakemake, and also changed `is` to `==` on line 58 of snakemake/Snakemake.paired.

Thanks a lot!

Xiaofei
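
For readers wondering why swapping `is` for `==` can matter: the exact content of line 58 is not shown in this thread, so the following is only a generic illustration, with hypothetical variable names, of how the two operators differ in Python. `is` tests whether two names refer to the same object, while `==` tests whether the values are equal. In CPython, a string built at runtime is usually a different object from an equal literal, so an `is` comparison can silently evaluate to False. Since Python 3.8, writing `is` against a literal also triggers a SyntaxWarning.

```python
def same_object(a, b):
    """Identity test: True only if both names refer to the same object."""
    return a is b


def same_value(a, b):
    """Equality test: True if the two values compare equal."""
    return a == b


# Hypothetical example: a label assembled at runtime, e.g. from a file name.
runtime_label = "".join(["pai", "red"])
expected_label = "paired"

print(same_value(runtime_label, expected_label))   # equal contents
print(same_object(runtime_label, expected_label))  # distinct objects in CPython
```

Whether identity happens to hold depends on interpreter details such as string interning, which would explain a comparison that behaves differently across Python or snakemake versions; `==` avoids that dependence.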