hyunhwan-jeong / SalmonTE

SalmonTE is an ultra-Fast and Scalable Quantification Pipeline of Transposable Element (TE) Abundances
GNU General Public License v3.0

Quant mode seg fault due to erroneous functional call following salmon quant #75

Open michaelSkaro opened 1 year ago

michaelSkaro commented 1 year ago

Quant mode ends in a seg fault. The cause is a call to a function that is designed for your test mode: EXPR.csv is not made in quant mode. The step that aggregates the quant.sf files is not needed in quant mode, but it is still requested under your rule all. To fix this error you can split the current DAG into two DAGs: a quant DAG and a test-mode DAG. The quant DAG does not need to make the

join(OUTPUT_DIR,"EXPR.csv"), join(OUTPUT_DIR,"MAPPING_INFO.csv")

files. To avoid the non-zero

The run_salmon function does not need to open EXPR.csv, because that file will not be made in this step:

def run_salmon(param):
    import snakemake
    snakefile = os.path.join(os.path.dirname(__file__), "snakemake/Snakefile.paired" if param["paired"] else "snakemake/Snakefile.single")

    snakemake.snakemake(
        snakefile=snakefile,
        config={
            "input_path": param["inpath"],
            "output_path": param["--outpath"],
            "index": param["--reference"],
            "salmon": os.path.join(os.path.dirname(__file__),"salmon/{}/bin/salmon"),
            "num_threads" : param["--num_threads"],
            "exprtype": param["--exprtype"],
        },
        quiet=True,
        lock=False
    )

    with open(os.path.join(param["--outpath"], "EXPR.csv" ), "r") as inp:
        sample_ids = inp.readline().strip().split(',')[1:]
    with open(os.path.join(param["--outpath"], "condition.csv" ), "w") as oup:
        oup.write("SampleID,condition\n")
        oup.write(
            "\n".join([s+","+"NA" for s in sample_ids]) + "\n"
        )

Remove the EXPR.csv read from your run_salmon function, and remove the request for EXPR.csv to be made in your paired-end mode. There are two possible solutions: wrap the read in a try/except that looks for the mode as a string in the arguments, OR write a new function that collates the quant.sf files into an EXPR.csv only in test mode, as the first step. Push that into the R script that calls the DEseq2fromExpresisonMatrix input.

I personally think you should try making the latter function, because it is a bit more deliberate and far cleaner to read than relying on a try/except.
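A minimal sketch of that second option, not a tested patch against SalmonTE. It assumes each sample's salmon output sits at <outpath>/<sample>/quant.sf and that EXPR.csv starts with a "TE" label column; both are assumptions. The helper names collate_quant_sf and write_condition_template are hypothetical. The Name and NumReads columns are the standard quant.sf columns.

```python
import csv
from pathlib import Path


def collate_quant_sf(outpath, column="NumReads"):
    """Merge <outpath>/<sample>/quant.sf files into <outpath>/EXPR.csv.

    Hypothetical helper: the directory layout and the "TE" header label
    are assumptions, not confirmed SalmonTE behaviour.
    """
    outpath = Path(outpath)
    quant_files = sorted(outpath.glob("*/quant.sf"))
    if not quant_files:
        raise FileNotFoundError(f"no quant.sf files found under {outpath}")

    sample_ids = [q.parent.name for q in quant_files]
    table = {}  # TE name -> {sample_id: value}
    for sample_id, quant_file in zip(sample_ids, quant_files):
        with open(quant_file, newline="") as inp:
            # quant.sf is tab-separated with a header row
            for row in csv.DictReader(inp, delimiter="\t"):
                table.setdefault(row["Name"], {})[sample_id] = row[column]

    expr_path = outpath / "EXPR.csv"
    with open(expr_path, "w", newline="") as oup:
        writer = csv.writer(oup, lineterminator="\n")
        writer.writerow(["TE"] + sample_ids)
        for name in sorted(table):
            # a TE missing from one sample is written as 0
            writer.writerow([name] + [table[name].get(s, "0") for s in sample_ids])
    return expr_path


def write_condition_template(outpath):
    """Write condition.csv from the EXPR.csv header (same logic as run_salmon)."""
    outpath = Path(outpath)
    with open(outpath / "EXPR.csv") as inp:
        sample_ids = inp.readline().strip().split(",")[1:]
    with open(outpath / "condition.csv", "w") as oup:
        oup.write("SampleID,condition\n")
        oup.write("\n".join(s + ",NA" for s in sample_ids) + "\n")
```

If both helpers are called only from the test-mode entry point, quant mode never touches EXPR.csv and run_salmon can return as soon as snakemake finishes.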

hyunhwan-jeong commented 1 year ago

Thanks for the comment. Can you confirm which version of snakemake gives you the error?

michaelSkaro commented 1 year ago

Yes.

Here is the error.

2023-06-07 10:24:11,792 Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-06-07T101256.771368.snakemake.log
2023-06-07 10:24:11,793 Complete log: .snakemake/log/2023-06-07T101256.771368.snakemake.log
Traceback (most recent call last):
  File "/home/biodocker/SalmonTE/SalmonTE.py", line 293, in <module>
    run(args)
  File "/home/biodocker/SalmonTE/SalmonTE.py", line 244, in run
    run_salmon(param)
  File "/home/biodocker/SalmonTE/SalmonTE.py", line 157, in run_salmon
    with open(os.path.join(param["--outpath"], "EXPR.csv" ), "r") as inp:
FileNotFoundError: [Errno 2] No such file or directory: '01_salmonTEop/sample_ID_1/EXPR.csv'
thread panicked while processing panic. aborting.
/cm/local/slurm/var/spool/job56620284/slurm_script: line 45: 36037 Aborted singularity exec -B /hpc:/hpc /hdd_scratch3/pv577515/images/salmonTE-0.4.sif SalmonTE.py quant --reference=hs --outpath $output_file --exprtype=count $R1 $R2

Here is the version info: snakemake 7.26.0
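For anyone reproducing this inside the container, the installed version can also be read from Python without going through the snakemake CLI. A minimal sketch using only the standard library (Python 3.8+); installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("snakemake", installed_version("snakemake"))
```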

Built in Docker, converted to Singularity:

RUN mkdir /home/biodocker/bin

RUN conda config --add channels r
RUN conda config --add channels conda-forge
RUN conda config --add channels bioconda

RUN conda upgrade conda

VOLUME ["/data", "/config"]

RUN conda install snakemake docopt pandas r
RUN conda install r-tidyverse r-scales r-writexls r-cowplot
RUN conda install bioconductor-deseq2 bioconductor-tximport