snakemake-workflows / rna-seq-kallisto-sleuth

A Snakemake workflow for differential expression analysis of RNA-seq data with Kallisto and Sleuth.

Libcrypto.so.1.1 not available? #85

Closed: him1532 closed this issue 9 months ago

him1532 commented 11 months ago

I am seeing an error while running the kallisto-sleuth pipeline.

    Error in group sleuth-init:
    jobs:
    rule sleuth_init:
        jobid: 2
        output: results/sleuth/model_X.rds, results/sleuth/model_X.designmatrix.rds
        log: logs/sleuth/model_X.init.log (check log file(s) for error details)
    rule compose_sample_sheet:
        jobid: 9
        output: results/sleuth/model_X.samples.tsv
        log: logs/model_X.compose-sample-sheet.log (check log file(s) for error details)

    Error executing group job sleuth-init on cluster (jobid: 9159f0df-40ec-5076-ac29-7ac8600dd138, external: Your job 1509748 ("snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh") has been submitted, jobscript: /home/RNA/.snakemake/tmp.b2r32iv8/snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh). For error details see the cluster log and the log files of the involved rule(s).
    Removing output files of failed job compose_sample_sheet since they might be corrupted: results/sleuth/model_X.samples.tsv
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: .snakemake/log/2023-12-18T142901.516512.snakemake.log

When I looked at the log files, I saw:

    $ cat logs/sleuth/model_X.init.log
    Error: package or namespace load failed for ‘sleuth’ in dyn.load(file, DLLpath = DLLpath, ...):
     unable to load shared object '/home/RNA/.snakemake/conda/e9ae8cf1b054539d07bb34138d5bde75/lib/R/library/rhdf5/libs/rhdf5.so':
      libcrypto.so.1.1: cannot open shared object file: No such file or directory
    Execution halted

Do I need to install some other library? Thank you.
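A quick way to confirm which shared library fails to resolve is to run ldd on the object named in the error. This is a minimal sketch; the conda environment hash is taken from the log above and will differ on other machines:

```sh
# List the dynamic dependencies of the rhdf5 shared object from the error message;
# any line reported as "not found" is a missing library (here: libcrypto.so.1.1).
ldd /home/RNA/.snakemake/conda/e9ae8cf1b054539d07bb34138d5bde75/lib/R/library/rhdf5/libs/rhdf5.so | grep "not found"
```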


The following is the output of .snakemake/log/2023-12-18T142901.516512.snakemake.log:

    Workflow defines that rule get_transcriptome is eligible for caching between workflows (use the --cache argument to enable this).
    Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
    Workflow defines that rule get_transcript_info is eligible for caching between workflows (use the --cache argument to enable this).
    Workflow defines that rule convert_pfam is eligible for caching between workflows (use the --cache argument to enable this).
    Workflow defines that rule calculate_cpat_hexamers is eligible for caching between workflows (use the --cache argument to enable this).
    Workflow defines that rule calculate_cpat_logit_model is eligible for caching between workflows (use the --cache argument to enable this).
    Workflow defines that rule get_spia_db is eligible for caching between workflows (use the --cache argument to enable this).
    Building DAG of jobs...
    Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
    Using shell: /usr/bin/bash
    Provided cluster nodes: 1
    Singularity containers: ignored
    Job stats:
    job                               count
    ------------------------------  -------
    all                                   1
    compose_sample_sheet                  2
    cutadapt_pe                           2
    diffexp_datavzrd                      1
    get_transcript_info                   1
    get_transcriptome                     1
    ihw_fdr_control                       3
    kallisto_index                        1
    kallisto_quant                        2
    logcount_matrix                       1
    plot_bootstrap                        1
    plot_diffexp_heatmap                  1
    plot_diffexp_pval_hist                3
    plot_fragment_length_dist             2
    plot_group_density                    1
    plot_pca                              1
    render_datavzrd_config_diffexp        1
    sleuth_diffexp                        1
    sleuth_init                           2
    vega_volcano_plot                     1
    total                                29

    Select jobs to execute...

    [Mon Dec 18 14:29:11 2023]
    rule cutadapt_pe:
        input: /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_GSK_1.fq.gz, /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_GSK_2.fq.gz
        output: results/trimmed/EOL-2-1.1.fastq.gz, results/trimmed/EOL-2-1.2.fastq.gz, results/trimmed/EOL-2-1.qc.txt
        log: results/logs/cutadapt/EOL-2-1.log
        jobid: 8
        reason: Missing output files: results/trimmed/EOL-2-1.2.fastq.gz, results/trimmed/EOL-2-1.1.fastq.gz
        wildcards: sample=EOL-2, unit=1
        threads: 8
        resources: mem_mb=16486, mem_mib=15723, disk_mb=16486, disk_mib=15723, tmpdir=

    Submitted job 8 with external jobid 'Your job 1509741 ("snakejob.cutadapt_pe.8.sh") has been submitted'.
    [Mon Dec 18 14:33:01 2023]
    Finished job 8.
    1 of 29 steps (3%) done
    Select jobs to execute...

    [Mon Dec 18 14:33:01 2023]
    rule cutadapt_pe:
        input: /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_DMSO_1.fq.gz, /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_DMSO_2.fq.gz
        output: results/trimmed/EOL-1-1.1.fastq.gz, results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.qc.txt
        log: results/logs/cutadapt/EOL-1-1.log
        jobid: 4
        reason: Missing output files: results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.1.fastq.gz
        wildcards: sample=EOL-1, unit=1
        threads: 8
        resources: mem_mb=15400, mem_mib=14687, disk_mb=15400, disk_mib=14687, tmpdir=

    Submitted job 4 with external jobid 'Your job 1509742 ("snakejob.cutadapt_pe.4.sh") has been submitted'.
    [Mon Dec 18 14:39:51 2023]
    Finished job 4.
    2 of 29 steps (7%) done
    Select jobs to execute...

    [Mon Dec 18 14:39:51 2023]
    rule get_transcript_info:
        output: resources/transcript-info.rds
        log: logs/get_transcript_info.log
        jobid: 10
        reason: Missing output files: resources/transcript-info.rds
        resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

    Submitted job 10 with external jobid 'Your job 1509743 ("snakejob.get_transcript_info.10.sh") has been submitted'.
    [Mon Dec 18 14:44:41 2023]
    Finished job 10.
    3 of 29 steps (10%) done
    Select jobs to execute...

    [Mon Dec 18 14:44:41 2023]
    rule get_transcriptome:
        output: resources/transcriptome.cdna.fasta
        log: logs/get-transcriptome/cdna.log
        jobid: 6
        reason: Missing output files: resources/transcriptome.cdna.fasta
        wildcards: type=cdna
        resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

    Submitted job 6 with external jobid 'Your job 1509744 ("snakejob.get_transcriptome.6.sh") has been submitted'.
    [Mon Dec 18 14:47:42 2023]
    Finished job 6.
    4 of 29 steps (14%) done
    Select jobs to execute...

    [Mon Dec 18 14:47:42 2023]
    rule kallisto_index:
        input: resources/transcriptome.cdna.fasta
        output: results/kallisto_cdna/transcripts.cdna.idx
        log: results/logs/kallisto_cdna/index.cdna.log
        jobid: 5
        reason: Missing output files: results/kallisto_cdna/transcripts.cdna.idx; Input files updated by another job: resources/transcriptome.cdna.fasta
        resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

    Submitted job 5 with external jobid 'Your job 1509745 ("snakejob.kallisto_index.5.sh") has been submitted'.
    [Mon Dec 18 14:54:22 2023]
    Finished job 5.
    5 of 29 steps (17%) done
    Select jobs to execute...

    [Mon Dec 18 14:54:22 2023]
    rule kallisto_quant:
        input: results/trimmed/EOL-2-1.1.fastq.gz, results/trimmed/EOL-2-1.2.fastq.gz, results/kallisto_cdna/transcripts.cdna.idx
        output: results/kallisto_cdna/EOL-2-1
        log: results/logs/kallisto_cdna/quant/EOL-2-1.log
        jobid: 7
        reason: Missing output files: results/kallisto_cdna/EOL-2-1; Input files updated by another job: results/kallisto_cdna/transcripts.cdna.idx, results/trimmed/EOL-2-1.2.fastq.gz, results/trimmed/EOL-2-1.1.fastq.gz
        wildcards: sample=EOL-2, unit=1
        threads: 5
        resources: mem_mb=20797, mem_mib=19834, disk_mb=20797, disk_mib=19834, tmpdir=

    Submitted job 7 with external jobid 'Your job 1509746 ("snakejob.kallisto_quant.7.sh") has been submitted'.
    [Mon Dec 18 15:20:13 2023]
    Finished job 7.
    6 of 29 steps (21%) done
    Select jobs to execute...

    [Mon Dec 18 15:20:13 2023]
    rule kallisto_quant:
        input: results/trimmed/EOL-1-1.1.fastq.gz, results/trimmed/EOL-1-1.2.fastq.gz, results/kallisto_cdna/transcripts.cdna.idx
        output: results/kallisto_cdna/EOL-1-1
        log: results/logs/kallisto_cdna/quant/EOL-1-1.log
        jobid: 3
        reason: Missing output files: results/kallisto_cdna/EOL-1-1; Input files updated by another job: results/kallisto_cdna/transcripts.cdna.idx, results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.1.fastq.gz
        wildcards: sample=EOL-1, unit=1
        threads: 5
        resources: mem_mb=19758, mem_mib=18843, disk_mb=19758, disk_mib=18843, tmpdir=

    Submitted job 3 with external jobid 'Your job 1509747 ("snakejob.kallisto_quant.3.sh") has been submitted'.
    [Mon Dec 18 15:37:44 2023]
    Finished job 3.
    7 of 29 steps (24%) done
    Select jobs to execute...

    [Mon Dec 18 15:37:45 2023]
    group job sleuth-init (jobs in lexicogr. order):

    [Mon Dec 18 15:37:45 2023]
    rule compose_sample_sheet:
        input: config/samples.tsv, config/units.tsv, results/kallisto_cdna/EOL-1-1, results/kallisto_cdna/EOL-2-1
        output: results/sleuth/model_X.samples.tsv
        log: logs/model_X.compose-sample-sheet.log
        jobid: 9
        reason: Missing output files: results/sleuth/model_X.samples.tsv; Input files updated by another job: results/kallisto_cdna/EOL-1-1, results/kallisto_cdna/EOL-2-1
        wildcards: model=model_X
        resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

    [Mon Dec 18 15:37:45 2023]
    rule sleuth_init:
        input: results/kallisto_cdna/EOL-1-1, results/kallisto_cdna/EOL-2-1, results/sleuth/model_X.samples.tsv, resources/transcript-info.rds
        output: results/sleuth/model_X.rds, results/sleuth/model_X.designmatrix.rds
        log: logs/sleuth/model_X.init.log
        jobid: 2
        reason: Missing output files: results/sleuth/model_X.rds; Input files updated by another job: resources/transcript-info.rds, results/kallisto_cdna/EOL-1-1, results/sleuth/model_X.samples.tsv, results/kallisto_cdna/EOL-2-1
        wildcards: model=model_X
        threads: 6
        resources: mem_mb=, disk_mb=, tmpdir=

    Submitted group job 9159f0df-40ec-5076-ac29-7ac8600dd138 with external jobid 'Your job 1509748 ("snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh") has been submitted'.
    [Mon Dec 18 15:38:04 2023]
    Error in group sleuth-init:
    jobs:
    rule sleuth_init:
        jobid: 2
        output: results/sleuth/model_X.rds, results/sleuth/model_X.designmatrix.rds
        log: logs/sleuth/model_X.init.log (check log file(s) for error details)
    rule compose_sample_sheet:
        jobid: 9
        output: results/sleuth/model_X.samples.tsv
        log: logs/model_X.compose-sample-sheet.log (check log file(s) for error details)

    Error executing group job sleuth-init on cluster (jobid: 9159f0df-40ec-5076-ac29-7ac8600dd138, external: Your job 1509748 ("snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh") has been submitted, jobscript: /home/RNA/.snakemake/tmp.b2r32iv8/snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh). For error details see the cluster log and the log files of the involved rule(s).
    Removing output files of failed job compose_sample_sheet since they might be corrupted: results/sleuth/model_X.samples.tsv
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: .snakemake/log/2023-12-18T142901.516512.snakemake.log

dlaehnemann commented 11 months ago

We have seen this ourselves; it was down to an undocumented implicit dependency in an upstream package that can mess up the linking of curl. We have fixed this for newer versions of the respective bioconda packages, but the changes pulling this into this workflow are still unmerged (and thus unreleased); they are in pull request #77. As a dirty fix to get past this, you could change the tag= entry in the github() function of the module import statement in your Snakefile to instead point to a branch=, using the branch behind that pull request:

    github(..., branch="fix-canonical-transcript-mapped-read-extraction")

However, this will stop working once we merge the pull request, so you will then have to switch back to the latest release. I'll try to remember to ping you here once this is merged and released.
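Spelled out, the module import would then look roughly like this. This is only a sketch of Snakemake's standard module syntax; the module name, the path= value, and the use rule line are assumptions about the local Snakefile, and the commented tag= line shows the release-pinned form to revert to later:

```python
# Sketch of a Snakefile that deploys the workflow as a module,
# temporarily pinned to the pull-request branch instead of a release tag.
module rna_seq_kallisto_sleuth:
    snakefile:
        github(
            "snakemake-workflows/rna-seq-kallisto-sleuth",
            path="workflow/Snakefile",
            # tag="vX.Y.Z",  # release-pinned form; switch back once the fix is released
            branch="fix-canonical-transcript-mapped-read-extraction",
        )
    config:
        config

# Re-use all rules from the module, as before.
use rule * from rna_seq_kallisto_sleuth
```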

him1532 commented 11 months ago

ok. got it. will try and report back.

him1532 commented 10 months ago

I made the changes to the Snakefile as you suggested.

This time I get the following error:

    Error in group sleuth-init:
    jobs:
    rule compose_sample_sheet:
        jobid: 29
        output: results/sleuth/model_X.samples.tsv
        log: logs/model_X.compose-sample-sheet.log (check log file(s) for error details)
    rule sleuth_init:
        jobid: 2
        output: results/sleuth/model_X.rds, results/sleuth/model_X.designmatrix.rds
        log: logs/sleuth/model_X.init.log (check log file(s) for error details)

    Error executing group job sleuth-init on cluster (jobid: 9159f0df-40ec-5076-ac29-7ac8600dd138, external: Your job 1509776 ("snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh") has been submitted, jobscript: /home/him/RNA/.snakemake/tmp.isl2w3r_/snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh). For error details see the cluster log and the log files of the involved rule(s).
    Removing output files of failed job compose_sample_sheet since they might be corrupted: results/sleuth/model_X.samples.tsv

The logs/sleuth/model_X.init.log has nothing in it, while the output of logs/model_X.compose-sample-sheet.log is below:

    ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ✔ dplyr     1.1.4     ✔ readr     2.1.4
    ✔ forcats   1.0.0     ✔ stringr   1.5.1
    ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
    ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
    ✔ purrr     1.0.2
    ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ✖ dplyr::filter() masks stats::filter()
    ✖ dplyr::lag()    masks stats::lag()
    ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
    Rows: 12 Columns: 3
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: "\t"
    chr (3): sample, condition, path
    ℹ Use spec() to retrieve the full column specification for this data.
    ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
    Error in `all_of()`:
    ! Can't subset columns that don't exist.
    ✖ Column `batch_effect` doesn't exist.
    Backtrace:
         ▆
      1. ├─samples %>% drop_na(c(sample, path, all_of(variables)))
      2. ├─tidyr::drop_na(., c(sample, path, all_of(variables)))
      3. ├─tidyr:::drop_na.data.frame(., c(sample, path, all_of(variables)))
      4. │ └─tidyselect::eval_select(expr(c(!!!dots)), data, allow_rename = FALSE)
      5. │ └─tidyselect:::eval_select_impl(...)
      6. │ ├─tidyselect:::with_subscript_errors(...)
      7. │ │ └─rlang::try_fetch(...)
      8. │ │ └─base::withCallingHandlers(...)
      9. │ └─tidyselect:::vars_select_eval(...)
     10. │ └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
     11. │ └─tidyselect:::eval_c(expr, data_mask, context_mask)
     12. │ └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
     13. │ └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
     14. │ └─tidyselect:::eval_c(expr, data_mask, context_mask)
     15. │ └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
     16. │ └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
     17. │ └─tidyselect:::eval_context(expr, context_mask, call = error_call)
     18. │ ├─tidyselect:::with_chained_errors(...)
     19. │ │ └─rlang::try_fetch(...)
     20. │ │ ├─base::tryCatch(...)
     21. │ │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
     22. │ │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
     23. │ │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
     24. │ │ └─base::withCallingHandlers(...)
     25. │ └─rlang::eval_tidy(as_quosure(expr, env), context_mask)
     26. ├─tidyselect::all_of(variables)
     27. │ └─tidyselect:::as_indices_impl(x, vars = vars, strict = TRUE)
     28. │ └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
     29. │ └─vctrs::vec_as_location(...)
     30. └─vctrs (local) <fn>()
     31. └─vctrs:::stop_subscript_oob(...)
     32. └─vctrs:::stop_subscript(...)
     33. └─rlang::abort(...)
    Execution halted

Since there is a message saying the batch_effect column doesn't exist, the following is the content of my sample.tsv:

    sample     condition  batch_effect
    EOL-1      cond1      batch1
    EOL-2      cond2      batch1
    HL601      cond1      batch1
    HL602      cond2      batch1
    KG-1a1     cond1      batch1
    KG-1a2     cond2      batch1
    MOLT41     cond1      batch1
    MOLT42     cond2      batch1
    OCI-AML31  cond1      batch1
    OCI-AML32  cond2      batch1
    THP-11     cond1      batch1
    THP-12     cond2      batch1

Is there something wrong?

dlaehnemann commented 10 months ago

Nothing obviously sticks out to me.

But are you sure that your sample sheet is called sample.tsv and not samples.tsv (with an s)? If the names don't match, the workflow may not be loading the sample sheet you think it is. Notably, the column specification in your log shows sample, condition, and path, but no batch_effect, so the file that was actually read is not the one you pasted above.

Also, a batch_effect with just one factor level (batch1) for all samples doesn't make any sense, as the model cannot do any correction with that. If you are sure you don't have any batches (that is, if you know all samples were handled in one big batch, at the same time), then simply leave out the batch effect specification. But this is rarely the case...
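To see why, here is a minimal R illustration with hypothetical two-sample data; the error text is what base R reports when a design matrix is built from a single-level factor:

```r
# A covariate with only one level carries no information the model could
# correct for, and building a design matrix from it fails outright.
d <- data.frame(
  condition    = factor(c("cond1", "cond2")),
  batch_effect = factor(c("batch1", "batch1"))  # single level for all samples
)
model.matrix(~condition + batch_effect, data = d)
# Error in `contrasts<-`(...): contrasts can be applied only to factors with 2 or more levels
```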

Also, the column names condition and batch_effect are arbitrary names that you can change; you then just need to update your model specification in the config/config.yaml file accordingly.
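For orientation, a model specification along these lines; this is a sketch only, the formulas, model name, and base_level are illustrative, and the exact keys should be checked against the config/config.yaml template shipped with the workflow:

```yaml
diffexp:
  models:
    model_X:
      # every variable used in these formulas must exist as a column in samples.tsv
      full: ~condition + batch_effect
      reduced: ~batch_effect
      primary_variable: condition
      base_level: cond1
```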

And finally, just look at the input and output files of the rule that is failing and of those that come before it. Maybe you can spot where things go wrong.
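A quick way to run that check on the sample sheet itself is sketched below; this is a hypothetical R snippet, so adjust the file path and the variables vector to your own config:

```r
library(tidyverse)

# Read the sheet the workflow is configured to use
# (note the file name: samples.tsv, not sample.tsv).
samples <- read_tsv("config/samples.tsv")

# Columns the workflow will actually see:
colnames(samples)

# Every variable referenced in the model specification must be among them;
# the backtrace above failed exactly because batch_effect was not.
variables <- c("condition", "batch_effect")
setdiff(variables, colnames(samples))  # anything printed here is missing
```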

dlaehnemann commented 9 months ago

I hope this is resolved for you, so I am closing it for now. If you run into further problems, please feel free to open new issues for those.