sinanugur / cellsnake

Cellsnake tool main repo
https://cellsnake.readthedocs.io/
MIT License

DE step failing at `scrna-find-pairwise-markers.R` #7

Open manogenome opened 1 month ago

manogenome commented 1 month ago

Hi, thank you so much for the workflow. I have recently been running into the issue below at the DE step. Could you kindly have a look when possible?

Analyzing fetal-brain test data

download data

wget "https://zenodo.org/record/7919631/files/fetal-brain-data.zip?download=1"
mv "fetal-brain-data.zip?download=1" fetal-brain-data.zip
unzip fetal-brain-data.zip
ls fetal-brain/data/

S10X_17_028 S10X_17_029

activate cellsnake env

conda activate cellsnake
cellsnake --version
0.2.0.12
cellsnake --generate-template

update metadata

cat metadata.csv

sample,condition
S10X_17_028,A
S10X_17_029,B
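
Since the failure only shows up once `--metadata` is supplied, it may be worth ruling out the metadata file itself first. The sketch below assumes the layout used in this report: a `sample,condition` header, and sample names in the first column that are meant to match the folders under `fetal-brain/data/`. The function name `check_metadata` is hypothetical and not part of cellsnake.

```shell
# Hypothetical helper (not part of cellsnake): verify the header of a metadata
# CSV and check that every listed sample has a matching folder in the data dir.
check_metadata() {
    csv="$1"; datadir="$2"; status=0
    {
        # First line: expected to be exactly "sample,condition".
        IFS= read -r header
        header=$(printf '%s' "$header" | tr -d '\r')
        if [ "$header" != "sample,condition" ]; then
            echo "unexpected header: $header" >&2
            status=1
        fi
        # Remaining lines: column 1 is the sample name, column 2 the condition.
        # The "|| [ -n ... ]" part handles a last line without a trailing newline.
        while IFS=, read -r sample condition || [ -n "$sample" ]; do
            [ -z "$sample" ] && continue
            if [ ! -d "$datadir/$sample" ]; then
                echo "no directory for sample: $sample" >&2
                status=1
            fi
        done
    } < "$csv"
    return "$status"
}
```

Calling `check_metadata metadata.csv fetal-brain/data` returns 0 only when the header and all sample folders line up. It says nothing about what cellsnake itself requires of the file.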

successful

cellsnake advanced fetal-brain/data -j 85 --configfile config.yaml 

cellsnake integrate fetal-brain/data -j 85

cellsnake integrated advanced analyses_integrated/seurat/integrated.rds \
    --resolution auto -j 85

failing

cellsnake integrated advanced analyses_integrated/seurat/integrated.rds \
    --resolution auto -j 85 --metadata metadata.csv
>more 2024-07-24T163222.373032.snakemake.log
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 85
Rules claiming more threads will be scaled down.
Job stats:
job                                  count
---------------------------------  -------
all                                      1
create_deseq_metadata_tables             1
deseq_analysis_from_metadata_file        1
volcano_plots                            1
total                                    4

Select jobs to execute...

[Wed Jul 24 16:32:23 2024]
rule deseq_analysis_from_metadata_file:
    input: analyses_integrated/processed/percent_mt~auto/resolution~auto/integrated.rds, metadata.csv
    output: analyses_integrated/markers/percent_mt~auto/resolution~auto/deseq_integrated-condition.rds
    jobid: 33
    reason: Missing output files: analyses_integrated/markers/percent_mt~auto/resolution~auto/deseq_integrated-condition.rds
    wildcards: sample=integrated, percent_mt=auto, resolution=auto, i=condition
    resources: tmpdir=/tmp

[Wed Jul 24 16:32:33 2024]
Error in rule deseq_analysis_from_metadata_file:
    jobid: 33
    input: analyses_integrated/processed/percent_mt~auto/resolution~auto/integrated.rds, metadata.csv
    output: analyses_integrated/markers/percent_mt~auto/resolution~auto/deseq_integrated-condition.rds
    shell:
        /opt/conda/envs/cellsnake/lib/python3.9/site-packages/cellsnake/scrna/workflow/scripts/scrna-find-pairwise-markers.R --rds analyses_integrated/processed/percent_mt~auto/resolution~auto/integrated.rds --logfc.threshold 0.25 --test.use wilcox --output.rds analyses_integrated/markers/percent_mt~auto/resolution~auto/deseq_integrated-condition.rds --metadata metadata.csv --metadata.column condition
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
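
The Snakemake log above only reports that the command exited with a non-zero status; the underlying R error is not shown. One way to surface it is to pull the command out of the failing rule's `shell:` entry and run it by hand in the activated environment. This is a minimal sketch that assumes the log layout shown above: an `Error in rule ...` line, an indented `shell:` line, the command, and then the `(one of the commands exited ...` note. `failed_shell_commands` is a hypothetical helper name, not a cellsnake or Snakemake command.

```shell
# Hypothetical helper: print the shell command(s) recorded for failed rules
# in a Snakemake log, so they can be rerun manually to see the real error.
failed_shell_commands() {
    awk '
        # A failed rule starts with "Error in rule <name>:".
        /^Error in rule / { in_error = 1; next }
        # Inside that block, the command follows an indented "shell:" line.
        in_error && /^[ \t]*shell:[ \t]*$/ { in_shell = 1; next }
        # The trailing note from Snakemake closes the block.
        in_shell && /^[ \t]*\(one of the commands exited/ { in_shell = 0; in_error = 0; next }
        # Everything in between is the command itself; strip the indentation.
        in_shell { sub(/^[ \t]+/, ""); print }
    ' "$1"
}
```

Running `failed_shell_commands 2024-07-24T163222.373032.snakemake.log` should print the `scrna-find-pairwise-markers.R` call, which can then be pasted into the shell to see the R traceback directly.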
manogenome commented 1 month ago

Also, another error message at this step for a different dataset:

Error message from the terminal:

[Thu Jul 25 15:13:38 2024]
Finished job 41.
39 of 43 steps (91%) done
Exiting because a job execution failed. Look above for error message
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-07-25T092757.755271.snakemake.log
Traceback (most recent call last):
  File "/opt/conda/envs/cellsnake/bin/cellsnake", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/cellsnake/lib/python3.9/site-packages/cellsnake/command_line.py", line 381, in main
    run_workflow(cli_arguments)
  File "/opt/conda/envs/cellsnake/lib/python3.9/site-packages/cellsnake/command_line.py", line 351, in run_workflow
    subprocess.check_call(str(snakemake_argument),shell=True)
  File "/opt/conda/envs/cellsnake/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake --retries 5 --rerun-incomplete -k  -j 85  -s /opt/conda/envs/cellsnake/lib/python3.9/site-packages/cellsnake/scrna/workflow/Snakefile  --configfile=config.yaml --config datafolder=analyses_integrated/seurat/integrated.rds cellsnake_path=/opt/conda/envs/cellsnake/lib/python3.9/site-packages/cellsnake/scrna/ min_molecules=0 singler_ref=BlueprintEncodeData dims=30 metadata=metadata.csv umap_markers_plot=True microbiome_min_cells=1 species=human metadata_column=condition min_features=200 max_molecules=Inf test_use=wilcox min_percentage_to_plot=2 variable_selection_method=vst percent_mt=auto microbiome_min_features=3 percent_rp=0 doublet_filter=T taxa=genus reduction=cca scale_factor=10000 tsne_markers_plot=False celltypist_model=Immune_All_Low.pkl logfc_threshold=0.25 organism=hsa show_labels=T confidence=0.05 min_hit_groups=4 max_features=Inf min_cells=3 normalization_method=LogNormalize resolution=auto highly_variable_features=2000 mapping=org.Hs.eg.db marker_plots_per_cluster_n=20 runid=dbe85684 is_integrated_sample=True option=advanced' returned non-zero exit status 1.

Last few lines from the log file:

[Thu Jul 25 13:07:26 2024]
rule kegg_enrichment:
    input: results_integrated/integrated/percent_mt~auto/resolution~auto/table_all-markers-seurat_clusters.xlsx
    output: analyses_integrated/kegg/percent_mt~auto/resolution~auto/integrated-seurat_clusters-kegg.rds, results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-enrichment-seurat_clusters.xlsx, results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-geneset_enrichment-seurat_clusters.xlsx, results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-module_enrichment-seurat_clusters.xlsx, results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-module_geneset_enrichment-seurat_clusters.xlsx
    jobid: 18
    reason: Missing output files: results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-enrichment-seurat_clusters.xlsx, results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-module_geneset_enrichment-seurat_clusters.xlsx, results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-module_enrichment-seurat_clusters.xlsx, results_integrated/integrated/percent_mt~auto/resolution~auto/enrichment_analysis/table_KEGG-geneset_enrichment-seurat_clusters.xlsx; Input files updated by another job: results_integrated/integrated/percent_mt~auto/resolution~auto/table_all-markers-seurat_clusters.xlsx
    wildcards: sample=integrated, percent_mt=auto, resolution=auto, i=seurat_clusters
    resources: tmpdir=/tmp

[Thu Jul 25 13:07:33 2024]
Finished job 1.
35 of 43 steps (81%) done
[Thu Jul 25 13:08:11 2024]
Finished job 9.
36 of 43 steps (84%) done
[Thu Jul 25 13:09:32 2024]
Finished job 18.
37 of 43 steps (86%) done
[Thu Jul 25 13:24:52 2024]
Finished job 16.
38 of 43 steps (88%) done
[Thu Jul 25 15:13:38 2024]
Finished job 41.
39 of 43 steps (91%) done
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-07-25T092757.755271.snakemake.log
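
The tail of the log shows only finished jobs, so the actual error sits further up in the complete log. Because the command in the traceback runs Snakemake with `-k` and `--retries 5`, the same rule may be reported as failed more than once. Below is a small sketch for listing which rules failed and how often, assuming failures are logged as `Error in rule <name>:` as in the first log above. `failed_rules` is a hypothetical name.

```shell
# Hypothetical helper: count how often each rule is reported as failed
# in a Snakemake log (lines of the form "Error in rule <name>:").
failed_rules() {
    sed -n 's/^Error in rule \(.*\):$/\1/p' "$1" | sort | uniq -c
}
```

For the run above this would be called as `failed_rules .snakemake/log/2024-07-25T092757.755271.snakemake.log`. Each output line is a count followed by a rule name.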
manogenome commented 1 month ago

Docker Cellsnake Test Data

I've found a workaround for the current issue. I ran the usual steps (1-3) with cellsnake version 0.2.0.12 without supplying the metadata, and then ran the previous version, 0.2.0.11, as step 4 to plot the condition-specific DE markers.

Step 1

docker run -m 120g --cpus 20 \
--rm -v "$(pwd):/app" sinanugur/cellsnake:0.2.0.12 cellsnake advanced  fetal-brain/data \
--configfile config.yaml --jobs 20

[Thu Aug 8 09:18:17 2024]
Finished job 0.
53 of 53 steps (100%) done
Complete log: .snakemake/log/2024-08-08T085227.509111.snakemake.log
Samples detected : S10X_17_029 S10X_17_028

Step 2

docker run -m 120g --cpus 20 \
--rm -v "$(pwd):/app" sinanugur/cellsnake:0.2.0.12 cellsnake integrate fetal-brain/data --jobs 20

[Thu Aug 8 09:20:17 2024]
Finished job 0.
2 of 2 steps (100%) done
Complete log: .snakemake/log/2024-08-08T091909.400967.snakemake.log
Samples detected : S10X_17_029 S10X_17_028

Step 3

docker run -m 120g --cpus 20 \
--rm -v "$(pwd):/app" sinanugur/cellsnake:0.2.0.12 cellsnake integrated \
advanced analyses_integrated/seurat/integrated.rds --resolution 0.8 --jobs 20

[Thu Aug 8 10:05:29 2024]
Finished job 0.
40 of 40 steps (100%) done
Complete log: .snakemake/log/2024-08-08T092038.572432.snakemake.log
Samples detected : integrated

Step 4

docker run -m 120g --cpus 20 \
--rm -v "$(pwd):/app" sinanugur/cellsnake:0.2.0.11 cellsnake integrated \
standard analyses_integrated/seurat/integrated.rds --resolution 0.8 \
--metadata metadata.csv --jobs 20

[Thu Aug 8 10:06:35 2024]
Finished job 0.
6 of 6 steps (100%) done
Complete log: .snakemake/log/2024-08-08T100618.444608.snakemake.log
{'integrated'}
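
For anyone who wants to repeat the workaround, the four steps can be chained in one script. This is only a sketch: the image tags, paths, and flags are copied from the commands above, while `run_cellsnake`, `workaround`, and the `DRY_RUN` switch are hypothetical names added for illustration.

```shell
# Hypothetical wrapper: run cellsnake from a given Docker image tag with the
# same resource limits and volume mount as in the steps above.
run_cellsnake() {
    tag="$1"; shift
    set -- docker run -m 120g --cpus 20 --rm -v "$(pwd):/app" \
        "sinanugur/cellsnake:$tag" cellsnake "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # only print the command instead of running it
    else
        "$@"
    fi
}

# Hypothetical driver: steps 1-3 on 0.2.0.12 without metadata, then step 4
# on 0.2.0.11 with metadata. Stops at the first step that fails.
workaround() {
    rds=analyses_integrated/seurat/integrated.rds
    run_cellsnake 0.2.0.12 advanced fetal-brain/data --configfile config.yaml --jobs 20 || return
    run_cellsnake 0.2.0.12 integrate fetal-brain/data --jobs 20 || return
    run_cellsnake 0.2.0.12 integrated advanced "$rds" --resolution 0.8 --jobs 20 || return
    run_cellsnake 0.2.0.11 integrated standard "$rds" --resolution 0.8 \
        --metadata metadata.csv --jobs 20
}
```

With `DRY_RUN=1` exported, calling `workaround` prints the four `docker run` commands without starting any container, which makes it easy to check them before a long run.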