Hi, I have been trying to run zUMIs on an HPC (both through an sbatch job and on an interactive node), but without success. It seems that zUMIs does not see my input files, as `Smartseq3.zUMIs_YAMLerror.log` shows:

```
WARNING: ignoring environment value of R_HOME
$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

[1] ""
[1] ""
[1] ""
[1] ""
[1] "" "" "" ""
[1] ""
[1] ""
[1] ""
[1] ""
[1] "" "" "" ""
[1] "NULL" "NULL" "NULL" "NULL"
$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

$file1
NULL

$file2
NULL

$file3
NULL

$file4
NULL

[1] 0
```

This is my YAML file:

```yaml
project: Smartseq3
sequence_files:
  file1:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1a_cutadapt/Undetermined_S0_L001_trim_R1.fastq.gz
    base_definition:
      - cDNA(24-75)
      - UMI(12-20)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1a_cutadapt/Undetermined_S0_L001_trim_R2.fastq.gz
    base_definition:
      - cDNA(1-75)
  file3:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1c_filter_index_reads/filtered_I1.fastq.gz
    base_definition:
      - BC(1-10)
  file4:
    name: /home/ubaruchel/smart-seq3/data/240814/exp1/1c_filter_index_reads/filtered_I2.fastq.gz
    base_definition:
      - BC(1-10)
reference:
  STAR_index: /data/scratch/DBC/UBCN/CANCDYN/genomes/homo-sapiens/hg38-ercc/star
  GTF_file: /data/scratch/DBC/UBCN/CANCDYN/genomes/homo-sapiens/hg38-ercc/gtf/combined_hg38_ercc.gtf
out_dir: /home/ubaruchel/smart-seq3/data/240814/exp1/2b_zUMIs
num_threads: 24
mem_limit: 50
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 2
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: /home/ubaruchel/smart-seq3/data/240814/exp1/0c_prep_well_barcodes/expected_well_barcodes.txt
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: no
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 1
  write_ham: no
  velocyto: no
  primaryHit: yes
  twoPass: no
make_stats: yes
which_Stage: Filtering
zUMIs_directory: /data/scratch/DBC/UBCN/CANCDYN/software/zUMIs

samtools_exec: samtools
pigz_exec: pigz
STAR_exec: STAR
Rscript_exec: Rscript
```
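To rule out path problems independently of zUMIs, a quick check can be driven from the YAML itself, so that exactly the paths zUMIs will read are the ones tested. The sketch below is a hypothetical helper (`check_yaml_inputs` is not part of zUMIs). It assumes each input path sits on its own `name: <path>` line, as in the config above, and it uses a plain-text `sed` scan rather than a real YAML parser, so it says nothing about whether the YAML itself parses correctly.

```bash
# Hypothetical helper (not part of zUMIs): print every "name:" path found in a
# zUMIs YAML file and report whether it exists and is non-empty.
# Assumption: one "name: <path>" entry per line; this is a text scan, not a YAML parser.
check_yaml_inputs() {
  local yaml="$1"
  local status=0
  local path
  while IFS= read -r path; do
    if [[ -s "$path" ]]; then
      echo "OK      $path"
    else
      # Missing file or zero-byte file
      echo "MISSING $path"
      status=1
    fi
  done < <(sed -n 's/^[[:space:]]*name:[[:space:]]*//p' "$yaml")
  # Non-zero if at least one listed input is missing or empty
  return "$status"
}
```

For example, calling `check_yaml_inputs "$input_dir"` inside the job script (so it runs on the compute node) would print one `OK`/`MISSING` line per input file and return a non-zero status if any of them is missing or empty.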

I ran the command through a .sh file that is called from an sbatch script (SLURM):

```bash
#!/bin/bash

# Always add these two commands to your scripts when using an environment
eval "$(conda shell.bash hook)"
source "$CONDA_PREFIX/etc/profile.d/mamba.sh"

# Source the parameters file
source ./params_bioinfo_experiments/0_params.sh

# Set variables
input_dir=$input_dir_2b
output_dir=$output_dir_2b
log_dir=$log_dir_2b

# Create the output and log directories if they don't exist
mkdir -p "$output_dir"
mkdir -p "$log_dir"

# Run zUMIs using its own miniconda environment (-c)
# and the prepared YAML file (input_dir)
"$path_zUMIs/zUMIs.sh" -c -y "$input_dir"
```
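Because `input_dir` and `path_zUMIs` are only defined in the sourced params file, a pre-flight check before the zUMIs call can make the job fail early, with a clear message, if either one is empty or points to the wrong place. A minimal sketch, using a hypothetical function name (`preflight_zumis` is not part of zUMIs):

```bash
# Hypothetical pre-flight check (not part of zUMIs): stop the job early if the
# YAML path or the zUMIs directory coming from the params file is unusable.
preflight_zumis() {
  local yaml="$1" zumis_dir="$2"
  # The YAML must be set, exist, and be non-empty
  if [[ -z "$yaml" || ! -s "$yaml" ]]; then
    echo "ERROR: YAML file '$yaml' is unset, missing or empty" >&2
    return 1
  fi
  # zUMIs.sh must exist and be executable in the given directory
  if [[ -z "$zumis_dir" || ! -x "$zumis_dir/zUMIs.sh" ]]; then
    echo "ERROR: '$zumis_dir/zUMIs.sh' not found or not executable" >&2
    return 1
  fi
  return 0
}

# Usage in the job script, just before the zUMIs call:
# preflight_zumis "$input_dir" "$path_zUMIs" || exit 1
```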

I do not know what the problem is. My hypothesis is that the miniconda environment (`-c`) somehow prevents zUMIs from seeing the input files, which do exist and are not empty (verified with `du -sh`). At the same time, zUMIs does detect a slight mismatch between the STAR version used to build my index and the one it uses itself, so it evidently reads the `STAR_index` path rather than seeing it as NULL.

Can you help me, please?

I have also tried to create my own mamba (conda) environment to run zUMIs, following https://github.com/sdparekh/zUMIs/wiki/Installation#dependencies, but I could not complete the last step of the dependency installation, `devtools::install_github('VPetukhov/ggrastr')`, because of issues with Cairo. Docker is not an option either, since HPCs do not allow it for security reasons. Would it be possible to provide a Singularity image? That would make zUMIs much easier to deploy, in particular in pipelines (Nextflow / Snakemake).

Thank you very much,

Best wishes,

Ulysse

sdparekh / zUMIs

BUG Input files perceived as NULL while they exist (checked multiple times) #405
