wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

file not found issues -- solution #82

Open hildegarda-k opened 4 years ago

hildegarda-k commented 4 years ago

Hello, I am running the pipeline through Singularity, and I get an error: it seems the script cannot find the barcodes.tsv file.

here is the command: singularity exec souporcell.sif souporcell_pipeline.py -i possorted_genome_bam.bam -b barcodes.tsv -f reference/genome.fasta -t 20 -o soup_outs/ -k 2

here is the error I get:

checking modules imports done
checking bam for expected tags
Traceback (most recent call last):
  File "/opt/souporcell/souporcell_pipeline.py", line 64, in <module>
    with open_function(args.barcodes) as barcodes:
  File "/opt/souporcell/souporcell_pipeline.py", line 57, in <lambda>
    open_function = lambda f: gzip.open(f,"rt") if f[-3:] == ".gz" else open(f)
FileNotFoundError: [Errno 2] No such file or directory: 'barcodes.tsv'
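For context, the line it fails on is a generic file opener in souporcell_pipeline.py that uses gzip.open for *.gz files and plain open otherwise, so it accepts barcodes.tsv or barcodes.tsv.gz. A minimal sketch of that logic, paraphrased from the traceback above:

```python
import gzip

# Paraphrased from the traceback: open .gz files with gzip in text mode,
# everything else with plain open(). Either branch raises FileNotFoundError
# if the path does not resolve -- which is what happens here when the
# directory containing barcodes.tsv is not visible inside the container.
open_function = lambda f: gzip.open(f, "rt") if f[-3:] == ".gz" else open(f)

# Usage, as in the pipeline:
# with open_function("barcodes.tsv") as barcodes:
#     for line in barcodes:
#         ...
```

So the gzip handling is not the issue; the path simply does not exist from the script's point of view.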

We tried the function it crashes on outside of the script, on the same file, and it works normally. For simplicity I put everything in one folder, but I also tried running from the parent folder, gunzipping the file, etc.; nothing seems to work. Any idea how to troubleshoot this? All of the input comes from 10x sequencing, has not been tampered with, and works fine in other pipelines.

Thanks a lot!

wheaton5 commented 4 years ago

I have no idea how this could happen. Maybe try the previous Singularity image, which doesn't have that optional gzip open:

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=124sRtZoDlKt-jJYS6BbWGC1-CY01aAGT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=124sRtZoDlKt-jJYS6BbWGC1-CY01aAGT" -O souporcell.sif && rm -rf /tmp/cookies.txt
yaniv-el commented 4 years ago

If I may intervene here: I had a similar issue, and I believe it's related to a general Singularity mounting issue (I had the same problem when running ls via singularity). You can easily test this by trying to run souporcell from a local directory (e.g. under /tmp).

My solution was to bind the data's parent directory, when running singularity, to a path that the container would then recognise:

singularity exec -B /path/to/your/data/:/dummy_path souporcell.sif souporcell_pipeline.py -i /dummy_path/possorted_genome_bam.bam -b /dummy_path/barcodes.tsv -f /dummy_path/reference/genome.fasta -t 20 -o /dummy_path/soup_outs/ -k 2

Hope it will help, Yaniv

wheaton5 commented 4 years ago

Thanks Yaniv. Singularity is supposed to mount the working directory and everything below it. I guess explicitly telling it to mount your data directory works, but I'm not sure why it wasn't working when everything was in or below the working directory.
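One way to narrow this down (a sketch, not part of the pipeline; substitute your own paths): first confirm on the host that the inputs resolve to absolute paths, then check whether the same paths are visible inside the container. If the host sees them and the container does not, it is a bind-mount problem rather than a souporcell bug.

```shell
# Host-side pre-flight check: confirm the input files exist at absolute
# paths before invoking the container. File names mirror the command in
# this thread -- substitute your own.
DATA_DIR=$(realpath .)
for f in barcodes.tsv possorted_genome_bam.bam; do
  [ -e "$DATA_DIR/$f" ] || echo "missing: $DATA_DIR/$f" >&2
done
# If the files exist on the host, check the container's view; binding the
# directory explicitly (-B with one argument mounts it at the same path):
#   singularity exec -B "$DATA_DIR" souporcell.sif ls "$DATA_DIR"
```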

yaniv-el commented 4 years ago

It did work for me at first, without explicitly binding a directory, but it stopped working after our cluster had a major update. Binding a directory was a good enough workaround for me, so I didn't investigate further.

ktpolanski commented 4 years ago

I can confirm I saw something similar, and adding -B $PWD to the singularity call somehow made it go away.

scfurl commented 4 years ago

I had the exact same problem. My workaround on a SLURM cluster was the following:

first:

export SINGULARITY_BINDPATH="/drivestobemounted,/otherdrivestobeadded"

then run the singularity as something like:

sbatch --wrap='singularity exec souporcell.sif souporcell_pipeline.py \
                  -i out.sorted.bam -b outbcs.tsv.gz -f $REF \
                  -o $OUTDIR -k 2'
ktpolanski commented 4 years ago

Ran into something tangentially related when trying to run shared_samples.py, and once again adding -B $PWD to the singularity call seems to have fixed it. Weird, but at least there seems to be some sort of workaround.

wheaton5 commented 4 years ago

Thanks all. I have updated the title of this issue to "file not found issues -- solution". I appreciate all of the contributions, as this had been an issue I didn't understand, and I'm glad we have a solution now. This issue will remain open for maximum visibility.