t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

Slamdunk all command not working #96

Closed aishwarya-gondane closed 3 years ago

aishwarya-gondane commented 3 years ago

Hi,

I have installed Slamdunk using conda, as per the instructions. However, I am unable to run the slamdunk all command. Please find the commands and the traceback below.

conda activate myslamdunk
(myslamdunk) ubuntu@virtual-ubuntu-3xlarge:~$ slamdunk all -r /media/volume/GRCh38.primary_assembly.genome.fa -b /media/volume/GRCh38_UCSC_3UTR.bed -o media/volume/STAR_output_DMSO_R1_test1 files /media/volume/7_DMSO_1_trimmed_R1.fastq

Error:
Running slamDunk map for 2 files (1 threads)
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/slamdunk", line 11, in <module>
    sys.exit(run())
  File "/home/ubuntu/.local/lib/python3.6/site-packages/slamdunk/slamdunk.py", line 520, in run
    runAll(args)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/slamdunk/slamdunk.py", line 245, in runAll
    runMap(tid, bam, referenceFile, n, args.trim5, args.maxPolyA, args.quantseq, args.endtoend, args.topn, sampleInfo, dunkPath, args.skipSAM)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/slamdunk/slamdunk.py", line 149, in runMap
    mapper.Map(inputBAM, referenceFile, outputSAM, getLogFile(outputLOG), quantseqMapping, endtoendMapping, threads=threads, trim5p=trim5p, maxPolyA=maxPolyA, topn=topn, sampleId=tid, sampleName=sampleName, sampleType=sampleType, sampleTime=sampleTime, printOnly=printOnly, verbose=verbose)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/slamdunk/dunks/mapper.py", line 101, in Map
    if(checkStep([inputReference, inputBAM], [replaceExtension(outputSAM, ".bam")], force)):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/slamdunk/utils/misc.py", line 149, in checkStep
    raise RuntimeError("One or more input files don't exist: " + str(inFiles))
RuntimeError: One or more input files don't exist: ['/media/volume/GRCh38.primary_assembly.genome.fa', 'files']

t-neumann commented 3 years ago

I think what's causing this is the stray files argument before /media/volume/7_DMSO_1_trimmed_R1.fastq
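
The traceback above reports "Running slamDunk map for 2 files" and lists 'files' among the missing inputs, which is consistent with the stray word being treated as a second input file. A minimal sketch of a pre-flight check that would catch this before slamdunk starts is shown below. The helper name check_inputs is hypothetical and not part of slamdunk.

```shell
# check_inputs: hypothetical helper (not part of slamdunk).
# Prints every argument that is not an existing regular file to stderr
# and returns non-zero if at least one argument is missing.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      printf 'missing input: %s\n' "$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Called with the reference, the word files, and the fastq path from the command above, this would flag files as missing; dropping that word leaves the rest of the command as posted.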

tanasa commented 3 years ago

Dear all, I am getting a similar error when I use the Singularity container on a SLURM cluster:

singularity exec ./SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk map \
> -r GRCm38.primary_assembly.genome.fa \
> -o ./sample10_1_map \
> -t 8 \
> 10_R1_001.fastq.gz
INFO:    Converting SIF file to temporary sandbox...
Creating output directory: ./sample10_1_map
Running slamDunk map for 1 files (8 threads)
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 436, in run
    runMap(tid, bam, referenceFile, n, args.trim5, args.maxPolyA, args.quantseq, args.endtoend, args.topn, sampleInfo, outputDirectory, args.skipSAM)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 148, in runMap
    mapper.Map(inputBAM, referenceFile, outputSAM, getLogFile(outputLOG), quantseqMapping, endtoendMapping, threads=threads, trim5p=trim5p, maxPolyA=maxPolyA, topn=topn, sampleId=tid, sampleName=sampleName, sampleType=sampleType, sampleTime=sampleTime, printOnly=printOnly, verbose=verbose)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks/mapper.py", line 101, in Map
    if(checkStep([inputReference, inputBAM], [replaceExtension(outputSAM, ".bam")], force)):
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 149, in checkStep
    raise RuntimeError("One or more input files don't exist: " + str(inFiles))
RuntimeError: One or more input files don't exist: ['GRCm38.primary_assembly.genome.fa', '10_R1_001.fastq.gz']
INFO:    Cleaning up image...

However, these files are in the current directory:

<> GRCm38.primary_assembly.genome.fa
<> 10_R1_001.fastq.gz

Any help would be much appreciated. Thanks a lot!

It worked approximately 6 months ago. Thank you again.

tanasa commented 3 years ago

Ah, I think I know what is happening: I have to place all the files (fasta, fastq.gz) in my home folder on the SLURM cluster:

/home/btanasa

However, my account is small (only 32 GB), while the laboratory account is much bigger (7 TB). When I run:

singularity exec SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk map -r /home/btanasa/GRCm38.primary_assembly.genome.fa -o ./Sample_TEST -t 8 /labs/gold/M_data_SLAMseq/the_SAMPLES_MAY2021/10_R1_001.fastq.gz

I get the same message as reported above. Is there any way to fix this, other than placing all the files in a limited home folder? Any advice would be welcome. Thanks a lot!

t-neumann commented 3 years ago

Hi @tanasa

this is more of a Singularity "issue": our IT mounts several folders by default in every image at startup, so I guess it would be more helpful to talk to your IT about what is done on their end.

tanasa commented 3 years ago

Dear Tobias, thank you very much for your prompt reply. If I may, a few follow-up questions:

<> Is there a way to set up Singularity (externally) so that it deposits the results in a specific folder? (Yesterday I reused the Docker image at https://hub.docker.com/r/nfcore/slamseq and converted it into a Singularity image, and I am still encountering the same problem as described.)

<> Could I use a set of symbolic links (ln -s path/of/dir path/to/dir) to "tell" the Singularity image in which folder to output the data?

<> Would the problem be solved if I used another SLURM cluster?

<> Or if I used the Singularity-based Nextflow pipeline (https://nf-co.re/slamseq)?

<> Or can the problem be solved only by talking with the IT people?

many thanks again for all your help,

-- bogdan

t-neumann commented 3 years ago

Usually the way it goes is you mount your file system in the container, and then can use it as if on the cluster itself.

https://sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html

So no need for any links or workarounds once that is done - you can plainly use the same paths as if "outside" of a container.
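
A minimal sketch of that idea, assuming the input and output directories should appear inside the container under the same paths they have on the cluster. The helper same_path_binds is hypothetical; it only builds the comma-separated src:dest list that singularity's --bind option accepts.

```shell
# same_path_binds: hypothetical helper that turns a list of host
# directories into a comma-separated "src:dest" bind specification in
# which every directory keeps its host path inside the container.
same_path_binds() {
  spec=""
  for d in "$@"; do
    # Prepend a comma only when spec already holds an entry.
    spec="${spec:+$spec,}$d:$d"
  done
  printf '%s\n' "$spec"
}
```

The result would be passed before the image path, for example singularity exec --bind "$(same_path_binds /labs /local/scratch)" followed by the image and the slamdunk command; the two directories here are only illustrative.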

tanasa commented 3 years ago

Dear Tobias, thank you for your message and advice.

Following up on my message above, I would appreciate help with running a Singularity container on the SLURM cluster:

<> I have mounted the folder where I keep all the input files in the Singularity container:

singularity exec --bind /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/:/mnt SLAMDUNK_SINGULARITY/slamdunk_latest.sif ls /mnt

(according to: https://sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html)

<> If I run the command below, I cannot see the results in /tmp/"${SAMPLE}.results"; I cannot find them anywhere in /tmp/.

singularity exec /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk all \
-r /home/btanasa/GRCm38.primary_assembly.genome.fa \
-b /home/btanasa/3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed \
-o  /tmp/"${SAMPLE}.results" \
-t 8 \
./10_R1_001.fastq.gz

<> If I run the command below, the results are still deposited in my home folder (which I do not want, as it has limited storage):

singularity exec /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk all \
-r ./GRCm38.primary_assembly.genome.fa \
-b ./3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed \
-o ./"${SAMPLE}.results" \
-t 8 \
./10_R1_001.fastq.gz

many many thanks,

bogdan

t-neumann commented 3 years ago

Hi Bogdan,

do you have any other folder on your cluster than your home, where you have sufficient space?

I fear /tmp will not work as it interferes with the container file system that probably also has its own /tmp mounted.

tanasa commented 3 years ago

Dear Tobias, yes, the folder with a lot of storage space is shown below; it has ~30 TB:

/labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/

I remember that when I run the command below, I get a parsing error (" /labs .."). I will try again and let you know. Thanks again!

singularity exec /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk all \
-r ./GRCm38.primary_assembly.genome.fa \
-b ./3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed \
-o /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/"${SAMPLE}.results" \
-t 8 \
/labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/10_R1_001.fastq.gz

tanasa commented 3 years ago

Dear Tobias, when I use the command above, I get the stderr message shown below. This is a bit strange, as I am able to write to "/labs/jlgoldbe/ ..." at any time.

slamdunk all
Creating output directory: /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/S10.all.out.again1.results
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 478, in run
    runAll(args)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 213, in runAll
    createDir(outputDirectory)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 80, in createDir
    os.makedirs(directory)
  File "/opt/conda/envs/slamdunk/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/slamdunk/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/envs/slamdunk/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/opt/conda/envs/slamdunk/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/labs'
INFO:    Cleaning up image.

tanasa commented 3 years ago

Dear Tobias,

we have found another way to bind the folders to the singularity container by using a /local/scratch/btanasa folder :

singularity exec --bind /local/scratch/btanasa:/output10 \
/labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk all \
-r /home/btanasa/GRCm38.primary_assembly.genome.fa \
-b /home/btanasa/3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed \
-o /output10 \
-t 8 \
./10_R1_001.fastq.gz

However, I do not see the /output10 results in /local/scratch/btanasa. Is there something that I am missing? Could you please help?

many thanks :)

t-neumann commented 3 years ago

Can you run it within the container with singularity shell, and then check whether the output was produced within the container?
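
One way to make that check explicit is to probe the output directory from inside the container. The following is a sketch only: probe_dir is a hypothetical helper, and it would have to be run inside the container (for example from a singularity shell session) to say anything about what the container can see.

```shell
# probe_dir: hypothetical helper. Reports whether a directory exists
# and whether a file can be created in it, then removes the probe file.
probe_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  fi
  probe="$dir/.write_probe.$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $dir"
  else
    echo "not writable: $dir"
    return 1
  fi
}
```

Running it against the bound output path and against the host path would show which of the two the container actually sees and can write to.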

tanasa commented 3 years ago

Dear Tobias, thank you. I have done so, and I cannot find the folder within the Singularity container.

singularity shell  ./SLAMDUNK_SINGULARITY/slamdunk_latest.sif
Singularity> ls -1
Singularity> ls -1 /tmp/
Singularity> ls -1 /home/
Singularity> ls -1 /home/btanasa

I could also search the /tmp or /scratch folder of the SLURM node where the job was executed, although I am not sure how to access that SLURM node. Thanks again :)

t-neumann commented 3 years ago

Hm, then it seems that mounting the external file system in the container did not work properly. How did you mount your home in the container, since you seem to be able to access it?

tanasa commented 3 years ago

Dear Tobias, many thanks again for your time and suggestions. If I run Singularity on a specific cluster node:

singularity exec --bind /local/scratch/btanasa:/output10 \
/labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk all \
-r /home/btanasa/GRCm38.primary_assembly.genome.fa \
-b /home/btanasa/3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed \
-o /output10 \
-t 8 \
./10_R1_001.fastq.gz

then I am able to access the scratch folder on that specific cluster node and retrieve the results:

ls -1 /local/scratch/btanasa

t-neumann commented 3 years ago

And if you run it now to write the output to that folder - is there anything produced?

tanasa commented 3 years ago

Dear Tobias, yes, the output is produced in /local/scratch/btanasa on a specific SLURM node.

However, it worked only for one fastq.gz file (i.e. 10_R1_001.fastq.gz), and I guess that it worked because that file was inside the container (please see below).

Could you please remind me how I can copy other fastq.gz files into the slamdunk container? Thanks a lot!

singularity shell SLAMDUNK_SINGULARITY/slamdunk_latest.sif 
INFO:    Converting SIF file to temporary sandbox...
Singularity> ls -1 
10_R1_001.fastq.gz
3UTRs_vM14_github_repository.27aug2020.bed
3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed
3UTRs_vM14_github_repository.28aug2020.bed
3UTRs_vM14_github_repository.28aug2020.with.SYMBOLS.to.use.bed
GRCm38.primary_assembly.genome.fa
GRCm38.primary_assembly.genome.fa-enc.2.ngm
GRCm38.primary_assembly.genome.fa-ht-13-2.3.ngm
GRCm38.primary_assembly.genome.fa.fai
GSE99970_GSE99970_mESC_UTR_regions.downloaded.GSE99970.03sep2020.bed
GSE99970_GSE99970_mESC_counting_windows.downloaded.GSE99970.03sep2020.bed
mm10.genes.intervals.use.intermed.SLAMseq.full.length.bed

tanasa commented 3 years ago

Hi Tobias, to answer your question: yes, the output is produced in the /local/scratch/btanasa folder of a specific SLURM node.

For example:

[btanasa@dper7425-srcf-d15-25 btanasa]$ ls -1
count
filter
map
slrmtmp.22510809
snp

If I may rephrase my text above: I have noticed that the slamdunk Singularity container "sees" the $HOME folder on the SLURM cluster, i.e. /home/btanasa. A simple question: how can I replace this $HOME folder that Singularity "sees" with another location on the cluster, e.g. /labs/zzz/data? Thanks a million!

t-neumann commented 3 years ago

Ah I see - and you don't specify any volume mounting command upon starting the container to mount the home there - so it just shows up per default?

tanasa commented 3 years ago

Yes, Tobias. $HOME (i.e. /users/btanasa) shows up by default. Is there a way to mount another directory as the input folder?

thanks a lot !

t-neumann commented 3 years ago

Can you not mount a different folder with input data in the same way you mounted /local/scratch/btanasa?

tanasa commented 3 years ago

Oh well, I have been trying the following:

singularity exec \
/labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk all \
--bind /local/scratch/btanasa:/output8 \
--home /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021:/home \
-r /home/btanasa/GRCm38.primary_assembly.genome.fa \
-b /home/btanasa/3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed \
-o /output8 \
-t 4 \
/labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/8_R1_001.fastq.gz

and i am getting the message :

slamdunk: error: unrecognized arguments: --bind --home /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021:/home /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/8_R1_001.fastq.gz
INFO:    Cleaning up image..

tanasa commented 3 years ago

Hi Tobias, I have finally got it to run by using an interactive session:

singularity exec \
--bind /local/scratch/btanasa:/output8 \
--home /labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021:/home \
/labs/jlgoldbe/MASSY_data_SLAMseq/the_SAMPLES_MAY2021/SLAMDUNK_SINGULARITY/slamdunk_latest.sif slamdunk all \
-r GRCm38.primary_assembly.genome.fa \
-b 3UTRs_vM14_github_repository.27aug2020.sortdesc.LONG.with.SYMBOLS.to.use.bed \
-o /output8 \
-t 4 \
./8_R1_001.fastq.gz

t-neumann commented 3 years ago

Yeah, I think in your previous example you mixed up the argument order: you placed the bind parameters after the slamdunk command, so they were passed to slamdunk, which of course does not recognize them, and the whole thing failed.
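
The ordering rule can be sketched as follows: options placed before the image path are read by singularity, and everything after the image path is passed to the program inside the container. The helper below is hypothetical and only prints the assembled command line, so the order can be inspected before anything is run. It does not quote arguments, so it is meant for inspection rather than for copy-and-paste execution of paths containing spaces.

```shell
# print_exec_cmd: hypothetical helper that prints a singularity command
# line with the bind specification placed before the image path.
# Usage: print_exec_cmd BIND_SPEC IMAGE COMMAND [ARGS...]
print_exec_cmd() {
  bind_spec=$1
  image=$2
  shift 2
  # Singularity's own options come first, then the image.
  printf 'singularity exec --bind %s %s' "$bind_spec" "$image"
  # Everything left in "$@" belongs to the program run in the container.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}
```

For the working command above, the bind specification would be /local/scratch/btanasa:/output8, and slamdunk all with its -r, -b, -o and -t options would follow the image path.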