t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

Samplesheet (see Sample file format ) or a list of all sample BAM/FASTA(gz)/FASTQ(gz) files (wildcard * accepted). #134

Open Umair1441 opened 1 year ago

Umair1441 commented 1 year ago

Hi, I have 20 GB of data stored in nested subdirectories: for example, folder A1 has two subfolders, A1-1 and A1-2, and so on. I want to pass all the files as input, but I can't understand how to use the sample sheet for that.

Thanks.

t-neumann commented 1 year ago

Does wildcard not work? like */*/*fq.gz?
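
As a rough illustration of the wildcard suggestion, the sketch below builds a throwaway directory tree and shows what the shell expands `*/*/*fq.gz` to. The folder and file names (A1, A1-1, s1.fq.gz, and so on) are hypothetical stand-ins for the layout described in the question; nothing here calls slamdunk itself.

```shell
# Build a temporary tree that mimics the described layout (names are illustrative).
demo=$(mktemp -d)
mkdir -p "$demo/A1/A1-1" "$demo/A1/A1-2" "$demo/A2/A2-1"
touch "$demo/A1/A1-1/s1.fq.gz" "$demo/A1/A1-2/s2.fq.gz" "$demo/A2/A2-1/s3.fq.gz"

# One '*' per directory level: top-level folder / subfolder / file.
matched=$(cd "$demo" && printf '%s\n' */*/*fq.gz)
echo "$matched"

# find matches the same files regardless of how deeply they are nested.
found=$(cd "$demo" && find . -type f -name '*fq.gz' | wc -l | tr -d ' ')
echo "files found: $found"
```

Running the glob through `printf` or `ls` first is a cheap way to confirm which files the shell will actually hand to the command before starting a long job.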

Umair1441 commented 1 year ago

This is the command I wrote: slamdunk all -r hg19.fa -b Hg.bed -o output -rl 100 -ss data/*.fq.gz

Umair1441 commented 1 year ago

Hi. This command runs for me: slamdunk all -r hg19.fa -b Hg.bed -o output -rl 100 -ss data/ *.fq.gz

I have 16 files, which is 20 GB of data. The slamdunk command has been running on my server for the last 24 hours; it created one BAM file in that time and then got stuck. Could you please advise?

t-neumann commented 1 year ago

Is the process itself also stuck or still running? What does top say, does it still use CPU?

Umair1441 commented 1 year ago

I ran the command again, and top now shows %CPU at 1466.

I use 16 threads, and I have 16 .fq files totalling 20 GB. Could you tell me roughly how long it should take to run on all 16 files?
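
For reading the %CPU figure: assuming top is in its default mode, where a multi-threaded process is reported as the sum over all cores it uses, the value can be converted to "cores busy" with a little arithmetic. The numbers below are the ones quoted in this comment; the variable names are only for illustration.

```shell
cpu_percent=1466   # %CPU value reported by top above
threads=16         # thread count mentioned above

# 100% corresponds to one fully busy core (assumption: top's default, non-Solaris mode).
cores_busy=$(awk -v p="$cpu_percent" 'BEGIN { printf "%.2f", p / 100 }')

# Share of the requested threads that is actually doing work.
utilisation=$(awk -v p="$cpu_percent" -v t="$threads" 'BEGIN { printf "%.0f", p / t }')

echo "$cores_busy cores busy, about $utilisation% of $threads threads"
```

Under that assumption, a reading of 1466 would mean the process is keeping nearly all 16 threads busy.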

t-neumann commented 1 year ago

Hi - do you have 20GB per file or in total? It shouldn't really run much longer than 1 hour per file, so it should certainly be done within 24 hours.

Umair1441 commented 1 year ago

20 GB in total, across 16 files.

Thank you.

Umair1441 commented 1 year ago

Hello. I have 16 FASTQ files with a size of 64 GB, and I ran slamdunk all on the server with 16 threads. It has been running for the last 13 days and has mapped only 14 files so far. Please tell me why it is taking so long.

t-neumann commented 1 year ago

Hi - that indeed sounds unreasonably slow. What command did you use, what's your memory size and did you make sure that NextGenMap is running with 16 cores (e.g. with top)?

Worst case I can run it myself if you are willing to supply the dataset to me, to check what's going on

Umair1441 commented 1 year ago

[Attached image: WhatsApp Image 2023-09-05 at 11 30 25 AM]

Hi, I use the following command.

slamdunk all -r hg19.fa -b Hg.bed -o output -t 16 -rl 100 -ss data/*.fq.gz

The server has 49 threads in total, and 16 are running when I check with top -H -p .
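
As a complement to `top -H`, the thread count of a process can also be read directly from `/proc` on Linux. This is only a sketch: it uses the current shell's PID as a stand-in, and in practice the PID of the mapping process would be substituted.

```shell
# Stand-in PID: the current shell. Replace with the PID of the process to inspect.
pid=$$

# /proc/<pid>/status is Linux-specific; its "Threads:" field is the number of threads.
nthreads=$(awk '/^Threads:/ { print $2 }' "/proc/$pid/status")

echo "process $pid has $nthreads thread(s)"
```

A thread count alone does not show whether the threads are busy, so it is best read together with the %CPU column in top.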

The server has a total of 191891 memory.

t-neumann commented 1 year ago

Oh sorry, now I think I see what's going on. It seems to be running with only 1 core per process. What happens if you use -t 256 and then check top again: how much %CPU is utilized?

Umair1441 commented 1 year ago

Could you please guide me on how I can increase the number of threads in the running process?

t-neumann commented 1 year ago

Yes try slamdunk all -r hg19.fa -b Hg.bed -o output -t 256 -rl 100 -ss data/*.fq.gz

Umair1441 commented 1 year ago

Yes, I applied the same command to the last file, but it is still slow. Any other suggestions? Can I increase the number of threads to 1000 or higher?

t-neumann commented 1 year ago

What does the CPU utilization in top say? You can increase the number of threads, but at some point the communication overhead outweighs the gain in speed.
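
One way to keep the thread count in a sensible range is to compare the requested value with the number of logical CPUs the machine reports. The sketch below does that and prints the resulting command rather than running it. Capping at the CPU count is a common rule of thumb and an assumption here, not something stated in this thread; the file names are the ones used in the commands above.

```shell
requested=256   # value tried earlier in this thread

# Number of logical CPUs: nproc (GNU coreutils), falling back to getconf.
available=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Rule-of-thumb cap (assumption): do not ask for more threads than logical CPUs.
if [ "$requested" -gt "$available" ]; then
  threads=$available
else
  threads=$requested
fi

# Print the command for inspection instead of executing it.
echo "slamdunk all -r hg19.fa -b Hg.bed -o output -t $threads -rl 100 -ss data/*.fq.gz"
```

Whether a higher value helps in a particular setup is best judged from the %CPU reading in top, as asked above.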