Closed: ahorn720 closed this issue 6 years ago.
Did you get a success message when running install_dependencies.sh?
Did you have enough space on your system?
$ df -h /YOUR/WORKING/DIRECTORY
$ free -h
$ df -h /tmp
$ df -h $TMPDIR
$ df -h $TMP
I did get a success message, as you'll see in the attachment. I also included the results of the commands you requested. Based on a previous suggestion you made, I also changed sge.pe=smp in my ~/.bds/bds.config.
dfTMP.txt dfpwd.txt dftmp.txt dftmpdir.txt free.txt stderrorNBM_55-4.txt stdoutNBM_55-4.txt
It's a segmentation fault in sambamba sort, and it usually occurs when there are not enough resources (memory, or disk space on temp directories like /tmp and $TMPDIR). Set your $TMPDIR in your ~/.bashrc and try again:
export TMPDIR=/somewhere/fast/and/large/storage/
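A minimal sketch of the fix; the path below is a placeholder for whatever large, fast scratch area your cluster exposes:

```shell
# Placeholder path: point this at your cluster's large/fast scratch space.
export TMPDIR=/tmp/atac_scratch
mkdir -p "$TMPDIR"
# Sanity-check free space before rerunning the pipeline.
df -h "$TMPDIR"
```

Put the export line in ~/.bashrc so it is set in every job shell, then re-login (or source ~/.bashrc) before resubmitting.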
If you are working on SCG, it is actually migrating to a new one with SLURM (not SGE). And our pipeline currently does not support SLURM on SCG. Let me know which SCG server you are working on.
You may need to submit a shell script for the BDS command to SLURM manually with bds ... -system local.
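One way that manual submission could look, sketched under assumptions (the account name, resource requests, script name, and bds arguments are all placeholders, not the pipeline's documented usage):

```shell
# Generate a hypothetical SLURM wrapper script for the BDS command.
# Account, time, memory, and the bds arguments are placeholders.
cat > run_atac.sh <<'EOF'
#!/bin/bash
#SBATCH --account=YOUR_PI_ACCOUNT
#SBATCH --time=48:00:00
#SBATCH --mem=64G
#SBATCH --cpus-per-task=8

# -system local: BDS runs tasks inside this allocation
# instead of trying to submit its own SGE/SLURM jobs.
~/.bds/bds atac.bds -system local -species hg38 ...
EOF
```

You would then submit it with sbatch run_atac.sh.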
So I added export TMPDIR="/local/scratch" to my ~/.bashrc.
[ahorning@smsx10srw-srcf-d15-37 ~]$ df -h $TMPDIR
Filesystem Size Used Avail Use% Mounted on
datapool/local/scratch 3.6T 2.5G 3.6T 1% /local/scratch
I reran the pipeline and still got a segmentation error. stderrorNBM_55-4.txt stdoutNBM_55-4.txt
How do I submit a shell script for the BDS command to SLURM manually with bds ... -system local? Do you mean adding it to this script somewhere?
~/.bds/bds ~/ATACseq/scripts/atac_dnase_pipelines/atac.bds \
-species hg38 \
-enable_idr \
-auto_detect_adapter \
-out_dir "${sample}_Peaks_pipeline_out" \
-title "${sample}" \
-fastq1_1 "${files[0]}" \
-fastq1_2 "${files[1]}"
See this: https://github.com/biod/sambamba/issues/215. Check your stack size with ulimit -a and ask your admin to increase it.
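Before involving the admin, you can try raising the soft limit yourself; only the hard limit needs admin changes. A sketch (the fallback number is an example):

```shell
# Try to lift the soft stack limit for this shell session.
# Raising the hard limit usually requires admin changes
# (e.g. in /etc/security/limits.conf).
ulimit -s unlimited 2>/dev/null || ulimit -s 262144 2>/dev/null
ulimit -s   # print the effective stack size
```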
My ulimit -a is:
[ahorning@smsx10srw-srcf-d15-37 scripts]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 386865
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 131072
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 131072
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll ask for an increase if I can.
Are you working on the new SCG with SLURM?
yes.
Pipeline currently does not support the new SCG server with SLURM. Can you try again on the old one until I make a fix for the new one?
Running now on SCG4, will get back to you with results (especially if it doesn't work again).
Update: It didn't work, but it seems to be because macs2 didn't have enough memory. I gave it -mem_macs2 30G and I'll let you know how it goes.
On https://susciclu.slack.com I was speaking with John Hank (SCG administrator extraordinaire), and he (griznog) mentioned the following: griznog [< 1 minute ago] ok, if they hit any snags let me know. SCG4 will shrink even more in the next weeks or month or so, so the sooner things are migrated/updated the better. griznog [< 1 minute ago] They (Jin Lee) mention sherlock, which is using slurm and centos 7, so whatever worked there should work on SCG.
As I mentioned above, I added -mem_macs2 30G, and then I also added -mem_ataqc 40G. I almost got through it, but I hit another error.
stderrorNBM_15-1.txt
stdoutNBM_15-1.txt
Any suggestions?
MACS2 tasks were done successfully, but it looks like there are no peaks meeting the default IDR threshold (0.1). Please try with a more relaxed IDR threshold, say -idr_thresh 0.2. You can resume the pipeline with the same command line that you started it with.
For debugging, please run the following and post output here.
ls -l /srv/gsfs0/projects/snyder/aaron/FAP/ATACseq/data_miseq_ATAC_NBM15-55-40/analysis/merged_fastqs/NBM_15-1_Peaks_pipeline_out/peak/macs2/idr/pseudo_reps/rep1/
That's not a good sign. It means there is something wrong with the data. You really should not need to relax the threshold.
Anshul
Thank you both very much for your help on this. This was actually very low starting input, so it makes sense to me that the peaks are hard to detect. I will lower the threshold for now and see how my peaks look. fordebugging.txt
Pipeline now supports SLURM on new SCG. Please read the following instruction carefully and try again.
There has been some important updates about SLURM support on Kundaje lab's genomic pipelines (ChIP-Seq and ATAC-Seq) for the new SCG cluster.
0) You don't need to re-install Conda and dependencies.
1) Update BigDataScript (IMPORTANT):
cd $HOME
rm -rf .bds/
wget https://github.com/leepc12/BigDataScript/blob/master/distro/bds_Linux.tgz?raw=true -O bds_Linux.tgz
tar zxvf bds_Linux.tgz
2) Update pipeline code: do git pull in the pipeline git directory.
3) Usage: add -q YOUR_PI_ACCOUNT_ON_SCG to the pipeline run command line. The parameter -q is used for sbatch --account on SCG and for -p on Sherlock.
- ATAC-Seq: bds atac.bds ... -q YOUR_PI_ACCOUNT_ON_SCG
- ChIP-Seq: bds chipseq.bds ... -q YOUR_PI_ACCOUNT_ON_SCG
Thanks,
Jin
Closing this due to long inactivity.
Hi,
coming back to the initial question: I ran into the same issue. I actually got it solved by adding TMPDIR to my ~/.bashrc. However, we have a central installation, so I would rather set TMP, TMPDIR, and TMP_DIR in an additional shell script:
export TMP=/clscratch/${USER}/atac_temp
export TMPDIR=/clscratch/${USER}/atac_temp
export TMP_DIR=/clscratch/${USER}/atac_temp
bds atac ....
If I check task.postalign_bam.dedup_bam_PE_1_rep1.line_278.id_11.stdout.cluster, I see the following:
declare -x TMP="/tmp/12990246.1.default.q" -- defined where?
declare -x TMPDIR="/scratch/kiefefl2/" -- defined in ~/.bashrc
declare -x TMP_DIR="/clscratch/kiefefl2/atac_temp" -- defined in the ATAC caller script
Why is $TMP being overwritten? Ideally, I could define the variables upfront and not in user-specific .bashrc files.
Any ideas how to solve this? Thanks a lot!
Cheers, Flo
Sorry, I don't know. This looks like a BDS problem; please post this issue on the BDS GitHub site. Or it could be a problem with your system: some Linux systems overwrite env vars.
In my experience, I could define these env vars (TMP and TMPDIR) in ~/.bashrc and the pipeline successfully took them from it.
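One way to track down the overwrite (a sketch; file locations vary by system, and batch schedulers such as SGE typically export a per-job TMP/TMPDIR like the /tmp/12990246.1.default.q seen above, which would explain the value):

```shell
# Search the usual startup files for lines that set TMP/TMPDIR.
# Paths are typical locations; your scheduler's prolog scripts may differ.
grep -Hn 'TMP' ~/.bashrc ~/.bash_profile /etc/profile /etc/profile.d/*.sh 2>/dev/null || true
# Inside a submitted job, print what the scheduler handed you:
echo "TMP=$TMP TMPDIR=$TMPDIR"
```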
Hi,
thanks for the suggestion. We will use that workaround for the moment.
However, it would be handy to have a dedicated TMP variable (or to reuse the -java_tmp_dir option) to specify TMP for sambamba. At least sambamba sort has a tmpdir option:
--tmpdir=TMPDIR Use TMPDIR to output sorted chunks. Default behaviour is to use system temporary directory.
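For anyone patching this locally, a sketch of what passing the option through could look like (the file names and thread count are placeholders, and this is not how the pipeline currently invokes sambamba):

```shell
# Build a sambamba sort invocation with an explicit temp dir.
# Falls back to /tmp when $TMPDIR is unset.
cmd="sambamba sort --tmpdir=${TMPDIR:-/tmp} -t 8 -o sorted.bam input.bam"
echo "$cmd"    # inspect before running, e.g. with: eval "$cmd"
```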
Btw, thank you very much for maintaining this awesome pipeline!
Best, Flo
Thanks for your suggestion, but we don't want to add any extra parameters for defining temporary directories.
The only parameter for a temporary directory in the pipeline is -java_tmp_dir. We added it because Java does not automatically choose TMPDIR as its temporary directory.
Please let users define TMPDIR in their own ~/.bashrc files even though they are using a central installation of the pipeline.
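For context, Java reads its temp directory from the java.io.tmpdir system property rather than from $TMPDIR, which is why a dedicated flag is needed for the Java tools. A quick check (guarded so it is a no-op on machines without a JVM):

```shell
# Java ignores $TMPDIR; its temp dir comes from the
# java.io.tmpdir system property set on the command line.
if command -v java >/dev/null 2>&1; then
    java -Djava.io.tmpdir="${TMPDIR:-/tmp}" -version
fi
```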
Hi Jin,
thanks a lot. We will continue with this model.
Best, Flo
Unsure why my task failed here. Any suggestions? There are 8 other runs like this too, but here are the stdout and stderr files for the one that finished last.
Thank you very much for your help, Aaron Horning (ahorning at stanford period edu). stderrorNBM_55-4.txt stdoutNBM_55-4.txt