kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

Error in genome data installation #83

Closed: xiangzhu closed this issue 6 years ago

xiangzhu commented 6 years ago

Many thanks for putting together this wonderful set of scripts!

The documentation is very clear, and I had no trouble installing the pipeline on Stanford's Sherlock cluster.

However, I came across the following error when installing hg19 genome data for atac_dnase_pipelines.

I used the following command to install the data:

srun --pty bash install_genome_data.sh hg19 $HOME/data/bds_pipeline_genome_data

I got the following error:

Exiting Ebwt::buildToDisk()
An error occurred writing the index to disk.  Please check if the disk is full.
Total time for backward call to driver() for mirror index: 00:52:39
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 male.hg19.fa male.hg19.fa
Deleting "male.hg19.fa.3.bt2" file written during aborted indexing attempt.
Deleting "male.hg19.fa.4.bt2" file written during aborted indexing attempt.
Deleting "male.hg19.fa.1.bt2" file written during aborted indexing attempt.
Deleting "male.hg19.fa.2.bt2" file written during aborted indexing attempt.
Deleting "male.hg19.fa.rev.1.bt2" file written during aborted indexing attempt.
Deleting "male.hg19.fa.rev.2.bt2" file written during aborted indexing attempt.
srun: error: sh-101-56: task 0: Out Of Memory
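The bowtie2-build message blames the disk, but the last line comes from srun and reports that the task ran out of memory, so memory is the more likely cause. As a rough sketch (the function name diagnose_install_log is made up for illustration), a saved job log can be checked for the scheduler's verdict before falling back on the bowtie2 hint:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify why a genome-data install job failed,
# based only on the text of its log (read from stdin).
diagnose_install_log() {
  local log
  log=$(cat)
  # Check the scheduler's verdict first: Slurm's "Out Of Memory" line is
  # more specific than bowtie2-build's generic "disk is full" hint.
  if grep -qi 'out of memory' <<<"$log"; then
    echo "oom"
  elif grep -qiE 'disk is full|no space left on device' <<<"$log"; then
    echo "disk"
  else
    echo "unknown"
  fi
}

# Example: diagnose_install_log < install.log
```

If it prints oom, one option on a cluster without shared genome data is to resubmit with a larger memory request through srun's --mem option, e.g. srun --mem=<size> --pty bash install_genome_data.sh hg19 $HOME/data/bds_pipeline_genome_data, where <size> depends on your cluster's limits and is not something this thread specifies.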

In addition, I found that the unsuccessful installation has taken up a lot of storage space:

Filesystem                      Size  Used Avail Use% Mounted on
srcf.isilon:/ifs/home/xiangzhu   15G   12G  3.4G  78% /home/users/xiangzhu
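To see how much of that space is left over from the aborted attempt, a small sketch like the following (report_space is a hypothetical helper) compares the directory's size with the free space on its filesystem, using du and the POSIX output format of df:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size of a genome-data directory and the
# free space left on the filesystem that holds it, both in KiB.
report_space() {
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  local used_kb avail_kb
  # du -sk prints "<size-in-KiB><TAB><path>"; keep the first field.
  used_kb=$(du -sk "$dir" | cut -f1)
  # df -P keeps one line per filesystem even for long device names;
  # with -k, column 4 of the data row is the available space in KiB.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo "used_kb=$used_kb avail_kb=$avail_kb"
}
```

For example, report_space "$HOME/data/bds_pipeline_genome_data". If the directory holds nothing but the failed install, deleting it should recover that space; on Sherlock the install can be skipped altogether, as the reply below explains.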

Before opening this ticket, I searched the following links for similar issues, but I could not find a related answer.

I look forward to your thoughts/suggestions on this matter. Thanks in advance for your time and help!

leepc12 commented 6 years ago

You don't need to install genome data on Sherlock. It's already installed in my scratch space and shared by everyone using the pipeline on Sherlock. Just skip the genome data installation.

xiangzhu commented 6 years ago

Thanks for your reply!

But in this case, do I need to know the [SPECIES_FILE_PATH]?

Then others can use the genome data by adding -species_file [SPECIES_FILE_PATH] to the pipeline command line. Or they need to add species_file = [SPECIES_FILE_PATH] to the section [default] in their ./default.env.
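The documentation quoted above puts species_file under the [default] section of ./default.env. Before overriding it, one way to check what is already set there is a small INI lookup. This is a sketch that assumes plain key = value lines and # comments; ini_get is a hypothetical name, and the real default.env may contain constructs it does not handle:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY inside [SECTION] of an
# INI-style file. Assumes simple "key = value" lines and "#" comments.
ini_get() {
  local file=$1 section=$2 key=$3
  awk -v want="[$section]" -v key="$key" '
    # Skip comment lines.
    /^[ \t]*#/ { next }
    # A line starting with "[" begins a new section; compare it exactly.
    /^[ \t]*\[/ {
      h = $0
      gsub(/^[ \t]+|[ \t]+$/, "", h)
      insec = (h == want)
      next
    }
    insec {
      # Split on the first "=" and trim whitespace around key and value.
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$file"
}
```

Usage: ini_get ./default.env default species_file prints the current value, or nothing if the key is not set in that section.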

leepc12 commented 6 years ago

No. The -species_file parameter is already defined in default.env, which has a set of parameters for the Sherlock cluster. The pipeline determines whether you are on Sherlock from the hostname.

You just need to specify -species.

Jin

(Sent by email in reply to Xiang Zhu's comment of Thu, Nov 16, 2017 at 12:02 PM.)
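The reply above says the pipeline picks Sherlock-specific settings by looking at the hostname. The exact rule lives in the pipeline's own code and default.env; the sketch below only illustrates the idea, with made-up patterns (sh-* matches compute node names like sh-101-56 from the log above; sherlock* is a guess) and made-up labels:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of hostname-based cluster detection. The patterns
# and labels are illustrative assumptions, not the pipeline's real rules.
detect_cluster() {
  # Use the given name, or this machine's hostname if none is passed.
  local host=${1:-$(hostname)}
  case "$host" in
    sherlock*|sh-*) echo "sherlock" ;;
    *) echo "default" ;;
  esac
}
```

On a host classified as Sherlock, the pipeline already knows the shared species_file, which is why only -species (e.g. -species hg19) needs to be given.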

xiangzhu commented 6 years ago

Thanks! It is working, and I will close this issue.