kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

Python egg cache error #72

Closed albertoriva closed 6 years ago

albertoriva commented 6 years ago

I'm getting the following error when running the pipeline on 12 fastq files:

            The following error occurred while trying to extract file(s) to the Python egg
            cache:

              [Errno 17] File exists: '/home/ariva/.cache'

            The Python egg cache directory is currently set to:

              /home/ariva/.cache/Python-Eggs

            Perhaps your account does not have write access to this directory?  You can
            change the cache directory by setting the PYTHON_EGG_CACHE environment
            variable to point to an accessible directory.

I tried removing the .cache directory and rerunning, but I get the same error. I can attach the whole log file if necessary.

Thank you in advance for any help!

albertoriva commented 6 years ago

I forgot to add: permissions are OK. I'm running the pipeline under my own username, and I have read/write access to .cache.

xiangzhu commented 6 years ago

I came across the same type of error when running the pipelines on Stanford Sherlock 1.0. Specifically,

pkg_resources.ExtractionError: Can't extract file(s) to egg cache

        The following error occurred while trying to extract file(s) to the Python egg
        cache:

          [Errno 17] File exists: '/scratch/PI/whwong/xiangzhu/Python-Eggs/MACS2-2.1.0.20150731-py2.7-linux-x86_64.egg-tmp'

        The Python egg cache directory is currently set to:

          /scratch/PI/whwong/xiangzhu/Python-Eggs/

        Perhaps your account does not have write access to this directory?  You can
        change the cache directory by setting the PYTHON_EGG_CACHE environment
        variable to point to an accessible directory.

Fatal error: /home/xiangzhu/atac_dnase_pipelines/atac.bds, line 1201, pos 2. Task/s failed.
atac.bds, line 82 : main()
atac.bds, line 85 : void main() { // atac pipeline starts here
atac.bds, line 97 :     call_peaks()
atac.bds, line 1019 :   void call_peaks() { // for pooling two replicates and calling peaks on them
atac.bds, line 1201 :       wait

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.

Here is the permission info:

[xiangzhu@sherlock-ln02 login_node ~]$ ls -la /scratch/PI/whwong/xiangzhu/Python-Eggs/
total 12
drwxr-sr-x 3 xiangzhu whwong 4096 Nov 18 15:13 .
drwxr-sr-x 4 xiangzhu whwong 4096 Nov 17 20:37 ..
drwxr-sr-x 3 xiangzhu whwong 4096 Nov 18 15:13 MACS2-2.1.0.20150731-py2.7-linux-x86_64.egg-tmp

I added the following line in .bashrc to set up this folder:

# setting the PYTHON_EGG_CACHE environment
export PYTHON_EGG_CACHE=/scratch/PI/whwong/xiangzhu/Python-Eggs/

I used the following command to run the pipelines:

bds_scr $SUFFIX -s slurm $HOME/atac_dnase_pipelines/atac.bds -pe -title $SUFFIX -species hg19 -q whwong -memory 24G -wt 24:00:00 -nth 8 -fastq1_1 $FASTQ1_1 -fastq1_2 $FASTQ1_2

I wonder if you could provide any thoughts or ideas about how to address this issue. Thanks!

leepc12 commented 6 years ago

I am very sorry for the late response; I missed this issue. I am also working on Sherlock (but on 2.0). Can you check your Python egg directory and its permissions? Mine looks like this:

[leepc12@sh-ln01 login! ~]$ ll ~/.python-eggs
total 40
drwxr-xr-x 3 leepc12 akundaje 23 Aug  4  2016 MACS2-2.1.0.20150731-py2.7-linux-x86_64.egg-tmp
[leepc12@sh-ln01 login! ~]$ ll ~/.python-eggs -d
drwxr-xr-x 3 leepc12 akundaje 65 Aug  4  2016 /home/users/leepc12/.python-eggs
[leepc12@sh-ln01 login! ~]$

I don't have any Python egg settings in my ~/.bashrc.

Ruismart commented 6 years ago

Hi, I got the same error. Please check https://stackoverflow.com/questions/2192323/what-is-the-python-egg-cache-python-egg-cache. As mentioned there, there are three ways to solve this kind of Python egg extraction error. At first I changed the cache path, but nothing changed:

 export PYTHON_EGG_CACHE=$HOME/.cache/newegg/

Then I tried nth=1 and it worked as expected, which suggests the extraction error was caused by too many tasks using MACS***.egg at the same time. But running that way is too slow; I might as well run it on my PC. So in the end I just did this:

cd  ***/miniconda3/envs/bds_atac/lib/python2.7/site-packages
unzip MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg   # produces an unzipped package directory 'MACS2'

That solved it for me.
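The `[Errno 17] File exists` message in the tracebacks above is what a directory-creation call reports when its target already exists, which fits the explanation that several parallel tasks tried to create the same `.egg-tmp` directory at once. Below is a minimal Python 3 sketch of that failure mode. It is not the actual pkg_resources code, and the directory name `example.egg-tmp` and the helper `create_extraction_dir` are hypothetical.

```python
import errno
import os
import tempfile


def create_extraction_dir(path):
    """Create path the naive way; return the errno on failure, else None."""
    try:
        os.makedirs(path)  # raises OSError if path already exists
    except OSError as exc:
        return exc.errno
    return None


# Hypothetical stand-in for a shared egg cache directory.
cache_root = tempfile.mkdtemp()
target = os.path.join(cache_root, "example.egg-tmp")

first = create_extraction_dir(target)   # the first task creates the directory
second = create_extraction_dir(target)  # a second task finds it already there
```

Here `first` is `None` and `second` equals `errno.EEXIST`, the code behind the "File exists" text. Under this reading, running with a single thread or unzipping the egg ahead of time avoids the collision because no two tasks extract at the same moment.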

leepc12 commented 6 years ago

@Ruismart I appreciate your help. That method worked.

raju2b commented 5 years ago

We can set this at the process level or in the module's __init__.py, as below.

import os

# Set this before any egg-packaged module (e.g. MACS2) is imported.
os.environ['PYTHON_EGG_CACHE'] = "/home/<user>/.python-eggs"

Taking a step back: why not simply install unzipped eggs instead of zipped eggs, which the Python interpreter tries to extract on the very first import? Or is there a specific reason for installing the zipped egg rather than the unzipped package (which avoids egg extraction on first import)?
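One way to apply the environment-variable approach above without sharing a directory between parallel tasks is to give each process its own cache directory. The following is only a sketch, assuming it runs before any egg-packaged module is imported; `use_private_egg_cache` and the `python-eggs-` prefix are hypothetical names, not part of the pipeline or of setuptools.

```python
import os
import tempfile


def use_private_egg_cache(parent=None):
    """Point PYTHON_EGG_CACHE at a fresh directory unique to this process."""
    # mkdtemp creates a new, uniquely named directory, so two processes
    # calling this at the same time never receive the same path.
    cache_dir = tempfile.mkdtemp(prefix="python-eggs-", dir=parent)
    os.environ["PYTHON_EGG_CACHE"] = cache_dir
    return cache_dir


cache_dir = use_private_egg_cache()
```

The trade-off is that every process extracts its own copy of the egg, and the directories are not removed automatically, so installing the package unzipped remains the simpler fix.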