nanshe-org / nanshe

An image processing toolkit
https://nanshe-org.github.io/nanshe
BSD 3-Clause "New" or "Revised" License

test_read_numpy_structured_array_from_HDF5_1 fails temporary test directory cleanup on cluster #460

Closed jakirkham closed 6 years ago

jakirkham commented 6 years ago

Got a rather perplexing error on the cluster: teardown failed to remove one of the temporary directories used for testing, claiming it was not empty. Inspection after the test run shows that the directory in question is in fact empty. As the directory did have content, the cleanup somehow managed to clear the directory's content but then failed to remove the directory itself. In particular, this occurred in two separate runs, both times with the test test_read_numpy_structured_array_from_HDF5_1. To make the error stranger, we already use shutil.rmtree to remove the temporary directory, which should clean out any remnants before removing the directory. So it's unclear why this error cropped up at all. Traceback from the error and environment details follow below.

Example:

```python
======================================================================
ERROR: tests.test_nanshe.test_io.test_hdf5.test_serializers.TestSerializers.test_read_numpy_structured_array_from_HDF5_1
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/site-packages/nose/case.py", line 384, in tearDown
    try_run(self.inst, ('teardown', 'tearDown'))
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/groups/dudman/home/kirkhamj/nanshe/tests/test_nanshe/test_io/test_hdf5/test_serializers.py", line 249, in teardown
    shutil.rmtree(self.temp_dir)
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/shutil.py", line 484, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/shutil.py", line 482, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/groups/dudman/home/kirkhamj/tmp/tmpgppaa11f'

----------------------------------------------------------------------
```


Environment:

```yaml
channels:
- nanshe
- conda-forge
- defaults
dependencies:
- asn1crypto=0.22.0=py36_0
- backports=1.0=py36_1
- backports.functools_lru_cache=1.4=py36_1
- bkcharts=0.2=py36_0
- blas=1.1=openblas
- bleach=2.0.0=py36_0
- bokeh=0.12.9=py36_0
- boost=1.64.0=py36_4
- boost-cpp=1.64.0=1
- bottleneck=1.2.1=py36_1
- bzip2=1.0.6=1
- ca-certificates=2017.7.27.1=0
- certifi=2017.7.27.1=py36_0
- cffi=1.10.0=py36_0
- chardet=3.0.4=py36_0
- click=6.7=py36_0
- cloudpickle=0.4.0=py36_0
- cryptography=2.0.3=py36_0
- cycler=0.10.0=py36_0
- cytoolz=0.8.2=py36_0
- dask=0.15.4=py_0
- dask-core=0.15.4=py_0
- dask-distance=0.2.0=py_0
- dask-imread=0.1.1=py36_0
- dask-ndfilters=0.1.2=py36_0
- dask-ndfourier=0.1.2=py36_0
- dask-ndmeasure=0.1.1=py36_0
- dask-ndmorph=0.1.1=py36_0
- dbus=1.10.22=0
- decorator=4.1.2=py36_0
- distributed=1.19.3=py36_0
- drmaa=0.7.7=py36_0
- entrypoints=0.2.3=py36_1
- expat=2.2.1=0
- fasteners=0.14.1=py36_2
- fftw=3.3.6=3
- fontconfig=2.12.1=4
- freetype=2.7=1
- future=0.16.0=py36_0
- gettext=0.19.7=1
- glances=2.11=py_0
- glib=2.53.5=0
- gmp=6.1.2=0
- gst-plugins-base=1.8.0=0
- gstreamer=1.8.0=1
- h5py=2.7.1=py36_1
- hdf5=1.8.18=1
- heapdict=1.0.0=py36_0
- html5lib=0.999999999=py36_0
- icu=58.1=1
- idna=2.6=py36_1
- imageio=2.2.0=py36_0
- imgroi=0.0.2=py36_0
- ipykernel=4.6.1=py36_0
- ipyparallel=6.0.2=py36_0
- ipython=6.2.1=py36_0
- ipython_genutils=0.2.0=py36_0
- ipywidgets=7.0.3=py_0
- jedi=0.10.2=py36_0
- jinja2=2.9.6=py36_0
- jpeg=9b=1
- jsonschema=2.6.0=py36_0
- jupyter_client=5.1.0=py36_0
- jupyter_contrib_core=0.3.3=py36_0
- jupyter_contrib_nbextensions=0.3.3=py36_0
- jupyter_core=4.3.0=py36_0
- jupyter_highlight_selected_word=0.0.11=py36_0
- jupyter_latex_envs=1.3.8.2=py36_1
- jupyter_nbextensions_configurator=0.2.8=py36_0
- kenjutsu=0.5.1=py36_0
- libffi=3.2.1=3
- libiconv=1.14=4
- libpng=1.6.28=1
- libsodium=1.0.10=0
- libtiff=4.0.7=0
- libxcb=1.12=1
- libxml2=2.9.5=0
- libxslt=1.1.29=5
- locket=0.2.0=py36_1
- lxml=4.1.0=py36_0
- mahotas=1.4.3=np113py36_1
- markupsafe=1.0=py36_0
- matplotlib=2.1.0=py36_0
- metawrap=0.0.2=py36_0
- mistune=0.7.4=py36_0
- monotonic=1.3=py36_0
- mplview=0.0.2=py36_0
- msgpack-python=0.4.8=py36_0
- nbconvert=5.3.1=py_1
- nbformat=4.4.0=py36_0
- ncurses=5.9=10
- networkx=2.0=py36_0
- nose=1.3.7=py36_2
- notebook=5.2.0=py36_1
- npctypes=0.0.4=py36_0
- numpy=1.13.3=py36_blas_openblas_200
- olefile=0.44=py36_0
- openblas=0.2.19=2
- openssl=1.0.2l=0
- pandas=0.20.3=py36_1
- pandoc=1.19.2=0
- pandocfilters=1.4.1=py36_0
- partd=0.3.8=py36_0
- pcre=8.39=0
- pexpect=4.2.1=py36_0
- pickleshare=0.7.4=py36_0
- pillow=4.3.0=py36_0
- pims=0.4.1=py_1
- pip=9.0.1=py36_0
- prompt_toolkit=1.0.15=py36_0
- psutil=5.4.0=py36_0
- ptyprocess=0.5.2=py36_0
- pycparser=2.18=py36_0
- pygments=2.2.0=py36_0
- pyopenssl=17.2.0=py36_0
- pyparsing=2.2.0=py36_0
- pyqt=5.6.0=py36_4
- pysocks=1.6.7=py36_0
- python=3.6.3=0
- python-dateutil=2.6.1=py36_0
- pytz=2017.2=py36_0
- pywavelets=0.5.2=np113py36_0
- pyyaml=3.12=py36_1
- pyzmq=16.0.2=py36_2
- qt=5.6.2=3
- rank_filter=0.4.14=np113py36_0
- readline=6.2=0
- requests=2.18.4=py36_1
- scandir=1.6=py36_0
- scikit-image=0.13.0=py36_2
- scikit-learn=0.19.1=py36_blas_openblas_200
- scipy=0.19.1=py36_blas_openblas_202
- setuptools=36.6.0=py36_1
- simplegeneric=0.8.1=py36_0
- sip=4.18=py36_1
- six=1.11.0=py36_1
- slicerator=0.9.8=py_1
- sortedcontainers=1.5.7=py36_0
- splauncher=0.0.10=py36_0
- sqlite=3.13.0=1
- tblib=1.3.2=py36_0
- terminado=0.6=py36_0
- testpath=0.3.1=py36_0
- tifffile=0.12.1=py36_1
- tk=8.5.19=2
- toolz=0.8.2=py_2
- tornado=4.5.2=py36_0
- traitlets=4.3.2=py36_0
- urllib3=1.22=py36_0
- vigra=1.11.1=np113py36_2
- wcwidth=0.1.7=py36_0
- webcolors=1.7=py36_0
- webencodings=0.5=py36_0
- wheel=0.30.0=py_1
- widgetsnbextension=3.0.6=py36_0
- xnumpy=0.1.2=py36_0
- xorg-libxau=1.0.8=3
- xorg-libxdmcp=1.1.2=3
- xz=5.2.3=0
- yail=0.0.2=py36_0
- yaml=0.1.6=0
- zarr=2.1.4=py36_0
- zeromq=4.2.1=1
- zict=0.1.3=py_0
- zlib=1.2.8=3
- libgfortran=3.0.0=1
- nanshe=0.1.0a59=py36_0
- pip:
  - nanshe-workflow (/groups/dudman/home/kirkhamj/nanshe_workflow)==2.7.0
prefix: /groups/dudman/home/kirkhamj/miniconda/envs/nanshenv
```

jakirkham commented 6 years ago

This really ancient thread may be of some use.

jakirkham commented 6 years ago

Seems others have seen this with NFS too. Not sure how we didn't hit this sooner. Also still confused as to why it repeatably happens with just this one test and none of the other tests doing the exact same thing. 😖

Looks like common wisdom is to just keep trying until the directory eventually gets removed. 😒 Some examples include ( https://github.com/easybuilders/easybuild-framework/pull/353 ) ( https://github.com/conda/conda/issues/850 ) ( https://github.com/hashdist/hashdist/pull/116 ).

jakirkham commented 6 years ago

FWIW tried retrying once with try...except using the diff below, but it seems to have only irritated NFS. 😩 Maybe swallowing the exception gives NFS enough time to get itself straightened out. So that may be what is required to fix this.

Diff:

```diff
diff --git a/tests/test_nanshe/test_io/test_hdf5/test_serializers.py b/tests/test_nanshe/test_io/test_hdf5/test_serializers.py
index 6403bf4..fc72191 100644
--- a/tests/test_nanshe/test_io/test_hdf5/test_serializers.py
+++ b/tests/test_nanshe/test_io/test_hdf5/test_serializers.py
@@ -246,7 +246,10 @@ class TestSerializers(object):
         self.temp_hdf5_file = None
-        shutil.rmtree(self.temp_dir)
+        try:
+            shutil.rmtree(self.temp_dir)
+        except OSError:
+            shutil.rmtree(self.temp_dir)
         self.temp_dir = ""
```


Traceback:

```python
======================================================================
ERROR: tests.test_nanshe.test_io.test_hdf5.test_serializers.TestSerializers.test_read_numpy_structured_array_from_HDF5_1
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/groups/dudman/home/kirkhamj/nanshe/tests/test_nanshe/test_io/test_hdf5/test_serializers.py", line 250, in teardown
    shutil.rmtree(self.temp_dir)
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/shutil.py", line 484, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/shutil.py", line 482, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/groups/dudman/home/kirkhamj/tmp/tmpa87vu797'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/site-packages/nose/case.py", line 384, in tearDown
    try_run(self.inst, ('teardown', 'tearDown'))
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/groups/dudman/home/kirkhamj/nanshe/tests/test_nanshe/test_io/test_hdf5/test_serializers.py", line 252, in teardown
    shutil.rmtree(self.temp_dir)
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/shutil.py", line 480, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/shutil.py", line 438, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/groups/dudman/home/kirkhamj/miniconda/envs/nanshenv/lib/python3.6/shutil.py", line 436, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs000000094172481900026301'

----------------------------------------------------------------------
```

jakirkham commented 6 years ago

Looping does the trick. Gross! 😝 Working diff included below.

Diff:

```diff
diff --git a/tests/test_nanshe/test_io/test_hdf5/test_serializers.py b/tests/test_nanshe/test_io/test_hdf5/test_serializers.py
index 6403bf4..54f132e 100644
--- a/tests/test_nanshe/test_io/test_hdf5/test_serializers.py
+++ b/tests/test_nanshe/test_io/test_hdf5/test_serializers.py
@@ -246,7 +246,11 @@ class TestSerializers(object):
         self.temp_hdf5_file = None
-        shutil.rmtree(self.temp_dir)
+        for i in range(3):
+            try:
+                shutil.rmtree(self.temp_dir)
+            except OSError:
+                pass
         self.temp_dir = ""
```

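
For anyone wanting the retry loop outside of a test teardown, here is a rough sketch of the same idea as a standalone helper. The name `rmtree_with_retries` and the `attempts` and `delay` parameters are made up for illustration and are not part of nanshe; the short sleep between attempts is an assumption about what might help NFS settle, not something verified on the cluster.

```python
import os
import shutil
import tempfile
import time


def rmtree_with_retries(path, attempts=3, delay=0.1):
    """Try to remove ``path``, retrying on OSError (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
        except FileNotFoundError:
            # An earlier attempt (or someone else) already removed it.
            return True
        except OSError:
            # Possibly a lingering .nfs* file; wait briefly before retrying.
            if attempt + 1 < attempts:
                time.sleep(delay)
        else:
            return True
    # All attempts raised; report whether the directory is actually gone.
    return not os.path.exists(path)


# Local demonstration with an ordinary temporary directory.
temp_dir = tempfile.mkdtemp()
with open(os.path.join(temp_dir, "data.txt"), "w") as f:
    f.write("example")

removed = rmtree_with_retries(temp_dir)
```

Unlike the bare loop in the diff, this stops as soon as one attempt succeeds and tells the caller whether the directory is really gone.
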
jakirkham commented 6 years ago

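Turns out the .nfs* file is an indicator that a file handle is still open: when a file is deleted while something still has it open, the NFS client renames it to .nfs* instead of removing it, which is called a "silly rename". Some more info on this in these NFS FAQs.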
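
A quick way to check for these leftovers before calling shutil.rmtree might look like the sketch below. The helper name `find_nfs_leftovers` is hypothetical, and the `.nfs0000000000000001` file in the demonstration is a fake created locally just to exercise the check; a real one only appears on an NFS mount.

```python
import os
import tempfile


def find_nfs_leftovers(directory):
    """Return sorted names of .nfs* entries directly inside ``directory``."""
    return sorted(
        name for name in os.listdir(directory) if name.startswith(".nfs")
    )


# Local demonstration: a fake .nfs* file next to a regular file.
temp_dir = tempfile.mkdtemp()
for name in (".nfs0000000000000001", "data.h5"):
    open(os.path.join(temp_dir, name), "w").close()

leftovers = find_nfs_leftovers(temp_dir)
```

If this returns anything in a test's temporary directory, some handle into that directory is probably still open somewhere.
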

jakirkham commented 6 years ago

Looks like we did have another file handle open in that test. PR ( https://github.com/nanshe-org/nanshe/pull/468 ) closes the lingering open file handle.
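
For reference, the general pattern is to close every handle into the temporary directory before removing it. The sketch below is not the code from the PR; `TempDirFixture` and its attribute names are hypothetical, and a plain file object stands in for the HDF5 handle so the example runs without h5py.

```python
import os
import shutil
import tempfile


class TempDirFixture(object):
    """Hypothetical fixture; a plain file stands in for an HDF5 handle."""

    def setup(self):
        self.temp_dir = tempfile.mkdtemp()
        self.temp_file = open(os.path.join(self.temp_dir, "test.h5"), "wb")

    def teardown(self):
        # Close the handle first so nothing inside temp_dir is still open
        # when the directory gets removed.
        if self.temp_file is not None:
            self.temp_file.close()
            self.temp_file = None

        shutil.rmtree(self.temp_dir)
        self.temp_dir = ""


fixture = TempDirFixture()
fixture.setup()
created_dir = fixture.temp_dir
fixture.teardown()
```

With the handle closed before shutil.rmtree runs, there should be no open file left for NFS to rename, so a single removal attempt ought to be enough.
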