soft-matter / trackpy

Python particle tracking toolkit
http://soft-matter.github.io/trackpy

PyTables >= 3.6 doesn't allow opening multiple files #643

Closed caspervdw closed 3 years ago

caspervdw commented 3 years ago

See builds e.g. https://github.com/caspervdw/trackpy/runs/1946300547

The important part of the traceback:

E   ValueError: PyTables [3.6.1] no longer supports opening multiple files
E   even in read-only mode on this HDF5 version [1.8.5-patch1]. You can accept this
E   and not open the same file multiple times at once,
E   upgrade the HDF5 version, or downgrade to PyTables 3.0.0 which allows
E   files to be opened multiple times at once

The question is: can we repair this, or do we need to constrain to PyTables < 3.6?

@nkeim Do you have any thoughts about this?
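If constraining turns out to be the answer, the pin would be a one-line change to the test requirements (hypothetical fragment; the exact file depends on how the CI installs dependencies):

```
# hypothetical requirements pin — avoid PyTables wheels bundling HDF5 1.8.5-patch1
tables<3.6
```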

Full traceback:

__________________ TestPandasHDFStoreSingleNode.test_storage ___________________

self = <class 'pandas.io.pytables.HDFStore'>
File path: /home/runner/work/trackpy/trackpy/temp_for_testing_5115073464.h5

mode = 'a', kwargs = {}
tables = <module 'tables' from '/opt/hostedtoolcache/Python/3.9.1/x64/lib/python3.9/site-packages/tables/__init__.py'>
hdf_version = '1.8.5-patch1'

    def open(self, mode: str = "a", **kwargs):
        """
        Open the file in the specified mode

        Parameters
        ----------
        mode : {'a', 'w', 'r', 'r+'}, default 'a'
            See HDFStore docstring or tables.open_file for info about modes
        **kwargs
            These parameters will be passed to the PyTables open_file method.
        """
        tables = _tables()

        if self._mode != mode:

            # if we are changing a write mode to read, ok
            if self._mode in ["a", "w"] and mode in ["r", "r+"]:
                pass
            elif mode in ["w"]:

                # this would truncate, raise here
                if self.is_open:
                    raise PossibleDataLossError(
                        f"Re-opening the file [{self._path}] with mode [{self._mode}] "
                        "will delete the current file!"
                    )

            self._mode = mode

        # close and reopen the handle
        if self.is_open:
            self.close()

        if self._complevel and self._complevel > 0:
            self._filters = _tables().Filters(
                self._complevel, self._complib, fletcher32=self._fletcher32
            )

        try:
>           self._handle = tables.open_file(self._path, self._mode, **kwargs)

/opt/hostedtoolcache/Python/3.9.1/x64/lib/python3.9/site-packages/pandas/io/pytables.py:697: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

filename = '/home/runner/work/trackpy/trackpy/temp_for_testing_5115073464.h5'
mode = 'a', title = '', root_uep = '/', filters = None, kwargs = {}

    def open_file(filename, mode="r", title="", root_uep="/", filters=None,
                  **kwargs):
        """Open a PyTables (or generic HDF5) file and return a File object.

        Parameters
        ----------
        filename : str
            The name of the file (supports environment variable expansion).
            It is suggested that file names have any of the .h5, .hdf or
            .hdf5 extensions, although this is not mandatory.
        mode : str
            The mode to open the file. It can be one of the
            following:

                * *'r'*: Read-only; no data can be modified.
                * *'w'*: Write; a new file is created (an existing file
                  with the same name would be deleted).
                * *'a'*: Append; an existing file is opened for reading and
                  writing, and if the file does not exist it is created.
                * *'r+'*: It is similar to 'a', but the file must already
                  exist.

        title : str
            If the file is to be created, a TITLE string attribute will be
            set on the root group with the given value. Otherwise, the
            title will be read from disk, and this will not have any effect.
        root_uep : str
            The root User Entry Point. This is a group in the HDF5 hierarchy
            which will be taken as the starting point to create the object
            tree. It can be whatever existing group in the file, named by
            its HDF5 path. If it does not exist, an HDF5ExtError is issued.
            Use this if you do not want to build the *entire* object tree,
            but rather only a *subtree* of it.

            .. versionchanged:: 3.0
               The *rootUEP* parameter has been renamed into *root_uep*.

        filters : Filters
            An instance of the Filters (see :ref:`FiltersClassDescr`) class
            that provides information about the desired I/O filters
            applicable to the leaves that hang directly from the *root group*,
            unless other filter properties are specified for these leaves.
            Besides, if you do not specify filter properties for child groups,
            they will inherit these ones, which will in turn propagate to
            child nodes.

        Notes
        -----
        In addition, it recognizes the (lowercase) names of parameters
        present in :file:`tables/parameters.py` as additional keyword
        arguments.
        See :ref:`parameter_files` for a detailed info on the supported
        parameters.

        .. note::

            If you need to deal with a large number of nodes in an
            efficient way, please see :ref:`LRUOptim` for more info and
            advices about the integrated node cache engine.

        """

        # XXX filename normalization ??

        # Check already opened files
        if _FILE_OPEN_POLICY == 'strict':
            # This policy do not allows to open the same file multiple times
            # even in read-only mode
            if filename in _open_files:
>               raise ValueError(
                    "The file '%s' is already opened.  "
                    "Please close it before reopening.  "
                    "HDF5 v.%s, FILE_OPEN_POLICY = '%s'" % (
                        filename, utilsextension.get_hdf5_version(),
                        _FILE_OPEN_POLICY))
E               ValueError: The file '/home/runner/work/trackpy/trackpy/temp_for_testing_5115073464.h5' is already opened.  Please close it before reopening.  HDF5 v.1.8.5-patch1, FILE_OPEN_POLICY = 'strict'

/opt/hostedtoolcache/Python/3.9.1/x64/lib/python3.9/site-packages/tables/file.py:288: ValueError

During handling of the above exception, another exception occurred:

self = <trackpy.tests.test_feature_saving.TestPandasHDFStoreSingleNode testMethod=test_storage>

    def test_storage(self):
        STORE_NAME = 'temp_for_testing_{}.h5'.format(_random_hash())
        if os.path.isfile(STORE_NAME):
            os.remove(STORE_NAME)
        try:
>           s = self.storage_class(STORE_NAME)

trackpy/tests/test_feature_saving.py:57: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
trackpy/framewise_data.py:248: in __init__
    store = pd.HDFStore(self.filename)
/opt/hostedtoolcache/Python/3.9.1/x64/lib/python3.9/site-packages/pandas/io/pytables.py:553: in __init__
    self.open(mode=mode, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <class 'pandas.io.pytables.HDFStore'>
File path: /home/runner/work/trackpy/trackpy/temp_for_testing_5115073464.h5

mode = 'a', kwargs = {}
tables = <module 'tables' from '/opt/hostedtoolcache/Python/3.9.1/x64/lib/python3.9/site-packages/tables/__init__.py'>
hdf_version = '1.8.5-patch1'

    def open(self, mode: str = "a", **kwargs):
        """
        Open the file in the specified mode

        Parameters
        ----------
        mode : {'a', 'w', 'r', 'r+'}, default 'a'
            See HDFStore docstring or tables.open_file for info about modes
        **kwargs
            These parameters will be passed to the PyTables open_file method.
        """
        tables = _tables()

        if self._mode != mode:

            # if we are changing a write mode to read, ok
            if self._mode in ["a", "w"] and mode in ["r", "r+"]:
                pass
            elif mode in ["w"]:

                # this would truncate, raise here
                if self.is_open:
                    raise PossibleDataLossError(
                        f"Re-opening the file [{self._path}] with mode [{self._mode}] "
                        "will delete the current file!"
                    )

            self._mode = mode

        # close and reopen the handle
        if self.is_open:
            self.close()

        if self._complevel and self._complevel > 0:
            self._filters = _tables().Filters(
                self._complevel, self._complib, fletcher32=self._fletcher32
            )

        try:
            self._handle = tables.open_file(self._path, self._mode, **kwargs)
        except IOError as err:  # pragma: no cover
            if "can not be written" in str(err):
                print(f"Opening {self._path} in read-only mode")
                self._handle = tables.open_file(self._path, "r", **kwargs)
            else:
                raise

        except ValueError as err:

            # trap PyTables >= 3.1 FILE_OPEN_POLICY exception
            # to provide an updated message
            if "FILE_OPEN_POLICY" in str(err):
                hdf_version = tables.get_hdf5_version()
                err = ValueError(
                    f"PyTables [{tables.__version__}] no longer supports "
                    "opening multiple files\n"
                    "even in read-only mode on this HDF5 version "
                    f"[{hdf_version}]. You can accept this\n"
                    "and not open the same file multiple times at once,\n"
                    "upgrade the HDF5 version, or downgrade to PyTables 3.0.0 "
                    "which allows\n"
                    "files to be opened multiple times at once\n"
                )

>           raise err
E           ValueError: PyTables [3.6.1] no longer supports opening multiple files
E           even in read-only mode on this HDF5 version [1.8.5-patch1]. You can accept this
E           and not open the same file multiple times at once,
E           upgrade the HDF5 version, or downgrade to PyTables 3.0.0 which allows
E           files to be opened multiple times at once

/opt/hostedtoolcache/Python/3.9.1/x64/lib/python3.9/site-packages/pandas/io/pytables.py:722: ValueError
_____________ TestPandasHDFStoreSingleNodeCompressed.test_storage ______________

(identical failure and traceback as above, for temp_for_testing_2947855870.h5)

nkeim commented 3 years ago

So far, these tests run fine on my local machine with pytables 3.6.1. I'll investigate further.

nkeim commented 3 years ago

Looks like this happens only when using pytables 3.6 with an older hdf5. The GitHub Workflows builds use hdf5 1.10.4, and this test passes there.

Also, from looking at the test code it's unclear how the same file could ever be opened multiple times. So this is likely just an outright bug when using these versions of pytables and hdf5 together.

nkeim commented 3 years ago

Oh! Now I see that the pip environment has the older version of hdf5. My work continues…

nkeim commented 3 years ago

I think I've found the problem. It appears that the Linux pytables wheel for python 3.9 includes a pre-built binary of hdf5 version 1.8.5-patch1 (released in 2010). When imported, pytables uses that bundled library rather than the newer hdf5 that your script had already installed via apt.

The equivalent wheels for python 3.8 and 3.7 do not include binaries. So perhaps this is a mistake?

caspervdw commented 3 years ago

Thanks for diving into this @nkeim ! I think we can view this as a PyTables packaging issue and safely ignore (skip) the test on this specific environment.

In view of the large build time, I would like to stick to the binary wheels.
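One way to skip the test only in the affected environment is to gate on the HDF5 version that PyTables reports. A sketch of such a check (hypothetical helper; in the real test suite the version string would come from `tables.get_hdf5_version()`, and 1.8.7 is the HDF5 release from which PyTables allows multiple opens of one file):

```python
# Hypothetical guard for the test suite: PyTables switches FILE_OPEN_POLICY
# to 'strict' when the linked HDF5 is older than 1.8.7, the first release
# supporting multiple simultaneous opens of the same file.
def hdf5_allows_multiple_opens(version: str) -> bool:
    base = version.split("-")[0]                      # drop e.g. "-patch1"
    parts = tuple(int(p) for p in base.split("."))    # "1.10.4" -> (1, 10, 4)
    return parts >= (1, 8, 7)
```

A test class could then skip `test_storage` when `hdf5_allows_multiple_opens(tables.get_hdf5_version())` is false.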