gramineproject / examples

Sample applications configs for Gramine
BSD 3-Clause "New" or "Revised" License

Scikit fails with No such file or directory: '/usr/share/zoneinfo/tzdata.zi' on Ubuntu 22.04 systems where pytz library is downloaded using apt #68

Closed anjalirai-intel closed 12 months ago

anjalirai-intel commented 1 year ago

Description: scikit-learn fails because the pytz library on the system was installed by apt, pulled in as a dependency of other packages.

The version installed by apt is 2022.1. If you install the same version using pip, the code is different. The apt-packaged pytz has an additional function, `_read_olson_version() -> str`, which tries to read /usr/share/zoneinfo/tzdata.zi at import time, but this path is not mounted in the manifest file.

Because of this discrepancy between the two builds, and because the path is not mounted, scikit-learn fails. Both experiments are shown below.

The solution is to mount /usr/share/zoneinfo in the scikit-learn manifest.
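
To check which situation a given system is in, a small diagnostic along these lines can be run both natively and under Gramine. This is only a sketch: the function name `diagnose` is made up for illustration, and the apt-detection rule is a heuristic based on the install locations shown in this issue (`/usr/lib/python3/dist-packages` for apt), not a general test.

```python
import importlib.util
from pathlib import Path

# Path hard-coded by the apt-packaged pytz (taken from the traceback in this issue).
TZDATA_ZI = Path("/usr/share/zoneinfo/tzdata.zi")

# Heuristic, based only on the locations reported in this issue:
# apt installed pytz under this prefix, pip under ~/.local/.../site-packages.
APT_PREFIX = "/usr/lib/python3/dist-packages"


def diagnose(tzdata_zi: Path = TZDATA_ZI) -> dict:
    """Report where pytz would be loaded from and whether tzdata.zi is visible."""
    # find_spec locates the package without executing pytz/__init__.py,
    # so it does not trigger the failing file read.
    spec = importlib.util.find_spec("pytz")
    origin = spec.origin if spec is not None else None
    return {
        "pytz_origin": origin,
        "looks_like_apt_install": bool(origin) and origin.startswith(APT_PREFIX),
        "tzdata_zi_visible": tzdata_zi.is_file(),
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If `looks_like_apt_install` is true and `tzdata_zi_visible` is false inside Gramine, the import of pandas would be expected to fail as in the traceback below.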

Error

gramine-direct ./sklearnex scripts/kmeans_example.py
Traceback (most recent call last):
  File "//scripts/kmeans_example.py", line 7, in <module>
    import pandas as pd
  File "/usr/lib/python3/dist-packages/pandas/__init__.py", line 17, in <module>
    __import__(dependency)
  File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 38, in <module>
    OLSON_VERSION = _read_olson_version()
  File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 29, in _read_olson_version
    with tzdata_zi.open(encoding="utf-8") as tzdata_zi_file:
  File "/usr/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/usr/share/zoneinfo/tzdata.zi'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 72, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 32, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 28, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 36, in <module>
    apt_pkg.init_system()
apt_pkg.Error: E:Unable to determine a suitable packaging system type

Original exception was:
Traceback (most recent call last):
  File "//scripts/kmeans_example.py", line 7, in <module>
    import pandas as pd
  File "/usr/lib/python3/dist-packages/pandas/__init__.py", line 17, in <module>
    __import__(dependency)
  File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 38, in <module>
    OLSON_VERSION = _read_olson_version()
  File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 29, in _read_olson_version
    with tzdata_zi.open(encoding="utf-8") as tzdata_zi_file:
  File "/usr/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/usr/share/zoneinfo/tzdata.zi'

APT pytz code:

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ pip3 show pytz
WARNING: Package(s) not found: pytz

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ sudo pip3 show pytz
WARNING: Package(s) not found: pytz

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ sudo apt install python3-tz
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  docutils-common doxygen fonts-font-awesome fonts-lato libaec0 libblosc1 libcjson1 libclang-cpp11 libclang-cpp14 libclang1-14 libffi-dev libhdf5-103-1 libllvm11
  libpfm4 libsnappy1v5 libsz2 libtbb12 libtbbmalloc2 libtinfo-dev libxapian30 libz3-4 libz3-dev llvm-11 llvm-11-dev llvm-11-linker-tools llvm-11-runtime
  llvm-11-tools numba-doc python-babel-localedata python-odf-doc python-odf-tools python-tables-data python3-alabaster python3-bottleneck python3-defusedxml
  python3-docutils python3-et-xmlfile python3-imagesize python3-jdcal python3-llvmlite python3-numba python3-numexpr python3-odf python3-openpyxl python3-pandas-lib
  python3-pymacaroons python3-roman python3-snowballstemmer python3-tables python3-tables-lib python3-tomli python3-tomli-w python3-xlwt sphinx-common
  sphinx-rtd-theme-common
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  python3-tz
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 30.7 kB of archives.
After this operation, 106 kB of additional disk space will be used.
Get:1 http://in.archive.ubuntu.com/ubuntu jammy-updates/main amd64 python3-tz all 2022.1-1ubuntu0.22.04.1 [30.7 kB]
Fetched 30.7 kB in 1s (54.7 kB/s)
Selecting previously unselected package python3-tz.
(Reading database ... 339001 files and directories currently installed.)
Preparing to unpack .../python3-tz_2022.1-1ubuntu0.22.04.1_all.deb ...
Unpacking python3-tz (2022.1-1ubuntu0.22.04.1) ...
Setting up python3-tz (2022.1-1ubuntu0.22.04.1) ...

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ pip3 show pytz
Name: pytz
Version: 2022.1
Summary: World timezone definitions, modern and historical
Home-page: http://pythonhosted.org/pytz
Author: Stuart Bishop
Author-email: stuart@stuartbishop.net
License: MIT
Location: /usr/lib/python3/dist-packages
Requires:
Required-by:

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ cat /usr/lib/python3/dist-packages/pytz/__init__.py
'''
datetime.tzinfo timezone definitions generated from the
Olson timezone database:

    ftp://elsie.nci.nih.gov/pub/tz*.tar.gz

See the datetime section of the Python Library Reference for information
on how to use these modules.
'''

import sys
import datetime
import os.path
import pathlib
import re
import zoneinfo

from pytz.exceptions import AmbiguousTimeError
from pytz.exceptions import InvalidTimeError
from pytz.exceptions import NonExistentTimeError
from pytz.exceptions import UnknownTimeZoneError
from pytz.lazy import LazyDict, LazyList, LazySet  # noqa
from pytz.tzinfo import unpickler, BaseTzInfo
from pytz.tzfile import build_tzinfo

def _read_olson_version() -> str:
    tzdata_zi = pathlib.Path("/usr/share/zoneinfo/tzdata.zi")
    with tzdata_zi.open(encoding="utf-8") as tzdata_zi_file:
        line = tzdata_zi_file.readline()
    match = re.match("^#\s*version\s*([0-9a-z]*)\s*$", line)
    if match:
        return match.group(1)
    return "unknown"

# The IANA (nee Olson) database is updated several times a year.
OLSON_VERSION = _read_olson_version()
VERSION = '2022.1'  # pip compatible version number.
__version__ = VERSION

OLSEN_VERSION = OLSON_VERSION  # Old releases had this misspelling
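
For illustration only, here is a tolerant variant of the same lookup. It is not what Ubuntu or upstream pytz ships; it just shows what the function parses and why a missing file is fatal in the packaged version, where the `open()` call is not guarded. The sample first line `# version 2022a` is an assumption about the format of tzdata.zi, chosen to match the pattern above.

```python
import re
from pathlib import Path

# Same pattern as the apt-packaged pytz shown above, written as a raw string.
VERSION_LINE = re.compile(r"^#\s*version\s*([0-9a-z]*)\s*$")


def read_olson_version(tzdata_zi: Path) -> str:
    """Return the tz database version from the first line of tzdata.zi.

    Unlike the packaged function, this hypothetical variant returns
    "unknown" instead of raising when the file cannot be opened.
    """
    try:
        with tzdata_zi.open(encoding="utf-8") as tzdata_zi_file:
            line = tzdata_zi_file.readline()
    except OSError:
        # Covers FileNotFoundError, e.g. when the directory is not
        # mounted into the Gramine manifest.
        return "unknown"
    match = VERSION_LINE.match(line)
    return match.group(1) if match else "unknown"


if __name__ == "__main__":
    print(read_olson_version(Path("/usr/share/zoneinfo/tzdata.zi")))
```

Mounting the directory in the manifest remains the actual fix, since the installed package cannot be patched this way from the example.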

Python/Pip pytz codebase

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ sudo pip3 show pytz
WARNING: Package(s) not found: pytz

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ pip3 show pytz
WARNING: Package(s) not found: pytz

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ python3 -m pip install pytz==2022.1
Defaulting to user installation because normal site-packages is not writeable
Collecting pytz==2022.1
  Downloading pytz-2022.1-py2.py3-none-any.whl (503 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 503.5/503.5 kB 6.1 MB/s eta 0:00:00
Installing collected packages: pytz
Successfully installed pytz-2022.1

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ pip3 show pytz
Name: pytz
Version: 2022.1
Summary: World timezone definitions, modern and historical
Home-page: http://pythonhosted.org/pytz
Author: Stuart Bishop
Author-email: stuart@stuartbishop.net
License: MIT
Location: /home/intel/.local/lib/python3.10/site-packages
Requires:
Required-by:

intel@intel-M50CYP2SBSTD:~/anjali/examples/scikit-learn-intelex$ cat /home/intel/.local/lib/python3.10/site-packages/pytz/__init__.py
'''
datetime.tzinfo timezone definitions generated from the
Olson timezone database:

    ftp://elsie.nci.nih.gov/pub/tz*.tar.gz

See the datetime section of the Python Library Reference for information
on how to use these modules.
'''

import sys
import datetime
import os.path

from pytz.exceptions import AmbiguousTimeError
from pytz.exceptions import InvalidTimeError
from pytz.exceptions import NonExistentTimeError
from pytz.exceptions import UnknownTimeZoneError
from pytz.lazy import LazyDict, LazyList, LazySet  # noqa
from pytz.tzinfo import unpickler, BaseTzInfo
from pytz.tzfile import build_tzinfo

# The IANA (nee Olson) database is updated several times a year.
OLSON_VERSION = '2022a'
VERSION = '2022.1'  # pip compatible version number.
__version__ = VERSION

OLSEN_VERSION = OLSON_VERSION  # Old releases had this misspelling

dimakuv commented 1 year ago

> The solution is to mount /usr/share/zoneinfo in the scikit-learn manifest.

@anjalirai-intel Could you submit a PR that fixes it? You'll need to add the path in two places in the manifest file: fs.mounts and sgx.trusted_files. Please specify the whole directory /usr/share/zoneinfo/ rather than individual files inside it; that is simply easier.

Also, if you do this, please add a comment in the manifest file explaining why it is needed. Something like this is enough:

# Scikit imports pandas; newer versions of pandas rely on pytz which fails if it
# doesn't find /usr/share/zoneinfo/ files (this was found on pytz version 2022.1)
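
As a rough sketch of the two additions, the helper below prints the entries to paste into the manifest template. The helper name `manifest_entries` is hypothetical, and the entry syntax (a `path`/`uri` table in `fs.mounts`, a `file:` URI with a trailing slash for a directory in `sgx.trusted_files`) is assumed from the conventions used in other Gramine example manifests; check it against the Gramine version in use and the change that was eventually merged.

```python
ZONEINFO_DIR = "/usr/share/zoneinfo"


def manifest_entries(host_dir: str = ZONEINFO_DIR) -> str:
    """Render the two manifest additions for one host directory as text."""
    host_dir = host_dir.rstrip("/")
    lines = [
        "# Scikit imports pandas; newer versions of pandas rely on pytz which fails if it",
        "# doesn't find /usr/share/zoneinfo/ files (this was found on pytz version 2022.1)",
        "",
        "# Add inside the existing fs.mounts list:",
        f'  {{ path = "{host_dir}", uri = "file:{host_dir}" }},',
        "",
        "# Add inside the existing sgx.trusted_files list (trailing slash = directory):",
        f'  "file:{host_dir}/",',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(manifest_entries())
```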

anjalirai-intel commented 12 months ago

Closing the issue. The fix was merged in #70.