sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License

How can we set environment variables for 'store_path' to redirect .local writes? #637

Open SundareshSankaran opened 2 hours ago

SundareshSankaran commented 2 hours ago

Error Description

I run sdv in a containerised Python environment deployed on Kubernetes. The pod's restrictions prevent me from creating directories in the root file system. On importing sdv, I noticed that it attempts to create a .local directory in the root directory of the pod where I run Python, which fails with the following error:

>>>
Traceback (most recent call last):
  File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share/sdv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "<stdin>", line 2, in <module>
  File "<string>", line 2, in <module>
  File "/python/lib/python3.10/site-packages/sdv/__init__.py", line 18, in <module>
    from sdv import (
  File "/python/lib/python3.10/site-packages/sdv/data_processing/__init__.py", line 3, in <module>
    from sdv.data_processing.data_processor import DataProcessor
  File "/python/lib/python3.10/site-packages/sdv/data_processing/data_processor.py", line 28, in <module>
    from sdv.metadata.single_table import SingleTableMetadata
  File "/python/lib/python3.10/site-packages/sdv/metadata/__init__.py", line 5, in <module>
    from sdv.metadata.multi_table import MultiTableMetadata
  File "/python/lib/python3.10/site-packages/sdv/metadata/multi_table.py", line 18, in <module>
    from sdv.metadata.single_table import SingleTableMetadata
  File "/python/lib/python3.10/site-packages/sdv/metadata/single_table.py", line 37, in <module>
    SINGLETABLEMETADATA_LOGGER = get_sdv_logger('SingleTableMetadata')
  File "/python/lib/python3.10/site-packages/sdv/logging/logger.py", line 62, in get_sdv_logger
    logger_conf = get_sdv_logger_config()
  File "/python/lib/python3.10/site-packages/sdv/logging/utils.py", line 16, in get_sdv_logger_config
    store_path.mkdir(parents=True, exist_ok=True)
  File "/python/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/python/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
OSError: [Errno 30] Read-only file system: '/.local'

Steps to reproduce

The step that reproduces the error above is simple:

import sdv

However, I traced this back through the modules imported by sdv's __init__ and found that the offending line is the one that sets the store_path variable:

/python3.10/site-packages/sdv/logging/utils.py

from pathlib import Path
import platformdirs

# Per-user data directory that sdv's logging setup tries to create
store_path = Path(platformdirs.user_data_dir('sdv', 'sdv-dev'))

print(store_path)

>>>
**/.local/share/sdv**

Note that this is in the root file system, which gives me the error.
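
To illustrate why the path lands in the root file system, here is a minimal sketch of the default location derived from the home directory. It assumes the container user's home directory resolves to "/", which the output above suggests but which I have not confirmed. The helper name `fallback_data_dir` is hypothetical and only mimics the Linux default; it is not platformdirs' own code.

```python
from pathlib import PurePosixPath


def fallback_data_dir(home: str, appname: str) -> PurePosixPath:
    """Approximate the Linux default data dir: <home>/.local/share/<appname>.

    Hypothetical helper for illustration only, not part of platformdirs or sdv.
    """
    return PurePosixPath(home) / ".local" / "share" / appname


# Assumption: the pod's user has "/" as its home directory.
print(fallback_data_dir("/", "sdv"))            # /.local/share/sdv
# With an ordinary home directory the path would be writable by the user.
print(fallback_data_dir("/home/alice", "sdv"))  # /home/alice/.local/share/sdv
```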

Therefore, is there any way to set this path through an environment variable? I could manually edit the line above, I suppose, but I'd rather not touch the library's internal code.

Thanks.

SundareshSankaran commented 2 hours ago

TEMPORARY: I've solved this using the XDG Base Directory Specification, which identifies the environment variable to use (XDG_DATA_HOME). I'm curious whether I missed this in your documentation, though, so I'm leaving this issue open for now.
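
For anyone hitting the same error, a minimal sketch of the workaround follows. It assumes a Linux container and that a temporary directory is writable there. The `sdv-data-` prefix is arbitrary, and the `try/except` is only there so the snippet also runs where sdv is not installed. The variable has to be set before sdv is first imported, since the directory is created at import time.

```python
import os
import tempfile

# Assumption: the temp location is writable inside the pod.
writable_root = tempfile.mkdtemp(prefix="sdv-data-")

# Point the XDG data directory at the writable location *before* importing sdv.
os.environ["XDG_DATA_HOME"] = writable_root

try:
    import sdv  # should now create its data directory under writable_root
except ImportError:
    sdv = None  # sdv is not installed in this environment
```

In a real deployment the same variable could presumably be set in the container's environment rather than in Python code, so that it is in place before the interpreter starts.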