zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[BUG]: outside of artifact store bounds #2986

Open Aadik1ng opened 1 month ago

Aadik1ng commented 1 month ago

System Information

SYSTEM_INFO: {'os': 'windows', 'windows_version_release': '10', 'windows_version': '10.0.19045', 'windows_version_service_pack': 'SP0', 'windows_version_os_type': 'Multiprocessor Free'}

What happened?

Initiating a new run for the pipeline: data_ingestion_pipeline. Executing a new run. Caching is disabled by default for data_ingestion_pipeline. Using user: default Using stack: stack_1 orchestrator: default artifact_store: local_artifact_store You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.

Relevant log output

Initiating a new run for the pipeline: data_ingestion_pipeline.
Executing a new run.
Caching is disabled by default for data_ingestion_pipeline.
Using user: default
Using stack: stack_1
  orchestrator: default
  artifact_store: local_artifact_store
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.
Failed to execute data ingestion pipeline: File `D:data\artifacts\data_ingestion_step\logs` is outside of artifact store bounds `data/artifacts`

CURRENT STACK

Name: stack_1
ID: d1765057-c27f-4e5e-ac3e-e7cee6a45797
User: default / a8bf9397-512c-4f9a-9266-f24b3ea10921
Workspace: default / 5e1c6e98-5302-47d4-bbc7-b97d24d1def3

ORCHESTRATOR: default

Name: default
ID: 33520fcf-456f-438d-b894-a765938a6b5e
Type: orchestrator
Flavor: local
Configuration: {}
Workspace: default / 5e1c6e98-5302-47d4-bbc7-b97d24d1def3

ARTIFACT_STORE: local_artifact_store

Name: local_artifact_store
ID: abae7772-933e-40fb-8893-0eb1bd855a9e
Type: artifact_store
Flavor: local
Configuration: {'path': 'data/artifacts'}
User: default / a8bf9397-512c-4f9a-9266-f24b3ea10921
Workspace: default / 5e1c6e98-5302-47d4-bbc7-b97d24d1def3

Reproduction steps

import logging
import os
from typing import Annotated, Dict

from zenml import get_step_context, pipeline, step

# load_config, extract_from_pdf and log_extraction_info are project helpers
# that are not shown in this report.

# Module-level logger, so the except block below can always use it
logger = logging.getLogger(__name__)


@step
def data_ingestion_step(config_path: str) -> Annotated[Dict[str, str], "paths"]:
    try:
        # Load configuration
        config = load_config(config_path)
        pdf_path = config['balance_sheet_pdf']

        # Use ZenML's artifact store context to determine base paths
        context = get_step_context()
        base_dir = context.get_output_artifact_uri()

        # Construct directories within the artifact store
        image_dir = os.path.join(base_dir, config.get('image_dir', 'data/artifacts/images'))
        table_dir = os.path.join(base_dir, config.get('table_dir', 'data/artifacts/tables'))
        other_dir = os.path.join(base_dir, config.get('other_dir', 'data/artifacts/other_content'))

        os.makedirs(image_dir, exist_ok=True)
        os.makedirs(table_dir, exist_ok=True)
        os.makedirs(other_dir, exist_ok=True)

        # Set up logging within the artifact store
        log_file_path = os.path.join(base_dir, 'logs', 'data_ingestion.log')
        os.makedirs(os.path.dirname(log_file_path), exist_ok=True)
        logging.basicConfig(
            filename=log_file_path,
            filemode='a',
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s'
        )

        logger.info(f"Image directory: {image_dir}")
        logger.info(f"Table directory: {table_dir}")
        logger.info(f"Other content directory: {other_dir}")

        # Extract data from PDF
        extract_from_pdf(pdf_path, image_dir, table_dir, other_dir)

        # Log extraction info
        log_extraction_info(pdf_path, image_dir, table_dir, other_dir)

        # Return paths as a dictionary
        return {
            "image_dir": image_dir,
            "table_dir": table_dir,
            "other_dir": other_dir
        }
    except Exception as e:
        logger.error(f"Error in data ingestion step: {e}")
        raise


@pipeline
def data_ingestion_pipeline(config_path: str):
    data_ingestion_step(config_path=config_path)


if __name__ == "__main__":
    config_path = 'config.yml'  # Adjust this path if needed
    try:
        # Run the pipeline
        data_ingestion_pipeline = data_ingestion_pipeline.with_options(enable_cache=False)
        data_ingestion_pipeline(config_path=config_path)
    except Exception as e:
        print(f"Failed to execute data ingestion pipeline: {e}")

schustmi commented 1 month ago

This seems like an issue with Windows paths. Can you just quickly verify whether the following code also fails for you on your local machine please:

from zenml import pipeline, step

@step
def logging_step() -> None:
  print("Some log message")

@pipeline
def p():
  logging_step()

if __name__ == "__main__":
  p()

Aadik1ng commented 1 month ago

FileNotFoundError: File D:\data\artifacts\logging_step\logs is outside of artifact store bounds data/artifacts

PS D:> & C:/Users/aadit/AppData/Local/Microsoft/WindowsApps/python3.11.exe D:\src\rough.py
Initiating a new run for the pipeline: p.
Executing a new run.
Using user: default
Using stack: stack_1
  orchestrator: default
  artifact_store: local_artifact_store
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml up.

Traceback (most recent call last)

D:\src\rough.py:12 in <module>

     9   logging_step()
    10
    11 if __name__ == "__main__":
  ❱ 12   p()
    13

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\new\pipelines\pipeline.py:1382 in __call__

    1379             return self.entrypoint(*args, **kwargs)
    1380
    1381         self.prepare(*args, **kwargs)
  ❱ 1382         return self._run(**self._run_args)
    1383
    1384     def _call_entrypoint(self, *args: Any, **kwargs: Any) -> None:
    1385         """Calls the pipeline entrypoint function with the given arguments.

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\new\pipelines\pipeline.py:771 in _run

    768                         "zenml up."
    769                     )
    770
  ❱ 771             deploy_pipeline(
    772                 deployment=deployment_model, stack=stack, placeholder_run=run
    773             )
    774             if run:

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\new\pipelines\run_utils.py:153 in deploy_pipeline

    150             # placeholder run to stay in the database
    151             Client().delete_pipeline_run(placeholder_run.id)
    152
  ❱ 153         raise e
    154     finally:
    155         constants.SHOULD_PREVENT_PIPELINE_EXECUTION = previous_value
    156

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\new\pipelines\run_utils.py:141 in deploy_pipeline

    138     previous_value = constants.SHOULD_PREVENT_PIPELINE_EXECUTION
    139     constants.SHOULD_PREVENT_PIPELINE_EXECUTION = True
    140     try:
  ❱ 141         stack.deploy_pipeline(deployment=deployment)
    142     except Exception as e:
    143         if (
    144             placeholder_run

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\stack\stack.py:853 in deploy_pipeline

    850         Returns:
    851             The return value of the call to orchestrator.run_pipeline(...).
    852         """
  ❱ 853         return self.orchestrator.run(deployment=deployment, stack=self)
    854
    855     def _get_active_components_for_step(
    856         self, step_config: "StepConfiguration"

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\orchestrators\base_orchestrator.py:187 in run

    184         environment = get_config_environment_vars(deployment=deployment)
    185
    186         try:
  ❱ 187             result = self.prepare_or_run_pipeline(
    188                 deployment=deployment, stack=stack, environment=environment
    189             )
    190         finally:

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\orchestrators\local\local_orchestrator.py:78 in prepare_or_run_pipeline

    75                     step_name,
    76                 )
    77
  ❱ 78             self.run_step(
    79                 step=step,
    80             )
    81

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\orchestrators\base_orchestrator.py:207 in run_step

    204             step=step,
    205             orchestrator_run_id=self.get_orchestrator_run_id(),
    206         )
  ❱ 207         launcher.launch()
    208
    209     @staticmethod
    210     def requires_resources_in_orchestration_environment(

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\orchestrators\step_launcher.py:164 in launch

    161
    162         if step_logging_enabled:
    163             # Configure the logs
  ❱ 164             logs_uri = step_logging.prepare_logs_uri(
    165                 self._stack.artifact_store,
    166                 self._step.config.name,
    167             )

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\logging\step_logging.py:87 in prepare_logs_uri

    84     )
    85
    86     # Create the dir
  ❱ 87     if not artifact_store.exists(logs_base_uri):
    88         artifact_store.makedirs(logs_base_uri)
    89
    90     # Delete the file if it already exists

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:151 in __call__

    148         has_self = bool(args and isinstance(args[0], BaseArtifactStore))
    149
    150         # sanitize inputs for relevant args and kwargs, keep rest unchanged
  ❱ 151         args = tuple(
    152             self._sanitize_potential_path(
    153                 arg,
    154             )

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:152 in <genexpr>

    149
    150         # sanitize inputs for relevant args and kwargs, keep rest unchanged
    151         args = tuple(
  ❱ 152             self._sanitize_potential_path(
    153                 arg,
    154             )
    155             if i + has_self in self.path_args

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:133 in _sanitize_potential_path

    130             path = path.replace(ntpath.sep, posixpath.sep)
    131             self._validate_path(path)
    132         else:
  ❱ 133             self._validate_path(str(Path(path).absolute().resolve()))
    134
    135         return path
    136

C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\zenml\artifact_stores\base_artifact_store.py:98 in _validate_path

     95                 bounds.
     96         """
     97         if not path.startswith(self.fixed_root_path):
  ❱  98             raise FileNotFoundError(
     99                 f"File {path} is outside of "
    100                 f"artifact store bounds {self.fixed_root_path}"
    101             )

FileNotFoundError: File D:\data\artifacts\logging_step\logs is outside of artifact store bounds data/artifacts
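
Editor's note: the last two frames of the traceback suggest why the check fails. The path is resolved to an absolute Windows path and then compared with a plain `startswith` against the configured root, which here is the relative string `data/artifacts`. The sketch below reproduces only that comparison using the values from the error message. The helper `is_within_bounds` is hypothetical (it is not ZenML's API), and treating `D:\` as the working directory is an assumption based on the `PS D:>` prompt.

```python
import ntpath  # Windows path rules, usable on any OS


def is_within_bounds(path: str, root: str) -> bool:
    """Hypothetical stand-in for the prefix check seen in the traceback."""
    return path.startswith(root)


# Values copied from the error message in this thread.
resolved = r"D:\data\artifacts\logging_step\logs"
configured_root = "data/artifacts"

# Absolute resolved path vs. relative configured root: the prefix never matches.
relative_ok = is_within_bounds(resolved, configured_root)

# Assumption: the root made absolute against a D:\ working directory.
absolute_root = ntpath.normpath(ntpath.join("D:\\", configured_root))
absolute_ok = is_within_bounds(resolved, absolute_root)

print(relative_ok, absolute_root, absolute_ok)
```

Under these assumptions, a root stored as an absolute path with backslashes would pass the same prefix comparison, which is in line with the question later in the thread about registering the store with `D:\data\artifacts` explicitly.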

Aadik1ng commented 1 month ago

These are my environment variables:

HOMEDRIVE: C:
HOMEPATH: \Users\aadit
LOCALAPPDATA: C:\Users\aadit\AppData\Local
LOGONSERVER: \DESKTOP-C4QBLHV
NUMBER_OF_PROCESSORS: 16
ORIGINAL_XDG_CURRENT_DESKTOP: undefined
OS: Windows_NT
PATH: C:\Users\aadit\AppData\Local\zenml;
PATHEXT: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.CPL

ZENML_HOME: C:\Users\aadit\AppData\Local\zenml
ZES_ENABLE_SYSMAN: 1

schustmi commented 1 month ago

Are you using the default local artifact store? Or did you register a custom one?

Aadik1ng commented 1 month ago

I was using a custom one in my previous stack, but everything is on default now:

Initiating a new run for the pipeline: simple_ml_pipeline.
Executing a new run.
Using user: default
Using stack: default
  orchestrator: default
  artifact_store: default

Aadik1ng commented 1 month ago

(PS) D:\Dir> zenml artifact-store describe default
Artifact_Store 'default' of flavor 'local' with id 'd6305633-a8f6-45ea-87f6-ae469da61fcf' is owned by user '-'.
No configuration options are set for this component.
No labels are set for this component.
No connector is set for this component.

So the default artifact store was not set properly, and I changed it: I created a new artifact store called balance with dir/data/artifact as its path.

(PS)D:\Dir> zenml artifact-store describe balance
Artifact_Store 'Balance' of flavor 'local' with id 'bc735317-765d-4e0a-99c7-dbf0728f4702' is owned by user 'default'.
'Balance' ARTIFACT_STORE Component Configuration (ACTIVE)
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓
┃ COMPONENT_PROPERTY │ VALUE          ┃
┠────────────────────┼────────────────┨
┃ PATH               │ data/artifacts ┃
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛
No labels are set for this component.
No connector is set for this component.

schustmi commented 1 month ago

What do you mean by "the default artifact store was not set properly"? Did you run into issues when running pipelines with the default artifact store?

I think with your custom artifact store, it's somehow messing up the volumes. Did you explicitly register it with D:\data\artifacts?

Aadik1ng commented 1 month ago

When I was using the default artifact store, I was getting the same issue, but the paths were:

Failed to execute data ingestion pipeline: File C:\Users\aadit\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\Roaming\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b\data_ingestion_step\logs is outside of artifact store bounds C:\Users\aadit\AppData\Roaming\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b

So I changed the artifact store to a custom one and set its path to data/artifacts. I got the same issue, but this time the paths were:

D:\data\artifacts..

PS D:\RAG-on-Balance_Sheet> zenml artifact-store register balance --flavor=local
You are configuring a stack component that is using local resources while connected to a remote ZenML server. The stack component may not be usable from other hosts or by other users. You should consider using a non-local stack component alternative instead.
Successfully registered artifact_store balance.
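
Editor's note: for the default artifact store error quoted earlier in this comment, the two paths in the message can be compared directly to see where they diverge. The sketch below uses only the standard library and the two strings from that message. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the Microsoft Store build of Python redirects `AppData` into its package's `LocalCache` folder, so the resolved path no longer starts with the configured root.

```python
import ntpath  # Windows path rules, usable on any OS

# Both strings are copied from the error message quoted in this comment.
resolved = (
    r"C:\Users\aadit\AppData\Local\Packages"
    r"\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\Roaming"
    r"\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b"
    r"\data_ingestion_step\logs"
)
bounds = (
    r"C:\Users\aadit\AppData\Roaming"
    r"\zenml\local_stores\5b773d37-c170-4a39-b1f5-d6a20fec3d5b"
)

# Longest directory prefix the two paths share.
common = ntpath.commonpath([resolved, bounds])

# What each path contains beyond the shared prefix.
resolved_tail = resolved[len(common):]
bounds_tail = bounds[len(common):]

print("shared prefix:", common)
print("resolved continues with:", resolved_tail)
print("bounds continue with:", bounds_tail)
```

The shared prefix stops at `AppData`: the resolved path continues under `Local\Packages\...\LocalCache\Roaming`, while the store's bounds continue under `Roaming`, so a prefix comparison between the two cannot succeed.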