Open michshap opened 8 months ago
@michshap what happens when you run /dir1/dir2/mlruns?
@harupy What do you mean? When I run
mlflow ui --host 0.0.0.0 --port 8084 --backend-store-uri /dir1/dir2/mlruns/
I see the runs, their metrics, and so on, as I should. The problem occurs when I try to log something as an artifact: I get a "permission denied" error for a write to /dir2/..., because /dir2 doesn't exist (the real path is /dir1/dir2). So basically I can't log any artifacts.
@michshap Could you upgrade MLflow and try again? I can't reproduce this problem on the latest master.
@serena-ruan I upgraded and got the same behavior. Active run artifact URI: file://dir1/dir2/mlruns/0/d71476fa8cd94ca096473ea7dbc4e352/artifacts (only two '/' after 'file:'), which then translates into trying to write the artifacts to "/dir2/mlruns/0/d71476fa8cd94ca096473ea7dbc4e352/artifacts" (the leading /dir1 is dropped).
In log_artifact, line 552:
def log_artifact(self, run_id, local_path, artifact_path=None):
    """
    Write a local file or directory to the remote ``artifact_uri``.

    Args:
        local_path: Path to the file or directory to write.
        artifact_path: If provided, the directory in ``artifact_uri`` to write to.
    """
    artifact_repo = self._get_artifact_repo(run_id)
I see that artifact_repo.artifact_dir = /dir2/mlruns/0/d71476fa8cd94ca096473ea7dbc4e352/artifacts when debugging.
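For anyone following along, the symptom above matches how standard URL parsing treats a file URI with only two slashes: the first segment is read as a host, not as part of the path. The sketch below uses only Python's urllib.parse on the URI quoted in this thread. That MLflow derives artifact_dir from the path component in the same way is an assumption that fits the observed values, not something confirmed here.

```python
from urllib.parse import urlparse

# URI reported in this thread (two slashes after "file:").
two_slashes = urlparse(
    "file://dir1/dir2/mlruns/0/d71476fa8cd94ca096473ea7dbc4e352/artifacts"
)
# The same location written with an empty host (three slashes).
three_slashes = urlparse(
    "file:///dir1/dir2/mlruns/0/d71476fa8cd94ca096473ea7dbc4e352/artifacts"
)

# With two slashes, "dir1" lands in the host (netloc) slot and the
# path component starts at /dir2.
print(repr(two_slashes.netloc), two_slashes.path)

# With three slashes, the host is empty and the path keeps /dir1.
print(repr(three_slashes.netloc), three_slashes.path)
```

Any code that keeps only the path component of the two-slash form would end up at /dir2/mlruns/..., which is the directory seen in the debugger.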
What do you see as the active run artifact URI (active_run.info.artifact_uri)? Here is what I get:
>>> artifact_repo.artifact_dir
'/Users/serena.ruan/Documents/test/mlruns/0/c913c72646c94fb5b2fd8bdc3ca1110e/artifacts'
>>> artifact_repo.artifact_uri
'file:///Users/serena.ruan/Documents/test/mlruns/0/c913c72646c94fb5b2fd8bdc3ca1110e/artifacts'
My code is as follows:
>>> mlflow.set_tracking_uri("file:///Users/serena.ruan/Documents/test/mlruns")
>>> with mlflow.start_run():
...     mlflow.log_artifact("/Users/serena.ruan/Documents/test/test.txt")
Are you running the same script? A stack trace and your original code would be helpful.
@serena-ruan Thank you! So something is indeed off with the behavior I'm getting: one of the "/" characters is dropped in the artifact_uri, which later causes the entire first directory to be dropped.
When I run exactly the same code as you:
mlflow.set_tracking_uri("file:///home/my.name/mlruns")
with mlflow.start_run():
    mlflow.log_artifact("/home/my.name/example.txt")
I get:
Cell In[2], line 3
1 mlflow.set_tracking_uri("file:///home/my.name/mlruns")
2 with mlflow.start_run():
----> 3 mlflow.log_artifact("/home/my.name/example.txt")
File ~/venvs/my_dev5/lib/python3.10/site-packages/mlflow/tracking/fluent.py:1057, in log_artifact(local_path, artifact_path, run_id)
1029 """
1030 Log a local file or directory as an artifact of the currently active run. If no run is
1031 active, this method will create a new active run.
(...)
1054 mlflow.log_artifact(path)
1055 """
1056 run_id = run_id or _get_or_start_run().info.run_id
-> 1057 MlflowClient().log_artifact(run_id, local_path, artifact_path)
File ~/venvs/my_dev5/lib/python3.10/site-packages/mlflow/tracking/client.py:1189, in MlflowClient.log_artifact(self, run_id, local_path, artifact_path)
1150 def log_artifact(self, run_id, local_path, artifact_path=None) -> None:
1151 """Write a local file or directory to the remote ``artifact_uri``.
1152
1153 Args:
(...)
1187
1188 """
-> 1189 self._tracking_client.log_artifact(run_id, local_path, artifact_path)
File ~/venvs/my_dev5/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py:560, in TrackingServiceClient.log_artifact(self, run_id, local_path, artifact_path)
558 artifact_repo.log_artifacts(local_path, path_name)
559 else:
--> 560 artifact_repo.log_artifact(local_path, artifact_path)
File ~/venvs/my_dev5/lib/python3.10/site-packages/mlflow/store/artifact/local_artifact_repo.py:37, in LocalArtifactRepository.log_artifact(self, local_file, artifact_path)
33 artifact_dir = (
34 os.path.join(self.artifact_dir, artifact_path) if artifact_path else self.artifact_dir
35 )
36 if not os.path.exists(artifact_dir):
---> 37 mkdir(artifact_dir)
38 try:
39 shutil.copy2(local_file, os.path.join(artifact_dir, os.path.basename(local_file)))
File ~/venvs/my_dev5/lib/python3.10/site-packages/mlflow/utils/file_utils.py:212, in mkdir(root, name)
210 except OSError as e:
211 if e.errno != errno.EEXIST or not os.path.isdir(target):
--> 212 raise e
213 return target
File ~/venvs/my_dev5/lib/python3.10/site-packages/mlflow/utils/file_utils.py:209, in mkdir(root, name)
207 target = os.path.join(root, name) if name is not None else root
208 try:
--> 209 os.makedirs(target)
210 except OSError as e:
211 if e.errno != errno.EEXIST or not os.path.isdir(target):
File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
213 if head and tail and not path.exists(head):
214 try:
--> 215 makedirs(head, exist_ok=exist_ok)
216 except FileExistsError:
217 # Defeats race condition when another thread created the path
218 pass
File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
213 if head and tail and not path.exists(head):
214 try:
--> 215 makedirs(head, exist_ok=exist_ok)
216 except FileExistsError:
217 # Defeats race condition when another thread created the path
218 pass
[... skipping similar frames: makedirs at line 215 (1 times)]
File /usr/lib/python3.10/os.py:215, in makedirs(name, mode, exist_ok)
213 if head and tail and not path.exists(head):
214 try:
--> 215 makedirs(head, exist_ok=exist_ok)
216 except FileExistsError:
217 # Defeats race condition when another thread created the path
218 pass
File /usr/lib/python3.10/os.py:225, in makedirs(name, mode, exist_ok)
223 return
224 try:
--> 225 mkdir(name, mode)
226 except OSError:
227 # Cannot rely on checking for EEXIST, since the operating system
228 # could give priority to other errors like EACCES or EROFS
229 if not exist_ok or not path.isdir(name):
PermissionError: [Errno 13] Permission denied: '/my.name'
This makes sense, because there is no such location. And, as I wrote above:
>>> artifact_repo.artifact_dir
'/my.name/mlruns/0/b3a48ff5b8ae422fa21a5a08d00212d6/artifacts'
>>> artifact_repo.artifact_uri
'file://home/my.name/mlruns/0/b3a48ff5b8ae422fa21a5a08d00212d6/artifacts'
@michshap There is probably something wrong with your local file directories. Could you try the same code snippet in a new location (where no previous experiments exist)?
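If metadata left over from earlier experiments is the culprit, one way to check is to scan the existing store for two-slash file URIs before moving to a clean location. The sketch below is only an illustration: it assumes the file-based store keeps its metadata in files named meta.yaml, and find_suspect_metadata is a hypothetical helper, not part of MLflow.

```python
import os
import re

# "file://" followed by a non-slash character: a host-like segment sits
# where an absolute path was probably intended. Note that this would also
# flag an explicit host such as file://localhost/...
_TWO_SLASH_FILE_URI = re.compile(r"file://[^/\s]")


def find_suspect_metadata(root, filename="meta.yaml"):
    """Return (path, line) pairs for metadata lines holding a two-slash file URI.

    Assumption: metadata lives in files called `filename` under `root`.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename not in filenames:
            continue
        path = os.path.join(dirpath, filename)
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if _TWO_SLASH_FILE_URI.search(line):
                    hits.append((path, line.strip()))
    return hits
```

Calling find_suspect_metadata("/home/my.name/mlruns") would list any stored entries that already carry the malformed form, which would point to old metadata rather than the current tracking URI.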
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
Issues Policy acknowledgement
Where did you encounter this bug?
Local machine
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
MLflow version
System information
Describe the problem
I use mlflow in the following way:
and everything works fine. However, when I try to use log_artifact, I see that it tries to save to '/dir2/mlruns/0/...', meaning that /dir1/ was somehow dropped.
When debugging log_artifact, I see that [from the log_artifact code]:
returns: artifact_repo.artifact_dir = '/dir2/mlruns/0/...' and artifact_repo.artifact_uri = 'file://dir1/dir2/mlruns/0...'. I'd guess it is supposed to be 'file:///dir1/dir2/mlruns/0...', as it is in the tracking_uri (three '/' instead of two).
How do I fix that? What causes it?
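A possible guard on the client side, sketched under assumptions: build the file URI from the absolute path with pathlib, so the three-slash form is produced mechanically, and reject any file URI whose host part is non-empty. Both helper names are hypothetical and only the standard library is used. This does not explain where the slash is lost; it only makes the malformed form easy to detect. Path.as_uri() needs an absolute path, and the result shown in the usage note assumes a POSIX system.

```python
from pathlib import Path
from urllib.parse import urlparse


def to_file_uri(local_dir):
    """Build a file URI from an absolute local path (hypothetical helper)."""
    return Path(local_dir).as_uri()


def has_host_component(uri):
    """Return True if a file URI carries a host where a path was likely meant."""
    parsed = urlparse(uri)
    return parsed.scheme == "file" and parsed.netloc not in ("", "localhost")


tracking_uri = to_file_uri("/dir1/dir2/mlruns")
```

On a POSIX system, tracking_uri is 'file:///dir1/dir2/mlruns', and has_host_component flags the two-slash artifact_uri reported above while accepting the three-slash form. The resulting string could then be passed to mlflow.set_tracking_uri.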
PS - I also tried
and it gives the same error.
Tracking information
Code to reproduce issue
Stack trace
No error (except for trying to save to a location that doesn't exist).
Other info / logs
What component(s) does this bug affect?
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
What language(s) does this bug affect?
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations