Open strickvl opened 10 months ago
Hello @strickvl, I'm trying to reproduce this issue but can't. I made a GCS bucket and tried to run the first snippet and got the following error. Please let me know if you need the traceback.
ValueError: No file systems were found for the scheme: gs://. Please make sure that you are using the right path and the all the necessary integrations are properly installed.
The error was raised for the following line:
log_storage = StepLogsStorage(logs_uri=TEST_FILE, max_messages=5)
Here I'd tag in @bcdurak, who I think was most involved with that particular part of the codebase; he should be able to help with this. Another thing to check is whether you can write to your bucket directly with gcsfs:
import gcsfs

fs = gcsfs.GCSFileSystem()

with fs.open('gs://your-bucket-name/test.txt', 'w') as f:
    f.write('Hello, world!')

with fs.open('gs://your-bucket-name/test.txt', 'r') as f:
    print(f.read())
(Replace 'gs://your-bucket-name/test.txt' with a valid path in your GCS bucket.)
Thank you for the code you provided. I did have some permission issues, which I resolved after trying this code, and it now correctly prints Hello, world!. However, the previous error persists even now:
ValueError: No file systems were found for the scheme: gs://. Please make sure that you are using the right path and the all the necessary integrations are properly installed.
EDIT:
I think I understand the source of this error. I have attached the traceback below. The code uses fileio to open the URI, which raises the error. Instead, at this step, gcsfs needs to be used, as in the code provided above.
I think I see what's going on now. Are you running the code with a GCS artifact store configured in your ZenML stack? (fileio will use whatever stack you have configured and set up for ZenML, so if you have a GCS artifact store then it should work.)
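One quick way to check this, as a minimal sketch (the bucket path is a placeholder, and it assumes the GCS integration is installed and a GCS artifact store is registered in the active stack):
from zenml.io import fileio

# fileio resolves gs:// through the artifact store registered in the active stack.
with fileio.open('gs://your-bucket-name/test.txt', 'w') as f:
    f.write('Hello from fileio!')

print(fileio.exists('gs://your-bucket-name/test.txt'))
If this raises the same "No file systems were found for the scheme: gs://" error, the active environment can't resolve gs:// paths yet.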
I see. I tried to set up a GCS artifact store but am facing some errors. I don't understand a few of the steps and will first acquaint myself with them. Could you please assign me to this issue?
I was able to reproduce the issue. The output I get for the initial code is:
I'm log line #10
I will now work on solving the issue.
@strickvl I have fixed the issue locally and I'm getting the expected output as shown below.
However, I'm facing an issue in following the contribution guidelines. While running the command mypy --install-types, I get: error: Can't determine which types to install with no files to check (and no cache from previous mypy run). Could you please help with this?
Also, while opening a pull request, I read this prerequisite: "I have added tests to cover my changes". To fix the bug I made a change to src/zenml/logging/step_logging.py, so I think I need to add tests, but I'm not sure how to do this. I'd appreciate help on this as well.
For our cloud integrations, it's enough to demonstrate that you've tested it. We don't currently run integration tests on cloud environments, so for something like this it wouldn't be possible to test it locally. Icing on the cake would be to include in the PR instructions on how someone from the core team could reproduce your local test (a code snippet and a reminder of what the stack setup would be), but beyond that I think you're OK.
Also, for mypy I think you can ignore that and just make the PR; any issues will be revealed there.
Open Source Contributors Welcomed!
Please comment below if you would like to work on this issue!
Contact Details [Optional]
support@zenml.io
What happened?
There seems to be an issue with StepLogging when using GCS (Google Cloud Storage) as the artifact store. Specifically, only the last parts of the logs appear in the file, which suggests a problem with the log writing or saving mechanism.
Steps to Reproduce
Here's a snippet to reproduce the issue:
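(The original snippet isn't captured in this text; the following is a minimal sketch based on the StepLogsStorage(logs_uri=TEST_FILE, max_messages=5) line quoted in the comments above. The bucket path is a placeholder, and the write() / save_to_file() calls are assumptions about the StepLogsStorage API.)
from zenml.logging.step_logging import StepLogsStorage

# Placeholder path inside the GCS bucket backing the artifact store.
TEST_FILE = 'gs://your-bucket-name/logs/test_logs.txt'

# max_messages=5 should flush the buffer every five messages.
log_storage = StepLogsStorage(logs_uri=TEST_FILE, max_messages=5)

for i in range(1, 11):
    log_storage.write(f"I'm log line #{i}")

# Force a final flush; with the bug, only the last lines show up in the GCS file.
log_storage.save_to_file()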
Expected Behavior
All log lines should be saved and visible in the GCS file, not just the last few.
Potential Solution
Consider using the logging.StreamHandler facility to temporarily write logs to the remote file (GCS, S3, etc.). Here's an example:
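A minimal sketch of the idea (the handler name, buffer threshold, and fileio-based flushing below are illustrative assumptions, not the existing ZenML implementation):
import logging

from zenml.io import fileio


class RemoteLogsHandler(logging.StreamHandler):
    """Hypothetical handler that buffers records and writes them to a remote URI."""

    def __init__(self, logs_uri: str, max_messages: int = 100) -> None:
        super().__init__()
        self.logs_uri = logs_uri
        self.max_messages = max_messages
        self._lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self._lines.append(self.format(record))
        if len(self._lines) >= self.max_messages:
            self.flush_to_remote()

    def flush_to_remote(self) -> None:
        if not self._lines:
            return
        # Rewrite the remote file with everything collected so far, since
        # object stores like GCS don't reliably support append mode.
        with fileio.open(self.logs_uri, 'w') as f:
            f.write('\n'.join(self._lines) + '\n')

    def close(self) -> None:
        self.flush_to_remote()
        super().close()


# Example usage: attach the handler to a logger so every record ends up in the remote file.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(RemoteLogsHandler('gs://your-bucket-name/logs/step_logs.txt', max_messages=5))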
This approach could fit nicely in the StepLogsStorageContext class.
Additional Context
Proper log handling is crucial for debugging and monitoring pipeline performance, especially when dealing with large-scale data processing in cloud environments.
Code of Conduct