Closed gabwon9 closed 1 year ago
Hey @GabwonPark! Thanks so much for opening up this issue.
In order to best diagnose the issue at hand, it would be really helpful to have the underlying configuration you're using. Would you mind providing your data context config, scripts, and any other pertinent information (please omit any sensitive details)?
We'll review that information and try to determine what's going on.
@cdkini I added script examples in description above. Please check Test Script Examples. Thank you.
@GabwonPark if you actually look in the bucket, are the doc sites being rewritten each time? How about the validation results?
Could you provide your configs for your stores (expectations, validations, and checkpoints)?
Just looking for some additional context so we can narrow down the issue!
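One way to answer the bucket question is to list the object names under the Data Docs prefix before and after a run and compare the two listings. The following is only a minimal sketch, assuming the listings are already available as lists of strings (for example, copied from a bucket listing command). The function name and the sample object paths are hypothetical and not taken from the reporter's project.

```python
def compare_listings(before, after):
    """Compare two bucket listings taken before and after a checkpoint run."""
    before_set, after_set = set(before), set(after)
    return {
        "removed": sorted(before_set - after_set),  # objects that disappeared
        "added": sorted(after_set - before_set),    # objects created by the run
        "kept": sorted(before_set & after_set),     # objects present both times
    }


# Hypothetical object names, for illustration only.
before = ["index.html", "validations/suite/run_1/result.html"]
after = ["index.html", "validations/suite/run_2/result.html"]

report = compare_listings(before, after)
print(report["removed"])
```

If "removed" is non-empty after every run, older results are being deleted rather than only the index page being regenerated.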
@cdkini I updated great_expectations.yml under 'Test Script Examples' in the description. Is that enough? The docs site is rewritten each time. Only the validation results for the current execution are kept; in other words, old validation results are removed. I followed the guide below for the v3 API. https://legacy.docs.greatexpectations.io/en/stable/guides/how_to_guides/configuring_data_docs/how_to_host_and_share_data_docs_on_gcs.html
@cdkini Is there any update? Or do you need any other data?
Hey @GabwonPark! Apologies for the delayed response.
Our team is still reviewing the details you've provided. I believe this should be sufficient but I'll let you know if we need anything else when debugging the issue. Please note that our team is out for the remainder of the week so we'll address this early next week.
Thanks!
@cdkini Thank you for your reply!
@GabwonPark have you run the CLI docs build or context.build_data_docs() at all before running your script?
My initial hunch here is that the UpdateDataDocsAction is running in an environment where the docs are not built. Do you actually run the context.build_data_docs() line in your script?
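For reference, initializing the docs once before any checkpoint runs could look roughly like the sketch below. It assumes `context` is a Great Expectations data context, whose build_data_docs() accepts an optional site_names list. The helper name initialize_docs, the FakeContext stand-in, and the site name "gcs_site" are hypothetical and exist only so the sketch runs without a real project.

```python
def initialize_docs(context, site_names=None):
    """Build Data Docs once so the site exists before any checkpoint runs."""
    # `context` is assumed to be a Great Expectations data context.
    return context.build_data_docs(site_names=site_names)


class FakeContext:
    """Stand-in that records calls; not part of Great Expectations."""

    def __init__(self):
        self.calls = []

    def build_data_docs(self, site_names=None):
        self.calls.append(site_names)
        return {name: f"<index for {name}>" for name in (site_names or [])}


fake = FakeContext()
result = initialize_docs(fake, site_names=["gcs_site"])  # hypothetical site name
print(fake.calls)
```

In a real script, the object passed in would be the project's own data context rather than FakeContext.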
@cdkini No, I did not do that. I don't use context.build_data_docs() to update the Data Docs after each test result; as you know, I use UpdateDataDocsAction only. Should I call context.build_data_docs() before each new test execution to protect the existing Data Docs? Also, we could use Slack for a faster response.
@GabwonPark I would try initializing your docs sites with that build command and see if it makes a difference. I'll work to set up a GCS-based project so I can replicate your environment in the meanwhile. Let me know if things work out!
@cdkini Thank you for your support!
@GabwonPark sure thing! Did that happen to work?
@cdkini No effect. The docs are still removed when the test starts. I think the data in the local 'uncommitted' directory replaces the data in the GCP Data Docs instead of being merged into it.
@GabwonPark apologies for the delay. Could you please confirm that this is still an issue? Additionally, would you mind trying your script and overall approach on your local filesystem. I'm curious if the same overwriting happens in a different environment.
@cdkini This is still an issue. It is not an issue on the local file system; it is reproducible in the GCP environment. Did you try testing in a GCP environment?
@GabwonPark I'm not able to reproduce your issue unfortunately.
Could you try adding your gcs site name to this part of the config:
- name: update_data_docs
  action:
    class_name: UpdateDataDocsAction
    site_names: []
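If the checkpoint is assembled in Python rather than YAML, the same change could be sketched as below. This is an illustration only: the helper with_docs_sites and the site name "gcs_site" are hypothetical, and the action list is a plain-dict mirror of the YAML above, not the reporter's actual checkpoint.

```python
import copy


def with_docs_sites(action_list, site_names):
    """Return a copy of action_list whose UpdateDataDocsAction targets site_names."""
    updated = copy.deepcopy(action_list)  # leave the caller's config untouched
    for entry in updated:
        action = entry.get("action", {})
        if action.get("class_name") == "UpdateDataDocsAction":
            action["site_names"] = list(site_names)
    return updated


action_list = [
    {"name": "store_validation_result",
     "action": {"class_name": "StoreValidationResultAction"}},
    {"name": "update_data_docs",
     "action": {"class_name": "UpdateDataDocsAction", "site_names": []}},
]

patched = with_docs_sites(action_list, ["gcs_site"])  # hypothetical site name
print(patched[1]["action"]["site_names"])
```

The site name passed in should match the key used under data_docs_sites in the project configuration.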
You may also have better success using batch requests to pass in variables (as opposed to the env var assignment you're using in your script).
Finally, if you check uncommitted/validations, do you see anything? Is that empty, does it contain a single validation that's being overwritten, or something else?
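A quick way to inspect that directory is to count the stored result files. The sketch below assumes validation results are stored as .json files somewhere under the directory; the helper name summarize_validations is hypothetical, and the temporary directory only stands in for uncommitted/validations so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def summarize_validations(root):
    """Count stored validation result files under a validations directory."""
    root = Path(root)
    if not root.is_dir():
        return {"exists": False, "count": 0, "files": []}
    files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))
    return {"exists": True, "count": len(files), "files": files}


# Demonstration against a throwaway directory with one fake result file.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "my_suite" / "run_1"
    run_dir.mkdir(parents=True)
    (run_dir / "result.json").write_text("{}")
    demo = summarize_validations(tmp)

# The temporary directory has been deleted at this point.
missing = summarize_validations(tmp)
print(demo["count"], missing["exists"])
```

Running summarize_validations("uncommitted/validations") before and after a checkpoint run would show whether the count grows or stays at one.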
I'm still looking into the matter but please ensure that the configuration you've provided me is still accurate. Thanks!
@cdkini Thank you for your reply. Did you test in a gcloud environment or on your local laptop to reproduce this issue? I think this issue can be reproduced in a gcloud environment but cannot be reproduced on a local laptop, since the local site works well.
Additionally,
And happy new year !
Hi @GabwonPark - sorry for the delay on this!
We still haven't been able to reproduce this on our side. That said, another user recently reported a similar issue. For them, one of their environments had a Docker volume mounted to the image that contained a second GE config directory, which caused a conflict. With two GE configs in use in the local environment, one overwrote the other because it was treated as a new configuration. They resolved this by removing the conflicting Docker volume.
Does this sound like a viable approach for you as well? It would be great to hear if this could work for you.
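To check for this kind of conflict, one option is to search the container or working directory for more than one great_expectations.yml. This is only a sketch: the helper name find_ge_configs and the directory layout in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path


def find_ge_configs(root):
    """Return every great_expectations.yml found under root, sorted."""
    return sorted(p.as_posix() for p in Path(root).rglob("great_expectations.yml"))


# Demonstration: a project config plus a second one from a mounted volume.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("app/great_expectations", "mnt/volume/great_expectations"):
        config_dir = Path(tmp) / sub
        config_dir.mkdir(parents=True)
        (config_dir / "great_expectations.yml").write_text("config_version: 3\n")
    configs = find_ge_configs(tmp)

print(len(configs))
```

More than one match in the environment where the cron job runs would be worth investigating before looking further at the GCS store settings.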
Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity.
It will be closed if no further activity occurs. Thank you for your contributions 🙇
Describe the bug Data Docs are created in the Google Cloud Storage (bucket). The Data Docs get overwritten each time a new test is executed via Cronjob. i.e. the output will always contain one result.
To Reproduce Steps to reproduce the behavior:
Expected behavior The previous test results in the Data Docs remain.
Actual behavior The newly executed test result overwrites the Data Docs. As a result, the previous test result is removed.
Environment (please complete the following information):
Additional context Expected behaviors can be seen if the test is executed locally but pointing to the GCS resources. i.e. Data Docs retain all executed test results.
Test Script Examples
great_expectation.yml
checkpoint_feedback.yml
test.py