Closed mhsong21 closed 1 year ago
TFX has its own cache: if all input artifacts and exec properties are the same (in this case, the only input is the query), it reuses the result of the last component execution. You can turn off the TFX cache when creating the pipeline object.
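To illustrate why an unchanged query leads to a stale result, here is a toy model of input-based caching. It is not TFX's actual implementation; `cache_key`, `ToyCache`, `fake_example_gen`, and the table name are hypothetical names made up for this sketch. The only point is that the cache key depends on the inputs and exec properties, not on the data behind the query.

```python
import hashlib
import json


def cache_key(component_id, input_artifact_uris, exec_properties):
    """Hash everything that, in this toy model, identifies an execution."""
    payload = json.dumps(
        {
            "component": component_id,
            "inputs": sorted(input_artifact_uris),
            "exec_properties": exec_properties,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyCache:
    """Reuses a previous result when the cache key matches (toy model only)."""

    def __init__(self, enable_cache=True):
        self.enable_cache = enable_cache
        self._results = {}
        self.executions = 0

    def run(self, component_id, input_artifact_uris, exec_properties, executor):
        key = cache_key(component_id, input_artifact_uris, exec_properties)
        if self.enable_cache and key in self._results:
            return self._results[key]  # cache hit: executor is not called
        self.executions += 1
        result = executor(exec_properties)
        self._results[key] = result
        return result


# Hypothetical table whose columns change between two pipeline runs.
table_columns = ["id", "label"]


def fake_example_gen(exec_properties):
    # Stand-in for an executor that reads the table's current schema.
    return list(table_columns)


query = {"query": "SELECT * FROM my_dataset.my_table"}  # same text in both runs

with_cache = ToyCache(enable_cache=True)
without_cache = ToyCache(enable_cache=False)

first_cached = with_cache.run("ExampleGen", [], query, fake_example_gen)
first_fresh = without_cache.run("ExampleGen", [], query, fake_example_gen)

table_columns.append("new_col")  # the table is rebuilt with a new column

second_cached = with_cache.run("ExampleGen", [], query, fake_example_gen)
second_fresh = without_cache.run("ExampleGen", [], query, fake_example_gen)
```

In this model, `second_cached` still lacks `new_col` because the query text did not change, while `second_fresh` picks it up because the executor ran again.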
@mhsong21,
You can set enable_cache=False when creating the pipeline object in the my_pipeline.py file, as mentioned in the comment above. Please refer to "Build a custom pipeline" for more information on pipeline customisation.
Thank you!
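For reference, a minimal sketch of what that might look like. The helper name `create_pipeline` and its arguments are assumptions for illustration, and your my_pipeline.py may be organised differently; the relevant part is the `enable_cache=False` argument passed to `pipeline.Pipeline`.

```python
def create_pipeline(pipeline_name, pipeline_root, components, metadata_path):
    """Build a TFX pipeline with component-level caching turned off.

    Hypothetical helper: the argument names are illustrative only.
    """
    # Imported inside the function so this file can be loaded even where
    # TFX is not installed; a real my_pipeline.py would import at the top.
    from tfx.orchestration import metadata, pipeline

    return pipeline.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=components,
        metadata_connection_config=(
            metadata.sqlite_metadata_connection_config(metadata_path)
        ),
        # Every component re-executes on each run instead of reusing the
        # previous execution's output artifacts.
        enable_cache=False,
    )
```

Note that this disables caching for every component in the pipeline, so later components re-run as well.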
Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen and post your comments if you still have queries. Thank you!
Describe the current behavior
I got a KeyError at google_cloud_big_query/utils.py#L70 after adding a new column to the existing table used by ExampleGen. Because I build the table with a separate query, the input query for ExampleGen is the same as before. Looking at the executor of the BigQueryExampleGen component, it uses cached results to retrieve the schema of the query result (I checked the query_job.cache_hit field).

Describe the expected behavior
I want the executor to get the schema from the fresh result, not the cached result. I expect the change would be simple and atomic: just setting the cache option to false.
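As a rough sketch of the behaviour I am asking for, this is how the BigQuery query cache can be bypassed with the google-cloud-bigquery client. It is not the TFX executor's code; `fetch_fresh_schema` is a hypothetical helper name, and whether the executor could accept such an option is an assumption on my side.

```python
def fetch_fresh_schema(query, client=None):
    """Run `query` with BigQuery's result cache disabled.

    Returns a (schema, cache_hit) tuple, where schema maps column names to
    BigQuery field types. Hypothetical helper, for illustration only.
    """
    # Imported lazily so this file loads without the client library installed.
    from google.cloud import bigquery

    client = client or bigquery.Client()

    # use_query_cache=False asks BigQuery not to serve cached results.
    job_config = bigquery.QueryJobConfig(use_query_cache=False)
    query_job = client.query(query, job_config=job_config)

    rows = query_job.result()  # blocks until the query job finishes
    schema = {field.name: field.field_type for field in rows.schema}

    # With the cache disabled, cache_hit is expected to be False.
    return schema, query_job.cache_hit
```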