tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.org/tfx
Apache License 2.0
2.11k stars 704 forks

enable_cache not working with Transform component when importing schema with ImporterNode #1961

Closed jason-brian-anderson closed 3 years ago

jason-brian-anderson commented 4 years ago

It appears that the enable_cache interface to tfx.components.Transform is not effective when using tfx.components.ImporterNode to load a curated schema.

    # Import version-controlled schema
    schema_importer = tfx.components.ImporterNode(
        instance_name='import_schema',
        # reimport=True,
        source_uri=schema_root,  # directory containing only the schema text proto
        artifact_type=tfx.types.standard_artifacts.Schema,
    )
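
As an aside, to sanity-check what the importer will hand downstream, one can load the schema proto directly. A minimal sketch, assuming `schema_root` contains a single `schema.pbtxt` text proto (the file name is an assumption):

    import os

    from google.protobuf import text_format
    from tensorflow_metadata.proto.v0 import schema_pb2

    # Hypothetical file name; adjust to whatever actually lives under schema_root.
    schema_path = os.path.join(schema_root, 'schema.pbtxt')
    schema = schema_pb2.Schema()
    with open(schema_path) as f:
        text_format.Parse(f.read(), schema)
    print(f'{len(schema.feature)} features in the curated schema')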

The Transform component looks like:

    # Performs transformations and feature engineering in training and serving.
    transform = tfx.components.Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_importer.outputs['result'],
        module_file=module_file,
        enable_cache=_transform_caching,
    )
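
For completeness, caching in this era of TFX can also be switched on for the whole pipeline rather than per component. A minimal sketch, assuming the 0.2x-era Airflow API; every name below other than `enable_cache` is a placeholder:

    from tfx.orchestration import metadata, pipeline

    # Sketch only: _pipeline_name, _pipeline_root, and _metadata_path are
    # placeholders for the values used in the real DAG definition.
    p = pipeline.Pipeline(
        pipeline_name=_pipeline_name,
        pipeline_root=_pipeline_root,
        components=[example_gen, schema_importer, transform],
        enable_cache=True,  # pipeline-wide cache switch
        metadata_connection_config=metadata.sqlite_metadata_connection_config(
            _metadata_path),
    )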

The Transform component is clearly processing the examples, since a run can take anywhere from 15 minutes to 4 hours depending on the data I feed it. Additionally, I searched the output for 'skip', as recommended in a previous related issue, and found nothing. The Transform output in Airflow is:


*** Reading local file: /usr/local/airflow/logs/mlops_prod_web_request_classifier_v1/Transform/2020-06-10T14:51:34.026725+00:00/1.log
[2020-06-10 14:53:51,230] {{taskinstance.py:655}} INFO - Dependencies all met for <TaskInstance: mlops_prod_web_request_classifier_v1.Transform 2020-06-10T14:51:34.026725+00:00 [queued]>
[2020-06-10 14:53:55,263] {{taskinstance.py:655}} INFO - Dependencies all met for <TaskInstance: mlops_prod_web_request_classifier_v1.Transform 2020-06-10T14:51:34.026725+00:00 [queued]>
[2020-06-10 14:53:55,263] {{taskinstance.py:866}} INFO - 
--------------------------------------------------------------------------------
[2020-06-10 14:53:55,263] {{taskinstance.py:867}} INFO - Starting attempt 1 of 1
[2020-06-10 14:53:55,263] {{taskinstance.py:868}} INFO - 
--------------------------------------------------------------------------------
[2020-06-10 14:53:59,287] {{taskinstance.py:887}} INFO - Executing <Task(AirflowComponent): Transform> on 2020-06-10T14:51:34.026725+00:00
[2020-06-10 14:53:59,301] {{standard_task_runner.py:53}} INFO - Started process 32759 to run task
[2020-06-10 14:54:07,455] {{logging_mixin.py:112}} INFO - Running %s on host %s <TaskInstance: mlops_prod_web_request_classifier_v1.Transform 2020-06-10T14:51:34.026725+00:00 [running]> e9252458d5c2
[2020-06-10 14:54:15,513] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:15,512] {{base_component_launcher.py:195}} INFO - Running driver for Transform
[2020-06-10 14:54:19,533] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:19,532] {{metadata_store.py:65}} INFO - MetadataStore with DB connection initialized
[2020-06-10 14:54:19,606] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:19,605] {{artifact_utils.py:191}} WARNING - Could not find matching artifact class for type 'Examples' (proto: 'id: 6\nname: "Examples"\nproperties {\n  key: "span"\n  value: INT\n}\nproperties {\n  key: "split_names"\n  value: STRING\n}\n'); generating an ephemeral artifact class on-the-fly. If this is not intended, please make sure that the artifact class for this type can be imported within your container or environment where a component is executed to consume this type.
[2020-06-10 14:54:19,648] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:19,647] {{artifact_utils.py:191}} WARNING - Could not find matching artifact class for type 'Schema' (proto: 'id: 10\nname: "Schema"\n'); generating an ephemeral artifact class on-the-fly. If this is not intended, please make sure that the artifact class for this type can be imported within your container or environment where a component is executed to consume this type.
[2020-06-10 14:54:24,370] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:24,370] {{base_component_launcher.py:201}} INFO - Running executor for Transform
[2020-06-10 14:54:24,378] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:24,378] {{dependency_utils.py:68}} INFO - Attempting to infer TFX Python dependency for beam
[2020-06-10 14:54:24,379] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:24,379] {{dependency_utils.py:102}} INFO - Generating a temp requirements.txt file at /tmp/tmp09p65gnw/requirement.txt
[2020-06-10 14:54:24,380] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:24,380] {{dependency_utils.py:77}} INFO - Added --requirements_file=/tmp/tmp09p65gnw/requirement.txt to beam args
[2020-06-10 14:54:24,427] {{logging_mixin.py:112}} WARNING - From /usr/local/airflow/.local/lib/python3.7/site-packages/tfx/components/transform/executor.py:511: Schema (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
Schema is a deprecated, use schema_utils.schema_from_feature_spec to create a `Schema`
[2020-06-10 14:54:24,543] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:24,542] {{base_executor.py:131}} INFO - Using 1 process(es) for Beam pipeline execution.
[2020-06-10 14:54:24,543] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:24,543] {{pipeline.py:143}} INFO - Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
[2020-06-10 14:54:24,602] {{logging_mixin.py:112}} WARNING - Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
[2020-06-10 14:54:25,720] {{logging_mixin.py:112}} WARNING - From /usr/local/airflow/.local/lib/python3.7/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
[2020-06-10 14:54:25,726] {{logging_mixin.py:112}} WARNING - Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
[2020-06-10 14:54:25,944] {{logging_mixin.py:112}} WARNING - Issue encountered when serializing tft_mapper_use.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'Counter' object has no attribute 'name'
[2020-06-10 14:54:26,020] {{logging_mixin.py:112}} WARNING - Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
[2020-06-10 14:54:26,060] {{logging_mixin.py:112}} WARNING - Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended.
[2020-06-10 14:54:27,344] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,344] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function annotate_downstream_side_inputs at 0x7fe4173b0a70> ====================
[2020-06-10 14:54:27,347] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,347] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function fix_side_input_pcoll_coders at 0x7fe4173b0b90> ====================
[2020-06-10 14:54:27,348] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,348] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function lift_combiners at 0x7fe4173b0c20> ====================
[2020-06-10 14:54:27,350] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,350] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function expand_sdf at 0x7fe4173b0cb0> ====================
[2020-06-10 14:54:27,351] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,351] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function expand_gbk at 0x7fe4173b0d40> ====================
[2020-06-10 14:54:27,352] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,352] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function sink_flattens at 0x7fe4173b0e60> ====================
[2020-06-10 14:54:27,353] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,353] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function greedily_fuse at 0x7fe4173b0ef0> ====================
[2020-06-10 14:54:27,359] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,359] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function read_to_impulse at 0x7fe4173b0f80> ====================
[2020-06-10 14:54:27,360] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,360] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function impulse_to_input at 0x7fe4173b1050> ====================
[2020-06-10 14:54:27,361] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,361] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function inject_timer_pcollections at 0x7fe4173b1200> ====================
[2020-06-10 14:54:27,362] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,362] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function sort_stages at 0x7fe4173b1290> ====================
[2020-06-10 14:54:27,362] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,362] {{fn_api_runner_transforms.py:488}} INFO - ==================== <function window_pcollection_coders at 0x7fe4173b1320> ====================
[2020-06-10 14:54:27,368] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,368] {{statecache.py:44}} INFO - Creating state cache with size 100
[2020-06-10 14:54:27,369] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,368] {{fn_api_runner.py:1528}} INFO - Created Worker handler <apache_beam.runners.portability.fn_api_runner.EmbeddedWorkerHandler object at 0x7fe4085f3d10> for environment urn: "beam:env:embedded_python:v1"

[2020-06-10 14:54:27,369] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,369] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_Analyze/CreateSavedModelForAnalyzerInputs[Phase0]/BindTensors/CreateSavedModel/Read_22)+(ref_PCollection_PCollection_9/Write)
[2020-06-10 14:54:27,380] {{logging_mixin.py:112}} INFO - [2020-06-10 14:54:27,379] {{fn_api_runner.py:672}} INFO - Running (((ref_AppliedPTransform_ReadDataset[AnalysisIndex0]/Read/Read_9)+(ref_AppliedPTransform_ReadDataset[AnalysisIndex0]/AddKey_10))+(ref_AppliedPTransform_Decode[AnalysisIndex0]/ApplyDecodeFn_12))+(FlattenAnalysisDatasetsBecauseItIsRequired/Write/0)
[2020-06-10 14:55:00,423] {{logging_mixin.py:112}} INFO - [2020-06-10 14:55:00,422] {{fn_api_runner.py:672}} INFO - Running (((((((FlattenAnalysisDatasetsBecauseItIsRequired/Read)+(ref_AppliedPTransform_Analyze/ApplySavedModel[Phase0]/BatchInputs/BatchElements/ParDo(_GlobalWindowsBatchingDoFn)_30))+(ref_AppliedPTransform_Analyze/ApplySavedModel[Phase0]/ApplySavedModel_31))+(ref_AppliedPTransform_Analyze/TensorSource[split_and_apply_vocab_2/vocabulary]/ExtractKeys_33))+(ref_AppliedPTransform_Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/FlattenTokensAndMaybeWeightsLabels_35))+(ref_AppliedPTransform_Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/CountPerToken/CountPerToken:PairWithVoid_37))+(Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/CountPerToken/CombinePerKey(CountCombineFn)/Precombine))+(Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/CountPerToken/CombinePerKey(CountCombineFn)/Group/Write)
[2020-06-10 15:05:06,902] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:06,901] {{fn_api_runner.py:672}} INFO - Running (((((Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/CountPerToken/CombinePerKey(CountCombineFn)/Group/Read)+(Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/CountPerToken/CombinePerKey(CountCombineFn)/Merge))+(Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/CountPerToken/CombinePerKey(CountCombineFn)/ExtractOutputs))+(ref_AppliedPTransform_Analyze/VocabularyAccumulate[split_and_apply_vocab_2/vocabulary]/FilterProblematicStrings_45))+(Analyze/VocabularyMerge[split_and_apply_vocab_2/vocabulary]/CountPerToken/Precombine))+(Analyze/VocabularyMerge[split_and_apply_vocab_2/vocabulary]/CountPerToken/Group/Write)
[2020-06-10 15:05:10,373] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:10,373] {{fn_api_runner.py:672}} INFO - Running (((((((((Analyze/VocabularyMerge[split_and_apply_vocab_2/vocabulary]/CountPerToken/Group/Read)+(Analyze/VocabularyMerge[split_and_apply_vocab_2/vocabulary]/CountPerToken/Merge))+(Analyze/VocabularyMerge[split_and_apply_vocab_2/vocabulary]/CountPerToken/ExtractOutputs))+(ref_AppliedPTransform_Analyze/VocabularyMerge[split_and_apply_vocab_2/vocabulary]/SwapTokensAndCounts_54))+(ref_AppliedPTransform_Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/KeyWithVoid_58))+(ref_AppliedPTransform_Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/ParDo(_TopPerBundle)_76))+(Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/CombinePerKey/Precombine))+(Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/CombinePerKey/Group/Write))+(Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/Flatten/Transcode/0))+(Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/Flatten/Write/0)
[2020-06-10 15:05:12,481] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,481] {{fn_api_runner.py:672}} INFO - Running ((((Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/CombinePerKey/Group/Read)+(Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/CombinePerKey/Merge))+(Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/CombinePerKey/ExtractOutputs))+(ref_AppliedPTransform_Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/UnKey_66))+(ref_PCollection_PCollection_32/Write)
[2020-06-10 15:05:12,510] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,510] {{fn_api_runner.py:672}} INFO - Running ((((ref_AppliedPTransform_Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/DoOnce/Read_68)+(ref_AppliedPTransform_Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/TotalVocabSize/CombineGlobally(CountCombineFn)/InjectDefault_69))+(ref_AppliedPTransform_Analyze/VocabularyCount[split_and_apply_vocab_2/vocabulary]/ToInt64_70))+(ref_AppliedPTransform_Analyze/CreateTensorBinding[split_and_apply_vocab_2/vocabulary/vocabulary_file_1_txt_unpruned_vocab_size]/ToTensorBinding_72))+(ref_PCollection_PCollection_36/Write)
[2020-06-10 15:05:12,551] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,550] {{fn_api_runner.py:672}} INFO - Running (((ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/DoOnce/Read_94)+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/InitializeWrite_95))+(ref_PCollection_PCollection_47/Write))+(ref_PCollection_PCollection_48/Write)
[2020-06-10 15:05:12,576] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,576] {{fn_api_runner.py:672}} INFO - Running ((ref_AppliedPTransform_Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/Create/Read_78)+(Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/Flatten/Transcode/1))+(Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/Flatten/Write/1)
[2020-06-10 15:05:12,589] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,589] {{fn_api_runner.py:672}} INFO - Running (Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/Flatten/Read)+(Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/GroupByKey/Write)
[2020-06-10 15:05:12,608] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,608] {{fn_api_runner.py:672}} INFO - Running (((Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/GroupByKey/Read)+(ref_AppliedPTransform_Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/Top(3999)/ParDo(_MergeTopPerBundle)_84))+(ref_AppliedPTransform_Analyze/VocabularyPrune[split_and_apply_vocab_2/vocabulary]/ApplyFrequencyThresholdAndTopK/FlattenList_85))+(ref_PCollection_PCollection_44/Write)
[2020-06-10 15:05:12,637] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,637] {{fn_api_runner.py:672}} INFO - Running ((((ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/Prepare/Read_88)+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/OrderElements_89))+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/Map(<lambda at iobase.py:990>)_96))+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/WindowInto(WindowIntoFn)_97))+(Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/GroupByKey/Write)
[2020-06-10 15:05:12,778] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,778] {{fn_api_runner.py:672}} INFO - Running ((Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/WriteBundles_102))+(ref_PCollection_PCollection_54/Write)
[2020-06-10 15:05:12,811] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,811] {{fn_api_runner.py:672}} INFO - Running ((ref_PCollection_PCollection_47/Read)+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/PreFinalize_103))+(ref_PCollection_PCollection_55/Write)
[2020-06-10 15:05:12,838] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,837] {{fn_api_runner.py:672}} INFO - Running ((ref_PCollection_PCollection_47/Read)+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WriteToFile/Write/WriteImpl/FinalizeWrite_104))+(ref_PCollection_PCollection_56/Write)
[2020-06-10 15:05:12,853] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,853] {{filebasedsink.py:291}} INFO - Starting finalize_write threads with num_shards: 1 (skipped: 0), batches: 1, num_threads: 1
[2020-06-10 15:05:12,955] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,954] {{filebasedsink.py:328}} INFO - Renamed 1 shards in 0.10 seconds.
[2020-06-10 15:05:12,977] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:12,977] {{fn_api_runner.py:672}} INFO - Running (((ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/CreatePath/Read_106)+(ref_AppliedPTransform_Analyze/VocabularyOrderAndWrite[split_and_apply_vocab_2/vocabulary]/WaitForVocabularyFile_107))+(ref_AppliedPTransform_Analyze/CreateTensorBinding[split_and_apply_vocab_2/vocabulary/Placeholder]/ToTensorBinding_109))+(ref_PCollection_PCollection_59/Write)
[2020-06-10 15:05:13,008] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:13,008] {{fn_api_runner.py:672}} INFO - Running (((((((((ref_AppliedPTransform_Analyze/CreateSavedModel/BindTensors/CreateSavedModel/Read_113)+(ref_AppliedPTransform_Analyze/CreateSavedModel/BindTensors/ReplaceWithConstants_114))+(ref_AppliedPTransform_Analyze/ComputeDeferredMetadata_119))+(ref_AppliedPTransform_Analyze/MakeCheapBarrier_120))+(ref_AppliedPTransform_WriteTransformFn/WriteTransformFnToTemp_131))+(ref_PCollection_PCollection_61/Write))+(ref_AppliedPTransform_WriteTransformFn/WriteMetadataToTemp/WriteMetadata_130))+(ref_PCollection_PCollection_65/Write))+(ref_PCollection_PCollection_70/Write))+(ref_PCollection_PCollection_71/Write)
[2020-06-10 15:05:13,143] {{logging_mixin.py:112}} WARNING - Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_1:0\022\025vocabulary_file_1_txt"
[2020-06-10 15:05:13,233] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:13,232] {{fn_api_runner.py:672}} INFO - Running (((ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/DoOnce/Read_193)+(ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/InitializeWrite_194))+(ref_PCollection_PCollection_107/Write))+(ref_PCollection_PCollection_108/Write)
[2020-06-10 15:05:13,260] {{logging_mixin.py:112}} INFO - [2020-06-10 15:05:13,260] {{fn_api_runner.py:672}} INFO - Running (((((((((((((ref_AppliedPTransform_ReadDataset[TransformIndex1]/Read/Read_154)+(ref_AppliedPTransform_ReadDataset[TransformIndex1]/AddKey_155))+(ref_AppliedPTransform_Decode[TransformIndex1]/ApplyDecodeFn_157))+(ref_AppliedPTransform_Transform[TransformIndex1]/Batch/BatchElements/ParDo(_GlobalWindowsBatchingDoFn)_161))+(ref_AppliedPTransform_Transform[TransformIndex1]/Transform_162))+(ref_AppliedPTransform_Transform[TransformIndex1]/ConvertAndUnbatch_163))+(ref_AppliedPTransform_Transform[TransformIndex1]/MakeCheapBarrier_164))+(ref_AppliedPTransform_EncodeAndSerialize[TransformIndex1]_168))+(ref_PCollection_PCollection_90/Write))+(ref_AppliedPTransform_Materialize[TransformIndex1]/Values_188))+(ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/WriteBundles_195))+(ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/Pair_196))+(ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/WindowInto(WindowIntoFn)_197))+(Materialize[TransformIndex1]/Write/Write/WriteImpl/GroupByKey/Write)
[2020-06-10 15:05:13,723] {{logging_mixin.py:112}} WARNING - Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_1:0\022\025vocabulary_file_1_txt"
[2020-06-10 15:07:31,564] {{logging_mixin.py:112}} INFO - [2020-06-10 15:07:31,564] {{fn_api_runner.py:672}} INFO - Running ((Materialize[TransformIndex1]/Write/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/Extract_202))+(ref_PCollection_PCollection_115/Write)
[2020-06-10 15:07:31,587] {{logging_mixin.py:112}} INFO - [2020-06-10 15:07:31,587] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_IncrementColumnUsageCounter/CreateSole/Read_4)+(ref_AppliedPTransform_IncrementColumnUsageCounter/Count_5)
[2020-06-10 15:07:31,602] {{logging_mixin.py:112}} INFO - [2020-06-10 15:07:31,601] {{fn_api_runner.py:672}} INFO - Running ((ref_PCollection_PCollection_107/Read)+(ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/PreFinalize_203))+(ref_PCollection_PCollection_116/Write)
[2020-06-10 15:07:31,619] {{logging_mixin.py:112}} INFO - [2020-06-10 15:07:31,618] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_WriteTransformFn/CreateSole/Read_133)+(ref_AppliedPTransform_WriteTransformFn/PublishMetadataAndTransformFn_134)
[2020-06-10 15:07:31,644] {{logging_mixin.py:112}} INFO - [2020-06-10 15:07:31,644] {{fn_api_runner.py:672}} INFO - Running (((ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/DoOnce/Read_175)+(ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/InitializeWrite_176))+(ref_PCollection_PCollection_95/Write))+(ref_PCollection_PCollection_96/Write)
[2020-06-10 15:07:31,663] {{logging_mixin.py:112}} INFO - [2020-06-10 15:07:31,663] {{fn_api_runner.py:672}} INFO - Running (((((((((((((ref_AppliedPTransform_ReadDataset[TransformIndex0]/Read/Read_137)+(ref_AppliedPTransform_ReadDataset[TransformIndex0]/AddKey_138))+(ref_AppliedPTransform_Decode[TransformIndex0]/ApplyDecodeFn_140))+(ref_AppliedPTransform_Transform[TransformIndex0]/Batch/BatchElements/ParDo(_GlobalWindowsBatchingDoFn)_144))+(ref_AppliedPTransform_Transform[TransformIndex0]/Transform_145))+(ref_AppliedPTransform_Transform[TransformIndex0]/ConvertAndUnbatch_146))+(ref_AppliedPTransform_Transform[TransformIndex0]/MakeCheapBarrier_147))+(ref_AppliedPTransform_EncodeAndSerialize[TransformIndex0]_151))+(ref_PCollection_PCollection_80/Write))+(ref_AppliedPTransform_Materialize[TransformIndex0]/Values_170))+(ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/WriteBundles_177))+(ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/Pair_178))+(ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/WindowInto(WindowIntoFn)_179))+(Materialize[TransformIndex0]/Write/Write/WriteImpl/GroupByKey/Write)
[2020-06-10 15:07:32,106] {{logging_mixin.py:112}} WARNING - Expected binary or unicode string, got type_url: "type.googleapis.com/tensorflow.AssetFileDef"
value: "\n\013\n\tConst_1:0\022\025vocabulary_file_1_txt"
[2020-06-10 15:08:41,911] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:41,911] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_Transform[TransformIndex0]/PrepareToClearSharedKeepAlives/Read_149)+(ref_AppliedPTransform_Transform[TransformIndex0]/WaitAndClearSharedKeepAlives_150)
[2020-06-10 15:08:41,935] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:41,935] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_Transform[TransformIndex1]/PrepareToClearSharedKeepAlives/Read_166)+(ref_AppliedPTransform_Transform[TransformIndex1]/WaitAndClearSharedKeepAlives_167)
[2020-06-10 15:08:41,961] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:41,961] {{fn_api_runner.py:672}} INFO - Running ((Materialize[TransformIndex0]/Write/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/Extract_184))+(ref_PCollection_PCollection_103/Write)
[2020-06-10 15:08:41,982] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:41,982] {{fn_api_runner.py:672}} INFO - Running ((ref_PCollection_PCollection_95/Read)+(ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/PreFinalize_185))+(ref_PCollection_PCollection_104/Write)
[2020-06-10 15:08:42,006] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,006] {{fn_api_runner.py:672}} INFO - Running (ref_PCollection_PCollection_95/Read)+(ref_AppliedPTransform_Materialize[TransformIndex0]/Write/Write/WriteImpl/FinalizeWrite_186)
[2020-06-10 15:08:42,028] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,028] {{filebasedsink.py:291}} INFO - Starting finalize_write threads with num_shards: 1 (skipped: 0), batches: 1, num_threads: 1
[2020-06-10 15:08:42,130] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,130] {{filebasedsink.py:328}} INFO - Renamed 1 shards in 0.10 seconds.
[2020-06-10 15:08:42,149] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,148] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_Analyze/CreateSavedModelForAnalyzerInputs[Phase0]/Count/CreateSole/Read_25)+(ref_AppliedPTransform_Analyze/CreateSavedModelForAnalyzerInputs[Phase0]/Count/Count_26)
[2020-06-10 15:08:42,175] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,175] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_Analyze/InstrumentAPI/CreateSoleAPIUse/Read_17)+(ref_AppliedPTransform_Analyze/InstrumentAPI/CountAPIUse_18)
[2020-06-10 15:08:42,199] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,199] {{fn_api_runner.py:672}} INFO - Running (ref_PCollection_PCollection_107/Read)+(ref_AppliedPTransform_Materialize[TransformIndex1]/Write/Write/WriteImpl/FinalizeWrite_204)
[2020-06-10 15:08:42,221] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,221] {{filebasedsink.py:291}} INFO - Starting finalize_write threads with num_shards: 1 (skipped: 0), batches: 1, num_threads: 1
[2020-06-10 15:08:42,323] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,323] {{filebasedsink.py:328}} INFO - Renamed 1 shards in 0.10 seconds.
[2020-06-10 15:08:42,342] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,341] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_Analyze/CreateSavedModel/Count/CreateSole/Read_117)+(ref_AppliedPTransform_Analyze/CreateSavedModel/Count/Count_118)
[2020-06-10 15:08:42,366] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,365] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_WriteMetadata/Create/Read_126)+(ref_AppliedPTransform_WriteMetadata/WriteMetadata_127)
[2020-06-10 15:08:42,410] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,409] {{fn_api_runner.py:672}} INFO - Running (ref_AppliedPTransform_Analyze/PrepareToClearSharedKeepAlives/Read_122)+(ref_AppliedPTransform_Analyze/WaitAndClearSharedKeepAlives_123)
[2020-06-10 15:08:42,461] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:42,461] {{base_component_launcher.py:208}} INFO - Running publisher for Transform
[2020-06-10 15:08:46,476] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:46,475] {{metadata_store.py:65}} INFO - MetadataStore with DB connection initialized
[2020-06-10 15:08:46,608] {{python_operator.py:114}} INFO - Done. Returned value was: None
[2020-06-10 15:08:50,628] {{taskinstance.py:1048}} INFO - Marking task as SUCCESS.dag_id=mlops_prod_web_request_classifier_v1, task_id=Transform, execution_date=20200610T145134, start_date=20200610T145351, end_date=20200610T150850
[2020-06-10 15:08:55,627] {{logging_mixin.py:112}} INFO - [2020-06-10 15:08:55,627] {{local_task_job.py:103}} INFO - Task exited with return code 0
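
For what it's worth, one way to check whether a prior Transform execution was even recorded for the driver to reuse is to query the MLMD store directly. A sketch, assuming a SQLite-backed metadata store; 'metadata.db' is a placeholder path:

    from ml_metadata import metadata_store
    from ml_metadata.proto import metadata_store_pb2

    config = metadata_store_pb2.ConnectionConfig()
    config.sqlite.filename_uri = 'metadata.db'  # placeholder path
    store = metadata_store.MetadataStore(config)

    # Repeated Transform executions with identical inputs would suggest
    # the cache was never consulted.
    for execution in store.get_executions():
        print(execution.id, execution.properties)
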
ruoyu90 commented 4 years ago

@jason-brian-anderson Just to confirm: you ran the same pipeline (with ImporterNode) twice and found that it did not reuse the cache? If so, could you adjust the logging level to DEBUG and see whether the extra logs give us more info?

jason-brian-anderson commented 4 years ago

Sorry, I'm just now getting back to this. It's not immediately obvious - at least to me - how to adjust the logging level to DEBUG in TF2. I'll look into this so I can provide the debug info.

arghyaganguly commented 3 years ago

Please find the code below, where the logging level is set to 10, i.e. DEBUG, meaning all log messages should be printed:

    import tensorflow as tf

    tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)

    tf.compat.v1.logging.error('Error Message')
    tf.compat.v1.logging.info('Info Message')
    tf.compat.v1.logging.warning('Warning Message')

As shown below, all the log messages are printed:

    ERROR:tensorflow:Error Message
    INFO:tensorflow:Info Message
    WARNING:tensorflow:Warning Message

Next, set the logging level to 30, i.e. WARN:

    import tensorflow as tf

    tf.compat.v1.logging.set_verbosity(30)  # WARN

    tf.compat.v1.logging.error('Error Message')
    tf.compat.v1.logging.info('Info Message')
    tf.compat.v1.logging.warning('Warning Message')

As shown below, the info log is filtered out while the warning and error logs are printed:

    ERROR:tensorflow:Error Message
    WARNING:tensorflow:Warning Message

Finally, let's set the logging level to 40, i.e. ERROR:

    import tensorflow as tf

    tf.compat.v1.logging.set_verbosity(40)  # ERROR

    tf.compat.v1.logging.error('Error Message')
    tf.compat.v1.logging.info('Info Message')
    tf.compat.v1.logging.warning('Warning Message')

Now we can see that only the error message is printed:

    ERROR:tensorflow:Error Message
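
For completeness, a sketch of the equivalent using Python's standard logging module plus absl, which TFX logs through internally:

    import logging

    from absl import logging as absl_logging

    # Lower the root logger threshold so DEBUG records are emitted.
    logging.getLogger().setLevel(logging.DEBUG)

    # TFX itself logs via absl, so lower its verbosity as well.
    absl_logging.set_verbosity(absl_logging.DEBUG)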

Hope this helps.

arghyaganguly commented 3 years ago

Closing this as it has been inactive while awaiting a response for some time. Please feel free to reopen.

google-ml-butler[bot] commented 3 years ago

Are you satisfied with the resolution of your issue?