axelning opened this issue 3 years ago.
Duplicate of tfx#3343.
@zoyahav, shall we track this issue here or in tfx#3343?
Let's keep it here for now.
@axelning, are you able to check whether the issue also occurs on CPU?
By enabling GPU memory growth and limiting the number of workers, this issue can be circumvented. On CPU the issue does not reproduce, but only because I have 32 GB of memory.
Still, GPU memory-management problems keep coming up; some architectural optimization may be needed.
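The comment does not spell out which settings were used. A minimal sketch, assuming "growth limitation" means TensorFlow's GPU memory-growth option and "worker_num limitation" means the Apache Beam DirectRunner worker count. `enable_gpu_memory_growth` and `limited_beam_args` are hypothetical helper names. The in-code alternative to the environment variable is calling `tf.config.experimental.set_memory_growth(gpu, True)` for each device returned by `tf.config.list_physical_devices("GPU")`.

```python
import os
from typing import List


def enable_gpu_memory_growth() -> None:
    # Hypothetical helper. TensorFlow reads TF_FORCE_GPU_ALLOW_GROWTH when it
    # initializes its GPU devices, so this must run before TensorFlow touches
    # the GPU. With growth enabled, TF allocates GPU memory on demand instead
    # of reserving a large block up front.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def limited_beam_args(num_workers: int = 1) -> List[str]:
    # Hypothetical helper returning Apache Beam DirectRunner options.
    # In multi_processing mode each worker is a separate process that may
    # initialize TensorFlow (and its GPU allocator) on its own, so fewer
    # workers means fewer such allocations can coexist.
    # direct_num_workers=0 would mean "use all cores", which is the opposite
    # of a limit, so it is rejected here.
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    return [
        "--direct_running_mode=multi_processing",
        f"--direct_num_workers={num_workers}",
    ]
```

Under these assumptions, the returned list would be passed as `beam_pipeline_args` when constructing the TFX pipeline, and `enable_gpu_memory_growth()` would be called at the top of the pipeline script, before anything imports TensorFlow.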
@axelning, have you considered overriding the executor in the Transform component with a canonical tft-based implementation?
If the bug is related to a specific library below, please raise an issue in the respective repo directly:
TensorFlow Data Validation Repo
TensorFlow Model Analysis Repo
TensorFlow Transform Repo
TensorFlow Serving Repo
System information
Describe the current behavior
The TFX Transform component calls tensorflow_transform/beam/impl.py:1058, which in turn calls infer_feature_schema_v2 in schema_inference.py:163.
In this function, tf2_utils.supply_missing_inputs(structured_inputs, batch_size=1) at line 195 tries to convert the inputs to tensors and does not release the GPU memory when it finishes. By default this operation takes 7715 MB on my single Tesla P40.
I run into OOM when the subsequent training step starts requesting GPU memory. If I stop the whole process and resume, training succeeds, because the Transform output has already been saved. This shows that the memory held by this step does not need to stay allocated on the GPU once the step ends.
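Because restarting the process is what frees the memory, one possible workaround (not one proposed in this thread) is to run each GPU-heavy step in its own interpreter, so its GPU memory is returned to the driver when that process exits. A minimal standard-library sketch; `run_step_in_subprocess` is a hypothetical helper, and the Transform and training entry-point snippets you would pass to it are assumptions.

```python
import subprocess
import sys


def run_step_in_subprocess(code: str, timeout: float = 3600.0) -> str:
    """Run a Python snippet in a fresh interpreter and return its stdout.

    Hypothetical helper. Any GPU memory that the snippet's TensorFlow
    runtime reserves belongs to the child process, so it is released when
    that process exits, before the next step starts.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the step exits non-zero
        timeout=timeout,
    )
    return completed.stdout
```

Under this assumption, the Transform step and the training step would each be one call, run sequentially, so the second starts only after the first process (and its GPU allocation) is gone. The trade-off is the cost of re-importing TensorFlow in each step.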