Open zywind opened 2 years ago
I tried setting Dataflow's max_num_workers to 1 and the job succeeded. It looks like the problem is indeed in running Dataflow with multiple workers.
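For anyone who wants to try the same workaround, here is a minimal sketch of the Beam pipeline arguments, assuming the job is launched with a plain list of command-line style flags. The helper name `dataflow_args` and the project, region, and bucket values are hypothetical placeholders, not taken from the original job.

```python
def dataflow_args(project, region, temp_location, max_num_workers=1):
    """Build Beam pipeline flags for Dataflow (hypothetical helper).

    Capping max_num_workers at 1 keeps Dataflow from scaling out,
    which is the workaround described in the comment above.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--max_num_workers={max_num_workers}",
    ]


# Placeholder values; replace with your own project, region, and bucket.
args = dataflow_args("my-gcp-project", "us-central1", "gs://my-bucket/tmp")
print(args)
```

In a TFX pipeline, a list like this would typically be passed as `beam_pipeline_args`. Note that this only avoids the multi-worker failure by disabling scale-out; it does not address the underlying cause.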
Hi @zywind ,
As mentioned here, for distributed evaluation we use tfma.ExtractEvaluateAndWriteResults. Please refer to this example notebook and let me know if this resolves your issue.
Thank you.
System information
I am using TFX's evaluator
Describe the problem
Running the same evaluation locally with Beam's DirectRunner does not cause any error, but whenever I run it on Dataflow and Dataflow spawns more than one worker, I get an error like so:
Based on the Dataflow log, the failing steps were:
I see that you have this commit, which appears to address this problem, but it was immediately rolled back. I wonder if you have had similar issues and what you would recommend to fix the error.