Open jlewi opened 2 years ago
I suspect this has something to do with how the stack of executors (e.g. federating and resolving executors) is constructed. We probably need to control how that stack is constructed so that the executors that handle work in each silo are separated from the intermediary executors that aggregate results.
See this TFF question. https://discuss.tensorflow.org/t/what-does-it-mean-for-a-tff-executor-to-handle-a-cardinality-with-an-integer-greater-than-1/12277
I think this validates the aforementioned hypothesis.
The E2E test is currently creating tasks for two different groups. It's not clear why, because only a single worker should be required based on the data being passed to the program.
https://github.com/jlewi/flaap/blob/9af7e29d45e6e79a0b62e19e75c47e56a64c4184/py/flaap/testing/fed_average.py#L104
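To make the expectation concrete, here is a rough sketch of the rule I am assuming: the number of clients is implied by the length of the client-placed arguments, so data for one client should imply a cardinality of 1. This is plain Python and does not call TFF or flaap; `infer_num_clients` is a hypothetical helper written for illustration only, and the real inference inside TFF may differ in detail.

```python
from typing import Sequence


def infer_num_clients(client_args: Sequence[Sequence[object]]) -> int:
    """Hypothetical helper (not a TFF or flaap API).

    Each client-placed argument is modelled as a sequence with one entry
    per client. All such arguments must agree on their length, and that
    shared length is taken as the client cardinality.
    """
    sizes = {len(arg) for arg in client_args}
    if not sizes:
        raise ValueError("no client-placed arguments; cardinality cannot be inferred")
    if len(sizes) > 1:
        raise ValueError(f"conflicting client cardinalities: {sorted(sizes)}")
    return sizes.pop()


# One argument holding data for a single client.
one_client = [[[1.0, 2.0, 3.0]]]
# One argument holding data for two clients.
two_clients = [[[1.0, 2.0], [3.0, 4.0]]]

print(infer_num_clients(one_client))
print(infer_num_clients(two_clients))
```

Under this assumption, the first call reports one client and the second reports two, so seeing tasks for two groups when only one client's data is passed would point at how the executor stack splits up the work rather than at the input.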
Creating this issue to investigate it further as it likely indicates a bug due to a misunderstanding of how TFF works.