Closed JasonBarnabe closed 6 years ago
I liked your idea; however, I needed the tool to execute tables in a predictable sequence because of constraints such as foreign keys in the database. Changing the table order might therefore create problems. So the idea is to let users define the order in which they want the tables processed.
If you are using execution_strategy DataAnon::Parallel::Table, you are not guaranteed a predictable execution sequence anyway.
I understand your point that running in parallel may cause similar problems. However, my thought is that users at least have the option to reorder their tables in the DSL as needed. For example, you could change the order of tables in the DSL based on table sizes calculated outside the tool.
I am not very comfortable changing the order based on table size and taking control away from users. Let me still think it through and bounce the idea off a few colleagues in the office.
On Fri, 30 Mar 2018 at 20:30 Jason Barnabe notifications@github.com wrote:
If you are using execution_strategy DataAnon::Parallel::Table you are not guaranteed predictable execution sequence anyway.
When using execution_strategy DataAnon::Parallel::Table, the tables will be processed in the order they are defined. If the number of tables is larger than the number of processes, then some tables have to wait for others to complete. We should try to make the table that takes longest to finish start processing first, which will make the entire run complete more quickly. In general, more rows = more time to process, so run the biggest tables first.
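To illustrate the reasoning, here is a minimal Ruby sketch of "biggest tables first" ordering. It is not the code from this pull request and does not use the data-anonymization API. The table names, row counts, and both helper methods are hypothetical, and the simulation assumes that processing time is proportional to row count and that each table goes to whichever worker frees up first.

```ruby
# Hypothetical row counts, listed in the order the tables are defined.
# In practice these would come from the database (e.g. a COUNT(*) per table).
ROW_COUNTS = {
  'users'  => 400_000,
  'orders' => 500_000,
  'events' => 900_000,
  'tags'   => 200
}.freeze

# Return the table names ordered from most rows to fewest.
def largest_first(row_counts)
  row_counts.sort_by { |_name, rows| -rows }.map(&:first)
end

# Simulate the total run time for a given table order.
# Assumption: cost is proportional to row count, and each table is
# handed to the worker that becomes free first.
def simulated_makespan(ordered_tables, row_counts, workers)
  finish_times = Array.new(workers, 0)
  ordered_tables.each do |table|
    idx = finish_times.index(finish_times.min)
    finish_times[idx] += row_counts[table]
  end
  finish_times.max
end

defined_order = ROW_COUNTS.keys
sorted_order  = largest_first(ROW_COUNTS)

puts "defined order: #{simulated_makespan(defined_order, ROW_COUNTS, 2)}"
puts "largest first: #{simulated_makespan(sorted_order, ROW_COUNTS, 2)}"
```

With two workers and these made-up numbers, the defined order leaves the largest table to start late behind a smaller one, while the sorted order starts it immediately and packs the smaller tables onto the other worker. Note that this ordering ignores foreign key constraints, which is the concern raised in the discussion above.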