Replace lambda calls in Batch and Sync_tables with internal functions

dougbrn commented 9 months ago

Overloaded lambdas lead to ambiguity in dask dashboard profiling, so we should replace these (starting with batch and sync_tables as the biggest users of it) with specifically-defined functions to be able to tell them apart. Raised by @hombit

dougbrn commented 9 months ago

I've been poking around with this. It's feasible to replace lambda calls with functions and have that reflected deep in the dask profiler, but it looks like it's harder to consistently get something like this: Screen Shot 2024-02-01 at 3 09 58 PM Where "mymeanfunction" (being applied via batch) shows up consistently as a top level task on the progress bar. The challenge is that the task label takes on the name of last dask function applied. So for a certain workflow, you can have something like this happen, where we do batch output standardization using Ensemble._standardize_batch and the final operation applied is to convert to an EnsembleFrame, yielding a task label of "TapeFrame" here: Screen Shot 2024-02-01 at 3 13 51 PM

We can still make some progress by having functions show up deeper in the profiler, like a _apply_batch function which can be found in the profiler: Screen Shot 2024-02-01 at 3 17 23 PM

But this is an incremental change that I'm not sure addresses the real crux of this ticket, which is I think wanting something like the first image? @hombit let me know if that's the case. Enabling something like the first image consistent would involve having our hands a lot deeper into dask delayed and the lower-level collections, and even then I'm not sure exactly what we can do. I think it's worth it to go ahead with the lambda replacement regardless because I think it reads better code-wise, but again I'm not sure it really would solve the core problem highlighted in this ticket.

hombit commented 9 months ago

Thank you for the investigation, @dougbrn, I haven't realized how Dask names stuff. I'm ok with closing this issue if it wouldn't help in profiling

dougbrn commented 9 months ago

Let's keep this open until we maybe have a conversation at a future meeting just to verify level of importance vs effort.

lincc-frameworks / tape

Replace lambda calls in Batch and Sync_tables with internal functions #357