Open dhalperi opened 10 years ago
Is this repeatable?
There is one thing which can take a while which is writing the profiling logs to the database (https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/parallel/ProfilingLogger.java#L231). It's writing 10k tuples to the db, which might take a while and since I cannot control in which operator it happens, it might block a random operator for a short time.
I remember we said let's not worry about this but maybe it's the time for asynchronous writing.
However, writing around 10MB shouldn't take 14 seconds, right?
Hmm, well, no. The query doesn't take any longer with profiling on... So that can't be it -- right?
(This is the critical fragment, so if profiling was the issue then slowing it down should have a large effect on runtime.)
We can decrease the batch size in the profiling logger to see whether this is the cause.
I think this was resolved, right?
I think the fundamental question -- which operator gets blamed for profiling overhead -- still stands :)
We might want to retitle this issue something about "show profiling overhead in profiling results" somehow.
Good point. I'm all for showing where it's happening rather than trying to avoid it with non-obvious methods.
This is my new hypothesis for why an Apply can take many seconds.
@domoritz
To see what makes me upset: https://demo.myria.cs.washington.edu/profile?queryId=11686
Select Fragment 1, time between 43 and 58 seconds, and look at worker 14.
This seems crazy to me, but I verified it in the postgres database. Note that the join returns
null
, so the apply will immediately return null: https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/operator/Apply.java#L81I can't see how that would take ~14s.
Other possibilities: