uwescience / myria

Myria is a scalable Analytics-as-a-Service platform based on relational algebra.
myria.cs.washington.edu

Profiling: which operator gets blamed for the flush time? #543

Open dhalperi opened 10 years ago

dhalperi commented 10 years ago

This is my new hypothesis for why an Apply can take many seconds.

@domoritz

To see what makes me upset: https://demo.myria.cs.washington.edu/profile?queryId=11686

Select Fragment 1, time between 43 and 58 seconds, and look at worker 14.

[screenshot: "screen shot 2014-06-04 at 11 05 13 pm"]

This seems crazy to me, but I verified it in the postgres database. Note that the join returns null, so the apply will immediately return null: https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/operator/Apply.java#L81

I can't see how that would take ~14s.

Other possibilities:

stechu commented 10 years ago

Is this repeatable?

domoritz commented 10 years ago

There is one thing that can take a while: writing the profiling logs to the database (https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/parallel/ProfilingLogger.java#L231). It writes 10k tuples to the db, which might take a while, and since I cannot control which operator it happens in, it might block a random operator for a short time.

I remember we said let's not worry about this, but maybe it's time for asynchronous writing.

However, writing around 10MB shouldn't take 14 seconds, right?
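To make the blame mechanism concrete, here is a minimal sketch (not Myria's actual `ProfilingLogger`; all names are hypothetical) of why a synchronous batched logger charges the entire flush to whichever operator happens to add the batch-filling record, and how handing batches to a background writer thread avoids that:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: synchronous vs. asynchronous batched event logging. */
public class BatchLoggerSketch {
  static final int BATCH_SIZE = 3;

  /** Synchronous variant: record() occasionally pays the full flush cost. */
  static class SyncLogger {
    private final List<String> batch = new ArrayList<>();
    int flushes = 0;

    void record(String event) {
      batch.add(event);
      if (batch.size() >= BATCH_SIZE) {
        // The operator that happened to add the BATCH_SIZE-th event blocks
        // here for the entire (possibly multi-second) database write, so the
        // profiler blames that operator for the flush time.
        flush();
      }
    }

    private void flush() {
      flushes++; // stand-in for the actual batch insert into the db
      batch.clear();
    }
  }

  /** Asynchronous variant: record() only enqueues; a writer thread flushes. */
  static class AsyncLogger {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    AsyncLogger() {
      Thread writer = new Thread(() -> {
        try {
          while (true) {
            String event = queue.take(); // blocks the writer, not the operator
            // ... write event (or a drained batch) to the db here ...
          }
        } catch (InterruptedException e) {
          // shut down
        }
      });
      writer.setDaemon(true);
      writer.start();
    }

    void record(String event) {
      queue.add(event); // returns immediately; no operator is blamed
    }
  }

  public static void main(String[] args) {
    SyncLogger sync = new SyncLogger();
    for (int i = 0; i < 7; i++) {
      sync.record("event" + i);
    }
    // 7 events with BATCH_SIZE = 3 trigger flushes at events 3 and 6.
    System.out.println("sync flushes: " + sync.flushes);
  }
}
```

The trade-off with the asynchronous variant is that events can be lost on a crash before the queue drains, which may be acceptable for profiling data.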

dhalperi commented 10 years ago

Hmm, well, no. The query doesn't take any longer with profiling on... So that can't be it -- right?

(This is the critical fragment, so if profiling was the issue then slowing it down should have a large effect on runtime.)

domoritz commented 10 years ago

We can decrease the batch size in the profiling logger to see whether this is the cause.

domoritz commented 10 years ago

I think this was resolved, right?

dhalperi commented 10 years ago

I think the fundamental question -- which operator gets blamed for profiling overhead -- still stands :)

We might want to retitle this issue to something like "show profiling overhead in profiling results".
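One way to surface the overhead rather than misattribute it would be to time the flush inside the logger and charge it to a synthetic entry instead of the operator that triggered it. A minimal sketch, with entirely hypothetical names (this is not Myria's API):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: charge flush time to a synthetic "PROFILING_FLUSH"
 *  entry so it shows up in profiling results as its own line. */
public class OverheadAttribution {
  static final String FLUSH_OP = "PROFILING_FLUSH";
  final Map<String, Long> nanosByOperator = new HashMap<>();

  /** Accumulate measured time against an operator (or synthetic) name. */
  void charge(String operator, long nanos) {
    nanosByOperator.merge(operator, nanos, Long::sum);
  }

  /** Wrap the batch write so its cost lands on FLUSH_OP, not the caller. */
  void recordFlush(Runnable dbWrite) {
    long start = System.nanoTime();
    dbWrite.run();
    charge(FLUSH_OP, System.nanoTime() - start);
  }

  public static void main(String[] args) {
    OverheadAttribution p = new OverheadAttribution();
    p.charge("Apply", 5_000_000L);      // normal operator time
    p.recordFlush(() -> { /* simulate the batch db insert */ });
    System.out.println(p.nanosByOperator.keySet());
  }
}
```

With this, the profiling UI could render `PROFILING_FLUSH` as its own bar, making the overhead visible instead of inflating a random operator like the Apply above.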

domoritz commented 9 years ago

Good point. I'm all for showing where it's happening rather than trying to avoid it with non-obvious methods.