nipunarora / parikshan


comments on chapter 5 #6

Closed. gailkaiser closed this issue 6 years ago

gailkaiser commented 7 years ago

I think most, if not all, of the problems found by the reviewers also apply to the thesis, so there will need to be some re-orientation. You will need to do the end-to-end performance studies. But in the meantime, here are my comments on chapter 5.

The biggest problem is that iProbe comes totally out of nowhere. You need to start out the iProbe chapter by explaining why this is relevant to Parikshan and live debugging. Debugging is indeed mentioned a few times, but presented in a totally different way than in the Parikshan chapters, and in particular it seems to be happening in the production application, not in a separate debug copy of the system.

"Ideally, a production system tracing tool should have zero-overhead when it is not activated and should have a low overhead when it is activated. In other words, its performance should not adversely effect the usage of the traced application. At the same time, it should be flexible enough so as to meet versatile instrumentation needs at run-time for management tasks such as trouble-shooting or performance analysis." But you aren't putting any tracing into production, only into debug containers, so this issue seems moot. IProbe has to be presented as used in Parikshan, not independently.

"Over the years researchers have proposed many tools to assist in application performance analytics" It is very suspect that none of the cited papers are more recent than 2009.

"Inherently, these tools have been developed for the development environment, hence are not meant for a production system tracer." But you are not tracing the production system, you are tracing in debug containers. Maybe some argument could be made about the length of the debug window, so you cannot slow down the debug containers too much.

"Production system tracers such as DTrace[McDougall et al., 2006] and SystemTap[Prasad et al., 2005] allow for low overhead kernel function tracing. These tools are optimized for inserting hooks in kernel function/system calls, and can monitor run-time application behavior over long time periods. However, they have limited instrumentation capabilities for user-space instrumentation, and incur a high overhead due to frequent kernel context-switches and complex trampoline mechanisms." Again, why is any of this relevant to the debug containers, particularly in an SoA setup divorced from any OS kernel, where you are only dealing with network traffic not system calls. There is no discussion in the Parikshan material about kernel vs. user space, so you need to justify why its brought up here. And again the cited papers are too old.

"However, they are inflexible and can only be turned on/off at compile-time or before starting the execution." How does this relate to cloning on the fly?

"In this paper" You need to search the entire thesis for paper and replace with thesis unless you are indeed talking about some paper.

"iProbe has instrumentation overheads comparable to print/debug statements, while still giving users the flexibility to choose targets in the execution stage." So is this choosing due to the choice of which production containers are cloned to debug containers? Or are the targets finer granularity, within debug containers?

"We evaluated iProbe on micro-benchmark and SPEC CPU 2006 benchmarks." Explain what iprobe is, how it works, and most significantly why are you even telling the reader about it, before discussing evaluation. An organization that might have indeed made sense for a stand-alone paper does not make sense in a thesis. This chapter needs a roadmap reminding the reader how this is connected to Parikshan, and this connection has to be reminded over and over again.

"We also present a new hardware event profiling tool called FPerf developed in the iProbe framework." Why are we concerned here with profiling hardware rather than software?

"The main idea in iProbe design is decoupling the process of run-time instrumentation into offline and and online stages," So is the idea that you have pre-prepared instrumented versions of the production code to use in the debug containers? But then they're not really cloned, this is more like mutable replay.

"Most existing dynamic instrumentation mechanisms rely on a trampoline based design, and generally have to make several jumps to get to the instrumentation function as they not only do instrumentation but also simulate the instructions that have been overwritten. Additionally, they have frequent context-switches as they use kernel traps to capture instrumentation points, and execute the instrumentation. The performance penalty imposed by these designs are unacceptable in a production environment." But they aren't running in a production environment, the whole point of the separate debug containers was to allow for such slowdowns.

I had many more comments here but deleted them, since they basically kept asking the same question: why are you telling me this? It seems irrelevant to Parikshan. This chapter has to be rewritten to fit the thesis; it does not work to just copy/paste the paper here. One tack might be to present iProbe before Parikshan in the thesis, not the other way around, and describe iProbe as an earlier approach (which it indeed was) that aimed at low overhead on the production environment, after which you came up with the better idea of even less overhead for the special case of SoA. Then you would not need to make iProbe "fit" into Parikshan. Janak Parekh basically wrote his thesis in that style, presenting the earlier work with me as motivating the later work with Sal to deal with a targeted problem that (supposedly) arose from the earlier work; it came across very nicely even though it was revisionist history.

I'll read a new iProbe chapter once it has been moved and the thesis reorganized, either as above or some other way, but right now the chapter makes no sense.