Closed varshith15 closed 2 years ago
On Sun, May 22, 2022 at 2:07 PM Varshith Bathini wrote:
Differences compared to Sachin's report
- Sachin uses an open-loop, asynchronous benchmark tool, whereas YCSB is closed-loop and synchronous.
  It seems Sachin used a fixed number of clients (2) and varied the target QPS to plot throughput vs. latency.
  I, on the other hand, did not fix the number of clients; I plotted the throughput and latency I got with a different number of clients in each run.
  These two approaches are quite different.
  Sachin's approach makes sense in his case, though: an open-loop benchmark can generate as much load as we want (up to the bottlenecks, of course), which a closed-loop benchmark cannot, because each client waits for a response before issuing its next request.
- Sachin uses stress-ng to apply stress on the CPU, whereas Slooo uses cgroups. This shouldn't cause a big difference in the results.
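The open-loop vs. closed-loop distinction above can be illustrated with a minimal sketch. This is not the code of either benchmark tool; the function names and the idealized model (constant latency, no queueing) are assumptions made for illustration only.

```python
from typing import List


def closed_loop_throughput(num_clients: int, latency_s: float) -> float:
    """Idealized closed-loop model (assumption: constant latency).

    Each client sends its next request only after the previous one returns,
    so the achievable throughput is capped at num_clients / latency.
    """
    return num_clients / latency_s


def open_loop_send_times(target_qps: float, duration_s: float) -> List[float]:
    """Idealized open-loop model.

    Requests are sent on a fixed schedule set by the target QPS,
    regardless of whether earlier requests have completed.
    """
    interval = 1.0 / target_qps
    count = int(round(target_qps * duration_s))
    return [i * interval for i in range(count)]


if __name__ == "__main__":
    # Closed loop: with 2 clients, doubling latency halves the throughput.
    print(closed_loop_throughput(2, 0.5))
    print(closed_loop_throughput(2, 1.0))
    # Open loop: the send schedule depends only on the target rate.
    print(open_loop_send_times(4, 1.0))
```

In this model the only way to push more load through a closed-loop benchmark is to add clients, which is why the two plots sweep different variables (number of clients vs. target QPS).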
Takeaways from Sachin’s report
- Sachin points out that we should look at RSS rather than overall memory usage to see how much memory the process is actually using, which I completely missed before. I will revisit the experiments with this new info.
- Sachin also makes a brilliant observation: PySyncObj violates the Raft algorithm by not creating a persistent log journal, which causes throughput under memory contention to be lower than the baseline result. I will check the log replication logic of RethinkDB.
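As a rough sketch of the RSS point above: on Linux, a process's resident set size is reported in the `VmRSS` field of `/proc/<pid>/status`. The helper names below are hypothetical and are not part of Slooo; the sketch assumes a Linux host.

```python
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None if the field is absent (e.g. for kernel threads).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:      2048 kB"
            return int(line.split()[1])
    return None


def rss_kib(pid: int) -> Optional[int]:
    """Read the current RSS of a process by PID (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kib(f.read())
```

Sampling `rss_kib(pid)` for each server process during a run would give the per-process number, as opposed to a host-wide or cgroup-wide memory figure.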
Questions
- For slow CPU on the follower, Sachin mentions: "When a follower is subjected to CPU contention, throughput drops only very slightly which probably can be attributed to some CPU contention at cache level for the leader in this pseudo-distributed setup."
  I don't completely understand the reasoning. Will revisit this later.
-- Tianyin Xu, https://tianyin.github.io/
Just a note that if the goal is to write a tool paper, your goal is to justify that your tool is useful. In this context, it means your tool can support what Sachin needs in his experiments. If it cannot (which is likely the reason he didn’t use Slooo), then the question is how to make Slooo able to support them.
@tianyin I believe the reason Sachin did not use Slooo is that PySyncObj is not a quorum system; it's a Python library, and he built a simple KV store on top of it. Either way, I was going to talk to him about his reasons. I was also looking at his assignment to understand the holes you pointed out in mine, which I plan to fix as well.
Is Slack the best way to contact him?
Seems like the solution Wenshan and Shichen have implemented is a pretty straightforward one: iterate over a set of predetermined fault values (given by the user) and check whether each one causes a crash; if it does, the fault value just before it is the point break.
This is more or less already implemented in the code revamp we did.
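The iteration described above can be sketched roughly as follows. This is not Wenshan and Shichen's actual code or Slooo's API; `find_point_break`, `fault_levels`, and `causes_crash` are hypothetical names, and the sketch assumes the user supplies the fault values in increasing severity.

```python
from typing import Callable, Optional, Sequence


def find_point_break(
    fault_levels: Sequence[float],
    causes_crash: Callable[[float], bool],
) -> Optional[float]:
    """Linear scan over user-supplied fault levels.

    Returns the last level tested before the first one that crashes.
    Returns None if the very first level crashes or if no level crashes.
    """
    previous = None
    for level in fault_levels:
        if causes_crash(level):  # hypothetical: run the experiment at this level
            return previous
        previous = level
    return None


if __name__ == "__main__":
    # Toy stand-in for a real experiment: pretend the system crashes at >= 30.
    print(find_point_break([10, 20, 30, 40], lambda level: level >= 30))
```

Each call to `causes_crash` stands for a full experiment run, so the cost grows linearly with the number of candidate levels, and the result can only be as precise as the spacing between them.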
IMO it isn't a smart enough solution: it still needs the user to say which fault values Slooo should search over for the point break, and, depending on how accurately the point break needs to be located, it may take a long time to find because the search is iterative.
This seems like the only solution for now, though (ironically, it was the first thing @Essoz and I came up with). Idk if it's a good enough solution to put in the paper.
@tianyin thoughts?
Evaluation Section
Point Break
Repo Updates
Paper Submission