ipfs / iptb

InterPlanetary TestBed 🌌🛌

Collecting statistics #77

Open travisperson opened 6 years ago

travisperson commented 6 years ago

See #65 for complete history


Original comment by @davinci26

Hey y'all,

Thanks for the project it helped me a lot!

As discussed in the IPFS issue board (IPFS Performance #5226), I made some changes to the IPTB framework to measure the performance of IPFS and generate performance graphs. In detail, I added the following functions to the framework:

  • iptb make-topology: This creates a connection graph between the nodes (e.g. star topology, barbell topology). In the topology files, empty lines and lines starting with # are disregarded. Each non-empty line has the syntax origin: connection 1, connection 2, ..., where origin and the connections are specified by node ID (see the illustrative example after this list).
  • iptb dist -hash: The simulation distributes a single file from node 0 to every other node in the network. It then calculates the average time required to download the file, the standard deviation of that time, the maximum time, the minimum time, and the number of duplicate blocks. The results are saved in a generated file called results.json.
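For illustration only, a topology file for a four-node star centred on node 0 could look like the following (a hypothetical example following the syntax described above):

# Star topology: node 0 connects to every other node
0: 1, 2, 3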

I also added a Python 3 script to plot the results, which adds an optional dependency to the project, Matplotlib.

Finally, I created a readme file, simulation.md, that explains the logic of the simulation. I also included there the response of @whyrusleeping to the issue Simulate bad network #50, so people know that bad-network simulation is supported.

I would appreciate your feedback and any improvement suggestions :)

travisperson commented 6 years ago

Originally this work was done against iptb prior to the transition to plugins. After the transition, we wanted to provide a generic way to handle the implementation of iptb dist, which was basically recording timing information around a RunCmd call.

The solution to this was to add a generic way to capture stats around iptb run by recording execution time, and calculating different stats to be reported at the end.

However, I think recording and reporting the elapsed execution time is probably a useful enough thing on its own that we should just add it to everything that goes through the generic reporting. If we expose the elapsed time as output, I think it provides enough information to calculate different statistics outside of iptb itself.
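As a rough sketch of that last point (not part of iptb itself): a caller can already time a whole iptb run invocation externally and build statistics on top of it, although per-node elapsed times reported by iptb would be far more precise.

import subprocess
import time

# Hypothetical external timing of a full `iptb run` invocation.
# This only measures the whole CLI call; per-node timing would have to
# come from iptb itself, as discussed above.
start = time.monotonic()
proc = subprocess.run(["iptb", "run", "--", "ipfs", "id"], capture_output=True)
elapsed = time.monotonic() - start

print("exit code: {}".format(proc.returncode))
print("elapsed:   {:.3f}s".format(elapsed))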

There are two other pieces, though, that I think also need to be touched on:

1) Parsing output
Parsing generic output is not always ideal. We might be able to solve this quite simply by supporting different encodings for the output, at first just text or JSON.

2) Collecting metrics
Currently, using iptb metric is the only way to do this, and for the most basic metrics it works okay, as a user can run the collection before and after. This type of collection only works for accumulated metrics, such as bandwidth, or other metrics which aren't of a realtime nature.

Real-time metrics (CPU, RAM, etc.) are another thing, and I'm open to discussion around these.

To summarize, I think a simple approach to supporting this use case is, at first, to add an elapsed time to every output alongside the exit code, and to add the ability to return output as JSON. Metrics can be collected independently as the user sees fit.
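As a rough sketch only (the field names are illustrative, chosen to line up with the sample script at the end of this thread), JSON-encoded output for a run might look like:

{"results": [
  {"node": 0, "exit": 0, "elapsed": 0.21, "output": "QmXschyVzLm..."},
  {"node": 1, "exit": 0, "elapsed": 0.34, "output": "Qme3h8Wwfp..."}
]}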

davinci26 commented 6 years ago

I think that designing features such as #75, #76, and #77 moves IPTB in a direction that puts a lot of additional weight on it.

General Thoughts

My thoughts on the subject, as a user of the project and as a developer in general, are:

I think there is room for such features because there are a lot of projects (OpenBazaar, etc.) that want to measure the performance of IPFS and add it as a component of their system (#50, #26). I also got involved in the project to measure the performance of IPFS, because it was a crucial component of my system. The question that remains is whether the core development team wants to take on this burden or leave it to users. For me, both options have benefits, and it highly depends on the time the core devs have available. I understand that you may prefer to spend time developing/improving IPFS/libp2p rather than IPTB. It's a decision the core devs should make, since they have a more holistic view of the IPFS milestones. Personally, I trust you to make a good decision.

Output

I agree as far as the elapsed time is concerned; the current implementation of elapsed time is robust. I would prefer having the output as a JSON file or txt file after the individual results as it makes it easier to parse.

Something like this:

iptb run -- ipfs id --format="<id>"
node[0] exit 0

QmXschyVzLmS4JqPN1kuhCXTjau2oQkVuzjvTbQFTGm3w3
node[1] exit 0

Qme3h8WwfpBiPHPfdEs9GuegijVhaBX9xYPXTTDAS6uciR
node[2] exit 0

Time Results: {Specified format}

This will make parsing from other programs easier compared to scraping the individual per-node results.

Metrics

  1. For real-time metrics, it would be more reasonable for them to be produced by the plugin, with IPTB just exposing an interface. As discussed on IRC, maybe IPTB could request/get heartbeats from the plugin that contain real-time metrics and forward them to the user (a rough sketch of what consuming such heartbeats might look like follows this list). If you are interested in a design like this, I can take a more detailed look at what it would involve and post my findings here so we can iterate on the design.

  2. An additional issue with the metrics is that currently you can only collect a single metric rather than several at once (correct me if I am wrong).
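For what it's worth, here is a rough sketch of a consumer for such heartbeats. This is entirely hypothetical: the field names are invented for illustration, and it assumes iptb would forward one JSON object per line from the plugin.

import json
import sys

# Hypothetical heartbeat consumer: each line is assumed to be a JSON object
# such as {"node": 0, "ts": 1530000000.0, "cpu": 12.5, "rss": 104857600}
# forwarded by iptb from the plugin.
for line in sys.stdin:
    try:
        hb = json.loads(line)
    except ValueError:
        continue
    print("node[{}] cpu={:.1f}% rss={}MiB".format(
        hb["node"], hb["cpu"], hb["rss"] // (1024 * 1024)))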

Stats

Providing basic stats based on elapsed time is essentially a free primitive in terms of development and computational cost, from my perspective. The same does not hold for calculating stats on metrics. Additionally, it could be used to automate the benchmarking of plugins instead of everyone writing their own custom benchmarking.

cc @dgrisham

travisperson commented 6 years ago

@davinci26 thanks for writing all of this out! I want to respond to it all, but won't be able to for 12 hours.

I did want to comment quickly though about the output

I would prefer having the output as a JSON file or txt file after the individual results as it makes it easier to parse.

I want to provide an easy way to parse the output, but I don't want to mix that with the human-readable text if we can avoid it. One way to solve this would be to support an output encoding (e.g. iptb --enc json run -- <cmd>), which would encode everything into something that can be parsed easily.

One of the things I did like about the original idea for a "stats" flag was that it provided an easy way to get just the stats out without interfering with the other output of the command.

It actually provided a really interesting way to interact with iptb for stat gathering purposes.

I wrote a small Python script which reads from stdin (could be any file, I guess), parses each line, and calculates some basic stats.

To connect it up to iptb, I made a named pipe. Every iptb command I ran would print the stats out to the named pipe.

On the other end of the pipe was the python script. So for every command I ran through iptb, it would print the stats in another window.

(Example)

$ mkfifo stats
$ iptb run --stats ./stats -- ipfs id

In another window

$ tail -f ./stats | python stats.py

This provides a really easy way to collect some output and run whatever calculations you want over it. I'm just not sure exactly what we want to be in the output, or if this is exactly the way to do it.

One possibility is to have event logging around the Core interface, which would provide a much more detailed look into what is happening everywhere around the plugin. This would be a much more generic implementation and I think would give users almost everything they need, or at least an easy way to extend. Basically: which method on the plugin was invoked, and what it was called with.
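Purely as a hypothetical illustration of that idea (no such log exists in iptb today), the event log could be one JSON object per Core-interface call, e.g.:

{"node": 0, "method": "RunCmd", "args": ["ipfs", "id"], "elapsed": 0.21}
{"node": 1, "method": "Connect", "args": ["node[0]"], "elapsed": 0.03}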

Script

import sys
import statistics
import json

# Read JSON-encoded iptb results from stdin (e.g. via a named pipe) and
# print basic statistics over the per-node elapsed times.
print("MEAN\tSTDEV\tVARIANCE\n")
for line in sys.stdin:
    try:
        jline = json.loads(line.rstrip())
    except ValueError:
        # Skip anything that isn't a JSON line (e.g. human-readable output).
        continue

    nums = [o['elapsed'] for o in jline['results']]
    if len(nums) < 2:
        # statistics.stdev/variance need at least two samples.
        continue

    mean = statistics.mean(nums)
    stdev = statistics.stdev(nums)
    variance = statistics.variance(nums)

    print('{:.2f}\t{:.2f}\t{:.2f}'.format(mean, stdev, variance))

dgrisham commented 6 years ago

Some thoughts (will respond with more as things percolate):