Open travisperson opened 6 years ago
Originally this work was done in iptb prior to the transition to using plugins. After the transition, we wanted to provide a generic way to handle the implementation of iptb dist, which was basically recording timing information around a RunCmd call.
The solution to this was to add a generic way to capture stats around iptb run by recording execution time and calculating different stats to be reported at the end.
However, I think recording and reporting the elapsed execution time is useful enough on its own that we should just add it to everything that uses the generic reporting. If we expose the elapsed time as output, it provides enough information to calculate different statistics outside of iptb itself.
There are two other pieces, though, that I think also need to be touched on:
1) Parsing output
Parsing generic output is not always ideal. We might be able to solve this quite simply by supporting different encodings for the output, at first just text or json.
2) Collecting metrics
Currently, using iptb metric is the only way to do this, and for the most basic metrics this works okay, as a user can run the collection before and after a run and take the difference (a rough sketch of that arithmetic follows below). This type of collection only works for accumulated metrics, such as bandwidth, or other metrics which aren't of a real-time nature.
Real-time metrics (CPU, RAM, etc.) are another matter, and I'm open to discussion around these.
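As a rough illustration of the before/after pattern for accumulated metrics (the metric and the numbers below are made up; this is just the arithmetic a user would do outside of iptb):

# Hypothetical sketch of the before/after pattern for an accumulated metric.
# The metric ("bytes transferred") and the numbers are made up; the point is
# just that a cumulative counter sampled twice can be differenced per node.
def metric_delta(before, after):
    """Return the per-node difference between two samples of a cumulative metric."""
    return {node: after[node] - before[node] for node in after}

before = {0: 1200345, 1: 980221}    # sampled before the run
after = {0: 1950812, 1: 1400090}    # sampled after the run
print(metric_delta(before, after))  # {0: 750467, 1: 419869}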
To summarize, I think a simple approach to supporting this use case at first is to add an elapsed time to all outputs alongside the exit code, and to add the ability to return output as json. Metrics can be collected independently as the user sees fit.
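To make this concrete, JSON-encoded output for iptb run could look roughly like the following. None of these field names are settled; the shape just mirrors what the example script later in this thread expects -- a results list whose entries carry an elapsed value.

{
  "results": [
    {"node": 0, "exit": 0, "elapsed": 0.42, "output": "QmXschyVzLmS4JqPN1kuhCXTjau2oQkVuzjvTbQFTGm3w3"},
    {"node": 1, "exit": 0, "elapsed": 0.39, "output": "Qme3h8WwfpBiPHPfdEs9GuegijVhaBX9xYPXTTDAS6uciR"}
  ]
}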
I think that designing features such as #75 #76 #77 moves IPTB in a direction that puts a lot of additional weight on it.
My thoughts on the subject, as a user of the project and as a developer in general, are:
I think there is room for such features because there are a lot of projects (OpenBazaar etc.) that want to measure the performance of IPFS and add it as a component of their system (#50, #26). I also got involved in the project to measure the performance of IPFS, because it was a crucial component in my system. The question that remains is whether the core development team wants to take on this burden or leave it to the users. For me both options have benefits, and it highly depends on the time the core devs have available. I understand that you may prefer to spend time developing/improving IPFS/libp2p rather than IPTB. It's a decision the core devs should make, since they have a more holistic view of IPFS milestones. Personally, I trust you to make a good decision.
I agree as far as the elapsed time is concerned; the current implementation of elapsed time is robust. I would prefer having the output as a JSON or txt file after the individual results, as that makes it easier to parse.
Something like this:
iptb run -- ipfs id --format="<id>"
node[0] exit 0
QmXschyVzLmS4JqPN1kuhCXTjau2oQkVuzjvTbQFTGm3w3
node[1] exit 0
Qme3h8WwfpBiPHPfdEs9GuegijVhaBX9xYPXTTDAS6uciR
node[2] exit 0
Time Results: {Specified format}
This will make parsing from other programs easier compared to only having the individual results per node.
For real-time metrics, it would be more reasonable for them to be produced by the plugin, with IPTB just exposing an interface. As discussed on IRC, maybe IPTB could request/get heartbeats from the plugin that contain real-time metrics and forward them to the user. If you are interested in a design like this, I can take a more detailed look at what it would look like and post my findings here so we can iterate on the design.
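As one possible shape for the consumer side of such a design -- purely hypothetical, since no heartbeat format exists in iptb today -- forwarded heartbeats could be read the same way as the stats pipe described further down in this thread:

# Hypothetical consumer of real-time heartbeats forwarded by iptb.
# Assumes one JSON object per line, e.g. {"node": 0, "cpu": 12.5, "ram": 104857600};
# none of these field names exist in iptb today.
import sys
import json

for line in sys.stdin:
    try:
        hb = json.loads(line)
    except ValueError:
        continue  # ignore anything that isn't a JSON heartbeat
    print("node {:>2}  cpu {:5.1f}%  ram {:>10} bytes".format(
        hb["node"], hb["cpu"], hb["ram"]))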
An additional issue with the metrics is that currently you can only collect one metric at a time instead of multiple (correct me if I am wrong).
Providing basic stats based on elapsed time is essentially a free primitive in terms of development and computational cost, from my perspective. The same does not hold for calculating stats on metrics. Additionally, it could be used to automate the benchmarking of plugins instead of having everyone write custom benchmarks.
cc @dgrisham
@davinci26 thanks for writing all of this out! I want to respond to it all, but won't be able to for 12 hours.
I did want to comment quickly, though, about the output:
> I would prefer having the output as a JSON or txt file after the individual results, as that makes it easier to parse.
I want to provide an easy way to parse, but I don't want to mix that with the human-readable text if we can avoid it. One way to solve this would be to support an output encoding (e.g. iptb --enc json run -- <cmd>), which would output everything encoded in something that could be parsed easily.
One of the things I did like about the original idea for a "stats" flag was that it provided an easy way to get just the stats out without also interfering with the other output of the command.
It actually provided a really interesting way to interact with iptb for stat gathering purposes.
I wrote a small python script which would read from stdin (could be any file I guess), and parse each line and calculate some basic stats.
To connect it up to iptb, I made a named pipe. Every iptb command I ran would print the stats out to the named pipe.
On the other end of the pipe was the python script. So for every command I ran through iptb, it would print the stats in another window.
(Example)
$ mkfifo stats
$ iptb run --stats ./stats -- ipfs id
In another window:
$ tail -f ./stats | python stats.py
This provides a really easy way to collect some output and run whatever calculations you want over it. I'm just not sure exactly what we want to be in the output, or if this is exactly the way to do it.
One possibility is to have event logging around the Core interface, which would provide a much more detailed look into what is happening everywhere around the plugin. This would be a much more generic implementation, and I think it would give users almost everything they need, or at least something easy to extend: basically, which method on the plugin was invoked and what it was called with.
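To sketch what such events might look like (the fields here are entirely hypothetical; RunCmd is just the plugin call already mentioned at the top of this issue), each Core interface invocation could emit a record along these lines:

{"method": "RunCmd", "node": 0, "args": ["ipfs", "id"], "elapsed": 0.41}
{"method": "RunCmd", "node": 1, "args": ["ipfs", "id"], "elapsed": 0.38}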
Script
import sys
import statistics
import json

print("MEAN\tSTDEV\tVARIANCE\n")

for line in sys.stdin:
    # Each line is expected to be a JSON object with a "results" list,
    # where every entry carries an "elapsed" field; anything else is skipped.
    try:
        jline = json.loads(line.rstrip())
    except ValueError:
        continue
    nums = [o['elapsed'] for o in jline['results']]
    # Note: stdev/variance need at least two results (i.e. two nodes).
    mean = statistics.mean(nums)
    stdev = statistics.stdev(nums)
    variance = statistics.variance(nums)
    print('{:.2f}\t{:.2f}\t{:.2f}'.format(mean, stdev, variance))
Some thoughts (will respond with more as things percolate):
- iptb run was already doing asynchronous runs for single-command cases.
- ipfs has various subsystems for logging -- e.g. you can do ipfs log level engine debug (general form ipfs log level <subsystem> <level>) to modify the engine subsystem's log level. Maybe we could have something like that, but with <subsystem> replaced by <plugin>; then users can log all they want within their plugin.
- json output would be nice, and I agree with @travisperson's comment about not mixing human-readable and parsable output.
- iptb run on a dockeripfs node vs. a localipfs node: if we measure that at the level of IPTB, then we might be getting extra time in the dockeripfs case. So it might make more sense to have plugins at least implement how timings take place (e.g. in the case of dockeripfs, maybe the docker exec args get wrapped in something that times the user's command -- see the sketch below), then maybe IPTB sets up the interface for those timings like it does for run etc.

That seems to make sense to me given my understanding of how people want to use IPTB. Let me know if these points make sense or if I'm misinterpreting something, though.
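As a loose illustration of that last point -- this is not iptb or dockeripfs code, just a sketch of timing the exec inside the plugin so that container overhead isn't attributed to the user's command:

# Hypothetical illustration only (not iptb or dockeripfs code): time the exec
# inside the plugin so container/transport overhead isn't counted as part of
# the user's command. The container name "my-node-0" is made up.
import subprocess
import time

def timed_exec(argv):
    """Run argv and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    proc = subprocess.run(argv)
    return proc.returncode, time.monotonic() - start

code, elapsed = timed_exec(["docker", "exec", "my-node-0", "ipfs", "id"])
print("exit", code, "elapsed {:.2f}s".format(elapsed))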
See #65 for complete history
Original comment by @davinci26