hubblo-org / scaphandre

⚡ Energy consumption metrology agent. Let "scaph" dive and bring back the metrics that will help you make your systems and applications more sustainable !
Apache License 2.0
1.58k stars 105 forks

PluginSensor to allow sensors to be written in another language. #17

Open bpetit opened 3 years ago

bpetit commented 3 years ago

dig into this: https://michael-f-bryan.github.io/rust-ffi-guide/dynamic_loading.html

jhwgh1968 commented 3 years ago

I have considerable cross-language expertise, and was looking for a couple of open source contributions to help with. After spending a little time looking at this, I would offer some thoughts:

  1. I don't think the current code in the Sensors module is a good fit for this. For example, Record and Topology are structs required by the traits, and would need to have equivalents in C for any implementation. While Record seems easy to turn into a flat C struct, Topology seems much more complex. It would need to be extracted into a trait of some kind.
  2. I think the "fully dynamic plugin" design in that article is a bit overkill. Instead, I would suggest a design like Linux kernel data structures, rather than entire modules. For each operation or subsystem, there is a fixed struct with "operation pointers" to defined operations that make up an API. These point to the code scaph is interested in calling.
  3. The cdylib approach also does not cover scripting languages with a runtime. I'm imagining a system integrator writing a "monitoring daemon" for some system in Python or Java. Afterwards, they decide -- or an unaffiliated open source contributor wants -- to add support. Then they will need a C API to call into, rather than be called by.
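As a rough illustration of point 2, the "operation pointers" idea could look like the sketch below on the Rust side. Everything in it is hypothetical: `CRecord`, `SensorOps`, `read_record` and `fake_read` are names invented for this example, not part of scaphandre, and the flat record only stands in for the real `Record` struct.

```rust
use std::os::raw::{c_int, c_void};

/// One energy reading, flattened so it can cross a C ABI boundary.
/// Hypothetical layout; this is not scaphandre's `Record`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CRecord {
    pub timestamp_ms: u64,
    pub energy_uj: u64,
}

/// Fixed table of "operation pointers" that a plugin fills in.
#[repr(C)]
pub struct SensorOps {
    /// Opaque plugin state, handed back to the plugin on every call.
    pub ctx: *mut c_void,
    /// Writes one reading into `out`; returns 0 on success.
    pub read: Option<unsafe extern "C" fn(ctx: *mut c_void, out: *mut CRecord) -> c_int>,
    /// Releases `ctx`; may be left unset.
    pub close: Option<unsafe extern "C" fn(ctx: *mut c_void)>,
}

/// Host-side helper: calls the plugin's `read` operation, if it provides one.
pub fn read_record(ops: &SensorOps) -> Option<CRecord> {
    let read = ops.read?;
    let mut out = CRecord { timestamp_ms: 0, energy_uj: 0 };
    // SAFETY: `out` is a valid, writable CRecord, and the plugin is trusted
    // to accept the `ctx` pointer it registered itself.
    let rc = unsafe { read(ops.ctx, &mut out) };
    if rc == 0 { Some(out) } else { None }
}

/// Stand-in for a plugin written in C: `ctx` points to a u64 counter
/// that grows by 1000 microjoules on every call.
pub unsafe extern "C" fn fake_read(ctx: *mut c_void, out: *mut CRecord) -> c_int {
    if ctx.is_null() || out.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `ctx` points to a live u64.
    let counter = unsafe { &mut *(ctx as *mut u64) };
    *counter += 1000;
    // SAFETY: `out` was checked for null and points to a CRecord.
    unsafe { *out = CRecord { timestamp_ms: 0, energy_uj: *counter } };
    0
}
```

The host only ever sees the fixed `SensorOps` layout, so a plugin in C or C++ would fill in the same struct without needing to know anything about Rust traits. How `Topology` would fit into such a table is left open here.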

I think it's a cool idea, but my initial impression is it's pretty difficult. Quite a bit of re-work would be needed from where things are now.

bpetit commented 3 years ago

Hi,

Thanks a lot for those insights of yours. This is very valuable.

> I think the "fully dynamic plugin" design in that article is a bit overkill. Instead, I would suggest a design like Linux kernel data structures, rather than entire modules. For each operation or subsystem, there is a fixed struct with "operation pointers" to defined operations that make up an API. These point to the code scaph is interested in calling.

I just added this documentation as a lead to further investigate how to build this feature. It may in fact be overkill, as you said. This FR is intended more as a discussion thread about the "plugin system" and is open to other ideas. Thanks a lot for sharing yours!

This data structures idea is very interesting and seems like a lightweight scenario. But I'm not sure that I see the whole picture right now. For example, am I right in saying that we would have to "register" those data structures in the code of scaph (that is, they have to exist there) to allow calling those operations? This would be okay for Sensors plugins (the action of calling them coming from "above") but maybe not for Exporters (which trigger the initial action). Oh right, this is a thread about sensors, I'm diverging from the initial topic, sorry about that. I'm thinking out loud...

> The cdylib approach also does not cover scripting languages with a runtime. I'm imagining a system integrator writing a "monitoring daemon" for some system in Python or Java. Afterwards, they decide -- or an unaffiliated open source contributor wants -- to add support. Then they will need a C API to call into, rather than be called by.

In my mind, a sensor plugin would be used to extend scaph's ability to collect energy consumption data in diverse scenarios. I imagined that such a plugin would be called by scaph, rather than the opposite. For more "abstraction" above scaphandre (using it as a subsystem that gathers data, then doing more complex things on top of it), I was thinking more of an Exporter plugin system. There is an FR for that: https://github.com/hubblo-org/scaphandre/issues/18 (and I totally agree that allowing scripting languages is very important in that case).

Tell me if I misunderstood your point. It seems to me that there are two different cases here. One to extend scaph ability to collect data (for example when there is no RAPL interface but a physical wattmeter, or a statistical database used to make estimates rather than real measurements) which would be a "sensor plugin" use case, and another to allow external programs to call scaphandre to get the metrics it collects, which would be an "exporter plugin" use case.

jhwgh1968 commented 3 years ago

> This would be okay for Sensors plugins (the action of calling them coming from "above") but maybe not for Exporters (which trigger the initial action). Oh right, this is a thread about sensors, I'm diverging from the initial topic, sorry about that. I'm thinking out loud...

I agree that exporters would be much more complex to write, and a full plugin system might be worth considering. But as you noted, this was about sensors, so I thought it was overkill.

> It seems to me that there are two different cases here. One to extend scaph ability to collect data [...] which would be a "sensor plugin" use case, and another to allow external programs to call scaphandre to get the metrics it collects, which would be an "exporter plugin" use case.

That is indeed what I understood when looking at your project. What I was thinking about: even collecting data may require the sensor to call into Scaph rather than be called by it. Since writing that, however, I have rethought my use case. I think an RPC for sensors to send data to Scaph would be enough instead.

Allow me to give a detailed example of what I am thinking about: hardware appliances.

Suppose you run a datacenter. In addition to servers with VMs (which Scaph supports), you have a number of hardware video encoders that your customers buy or rent. These are 2U rack-mounted boxes, with custom software, and thousands of GPU threads in them to concurrently encode video streams in real time.

The manufacturer is focused on performance, not power consumption. Since they imagine these running at 90-98% load, they don't even display anything on their web interface or shell. However, an internet hobbyist gets one on eBay, reverse engineers it, and finds that it is logging internally.

If someone files a warranty claim, these logs let Tech Support verify the end user did not overload or overheat the system and void the warranty. Because it's not "customer facing", it's in a rather ugly debug log that occurs periodically, generated by some subsystem running on the device, and is deleted every 24 hours.

The hobbyist cannot figure out how to get the raw information themselves, so they write a "monitoring service". It watches the log file and, every time it changes, updates a tally of CPU usage and power information. It is also written in Python, because Python is already on the device: that is what the web interface is written in.

But now, how does the hobbyist get that information to Scaph?

Currently, there is only one way: write a sensor in Rust to scrape the logs, then cross compile Scaph for the device. With the plugin system, they could write it in C or C++, but it would still require you to compile Scaph itself.

That turns out to be trickier than you might think. While I have not worked with such video encoders, I have seen several similar "appliances", with integrated hardware and software.

One of them ran a heavily-patched Linux kernel with a lot of custom hardware glue, and no temperature sensors in the usual APIs. Another ran a custom RTOS, provided a minimal custom shell, and "soldered in" Python -- partly for web pages, partly for OEM customization. That was the only programming language you had, with the full standard library, but limited system calls.

What does a Scaph user do then? I think the answer is, have the Python do RPC to Scaph running somewhere else in the data center. But perhaps that is out of scope.
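To make the RPC idea a little more concrete, here is a minimal sketch of what the receiving side could look like. It assumes a made-up, line-based wire format (`<source> <timestamp_s> <microwatts>`, one reading per line over TCP); the format and the names `RemoteReading`, `parse_reading` and `receive_from_one_client` are all hypothetical, and nothing like this exists in Scaph today.

```rust
use std::io::{self, BufRead, BufReader};
use std::net::TcpListener;

/// One reading pushed by a remote agent (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct RemoteReading {
    pub source: String,
    pub timestamp_s: u64,
    pub microwatts: u64,
}

/// Parses one line of the assumed wire format:
/// "<source> <timestamp_s> <microwatts>".
/// Returns None for malformed lines instead of failing.
pub fn parse_reading(line: &str) -> Option<RemoteReading> {
    let mut parts = line.split_whitespace();
    let source = parts.next()?.to_string();
    let timestamp_s = parts.next()?.parse().ok()?;
    let microwatts = parts.next()?.parse().ok()?;
    // Reject lines with trailing fields we do not understand.
    if parts.next().is_some() {
        return None;
    }
    Some(RemoteReading { source, timestamp_s, microwatts })
}

/// Accepts a single client and collects every well-formed reading it
/// sends until it closes the connection. Malformed lines are skipped.
pub fn receive_from_one_client(listener: &TcpListener) -> io::Result<Vec<RemoteReading>> {
    let (stream, _peer) = listener.accept()?;
    let mut readings = Vec::new();
    for line in BufReader::new(stream).lines() {
        if let Some(reading) = parse_reading(&line?) {
            readings.push(reading);
        }
    }
    Ok(readings)
}
```

With a format this simple, the agent on the appliance would only need a socket and string formatting, which even a restricted Python install should provide. Authentication and framing are deliberately left out of the sketch.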

bpetit commented 3 years ago

Depending on the architecture of the device's CPU, and because there are many GPUs (which scaph doesn't know how to monitor yet), I'd say your suggestion is relevant:

> What does a Scaph user do then? I think the answer is, have the Python do RPC to Scaph running somewhere else in the data center. But perhaps that is out of scope.

A solution might be to write an RPCSensor to get the data from the Python monitoring agent. This is a bit different from the current RAPL sensor, because for now it is the exporter that sets the pace of data collection (the sensor doesn't run on its own). Here we would need a process running as a daemon for the RPCSensor, while still allowing the user to run scaph with an exporter to get the collected data. The topology tree would also need some extensions, depending on the depth of the data you collect from the video appliance.
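One possible shape for that split, as a sketch only: a background thread keeps receiving readings at the agent's pace and stores the most recent one, while the exporter asks for it at its own pace. The `RpcSensor` and `Reading` types below are hypothetical, and an in-process `mpsc` channel stands in for the real RPC transport.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical reading pushed by the remote monitoring agent.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Reading {
    pub timestamp_s: u64,
    pub microwatts: u64,
}

/// Sensor that runs on its own and remembers the latest reading.
pub struct RpcSensor {
    latest: Arc<Mutex<Option<Reading>>>,
    worker: Option<JoinHandle<()>>,
}

impl RpcSensor {
    /// Starts the background "daemon" thread. `incoming` stands in for
    /// whatever transport actually delivers the agent's data.
    pub fn spawn(incoming: mpsc::Receiver<Reading>) -> Self {
        let latest = Arc::new(Mutex::new(None));
        let slot = Arc::clone(&latest);
        let worker = thread::spawn(move || {
            // Runs until every sender has been dropped.
            for reading in incoming {
                *slot.lock().unwrap() = Some(reading);
            }
        });
        RpcSensor { latest, worker: Some(worker) }
    }

    /// Called by the exporter whenever it wants data; never blocks on
    /// the remote agent, and returns None if nothing has arrived yet.
    pub fn latest(&self) -> Option<Reading> {
        *self.latest.lock().unwrap()
    }

    /// Waits for the background thread to finish (all senders dropped).
    pub fn wait_for_shutdown(&mut self) {
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

Keeping only the latest value is the simplest choice; an exporter that needs every sample would want a bounded queue instead.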

I wouldn't say this is out of scope, but it is a pretty advanced use case given the project's current state of development. Even so, I'd happily try to help or accommodate any work you'd want to initiate on that topic. :)

bpetit commented 3 years ago

Found this additional resource, which may be interesting: https://sixtyfps.io/blog/expose-rust-library-to-other-languages.html (or not? let's see)

TheElectronWill commented 1 year ago

How about writing dynamic plugins in Rust? Other languages could be interesting, but I'm a bit worried about the overhead of that. If all the exporters were written in Python, Scaphandre's overhead would be higher, and that's not so good.

Of course it's better to have a Python exporter than no exporter at all, but if adding a new Rust exporter were really easy, we would not need to deal with other languages for that :)