jmpsec / osctrl

Fast and efficient osquery management
https://osctrl.net
MIT License
384 stars, 51 forks

Run distributed query/carve based on custom tags #529

Open zhuoyuan-liu opened 2 weeks ago

zhuoyuan-liu commented 2 weeks ago

We just explored osctrl-admin and found that we can add a custom tag to each node/device. However, after adding a custom tag, we cannot run a distributed query based on it. It would be a great help if we could also run queries based on these tags.

I would like to contribute to this feature, but I would like to know more details about the implementation.

In the architecture definition, osctrl-admin should only talk to osctrl-api instead of talking to the database directly. However, I found that osctrl-admin interacts with the DB directly in many cases. I am completely fine with that implementation, but I want to make sure that my changes are allowed to do the same.

From the source code, I can see that currently it's based on four types of tags: env, platform, UUID and localname. I guess the easiest solution is to add an extra field so that we can pass the custom tags. What do you think?
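To make the "extra field" idea concrete, here is a rough sketch. It is written in Python only for brevity (osctrl itself is Go), and every name in it (`Node`, `QueryTargets`, `is_targeted`) is hypothetical and not taken from the osctrl source. It also assumes that a node is selected when it matches any one of the selectors, which may not be how osctrl combines them today.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; field names are illustrative only.
    uuid: str
    localname: str
    environment: str
    platform: str
    tags: set[str] = field(default_factory=set)


@dataclass
class QueryTargets:
    # The four existing selector types mentioned above...
    environments: set[str] = field(default_factory=set)
    platforms: set[str] = field(default_factory=set)
    uuids: set[str] = field(default_factory=set)
    localnames: set[str] = field(default_factory=set)
    # ...plus the proposed extra field for custom tags.
    tags: set[str] = field(default_factory=set)


def is_targeted(node: Node, targets: QueryTargets) -> bool:
    """Return True if the node matches any selector (assumed OR semantics)."""
    return (
        node.environment in targets.environments
        or node.platform in targets.platforms
        or node.uuid in targets.uuids
        or node.localname in targets.localnames
        or bool(node.tags & targets.tags)  # at least one shared custom tag
    )


nodes = [
    Node("uuid-1", "web-01", "prod", "ubuntu", {"pci"}),
    Node("uuid-2", "db-01", "prod", "centos"),
]
targets = QueryTargets(tags={"pci"})
selected = [n.localname for n in nodes if is_targeted(n, targets)]
```

With this shape, a query that only sets `tags` selects just the nodes carrying one of those tags, and the existing selectors keep working unchanged.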

javuto commented 2 weeks ago

This is something that I have planned to implement since I added tags, not only for distributed queries but for file carves as well (they are technically a type of distributed query). See https://github.com/jmpsec/osctrl/issues/76 and https://github.com/jmpsec/osctrl/issues/77.

I see two possible implementations:

  1. Add a new field for tags to the existing implementation - It will be faster to implement, but it will add to the potential performance issues involving the backend.
  2. Completely reimplement how distributed queries work - It will take longer, but it removes the potential backend performance issues.

zhuoyuan-liu commented 1 week ago

Hi @javuto, I have the following idea with Redis:

I think it's enough for us, but if you want to actively track how many nodes are unfinished, we can create another Redis set to maintain a list of unfinished nodes for each query, or just query the logs returned by the nodes.
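A rough sketch of the "set of unfinished nodes per query" bookkeeping, in case it helps the discussion. It uses plain in-memory Python sets instead of a real Redis client so that it runs anywhere; the comments name the Redis set commands (`SADD`, `SREM`, `SCARD`, `SMEMBERS`) that each step would roughly map to. The class name, method names, and query name are hypothetical, and nothing here is taken from the osctrl code.

```python
from __future__ import annotations


class PendingNodes:
    """In-memory stand-in for one Redis set of unfinished nodes per query."""

    def __init__(self) -> None:
        self._pending: dict[str, set[str]] = {}

    def start(self, query_name: str, node_uuids: list[str]) -> None:
        # Would map to: SADD <key for query_name> uuid1 uuid2 ...
        self._pending.setdefault(query_name, set()).update(node_uuids)

    def finish(self, query_name: str, node_uuid: str) -> None:
        # Would map to: SREM <key for query_name> uuid
        pending = self._pending.get(query_name)
        if pending is None:
            return
        pending.discard(node_uuid)
        if not pending:
            # Redis drops a set key once its last member is removed.
            del self._pending[query_name]

    def remaining(self, query_name: str) -> int:
        # Would map to: SCARD <key for query_name>
        return len(self._pending.get(query_name, ()))

    def unfinished(self, query_name: str) -> set[str]:
        # Would map to: SMEMBERS <key for query_name>
        return set(self._pending.get(query_name, ()))


tracker = PendingNodes()
tracker.start("uptime-check", ["uuid-1", "uuid-2", "uuid-3"])
tracker.finish("uptime-check", "uuid-2")  # node returned its results
left = tracker.remaining("uptime-check")
```

The admin UI could then show the remaining count per query without scanning the node table, at the cost of one extra set per active query.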

Benefits: