kronometrix / mqtt

Kronometrix MQTT Databus
BSD 3-Clause "New" or "Revised" License
mqtt to kronometrix concepts #4

Closed sparvu closed 5 years ago

sparvu commented 5 years ago

Core Concepts

Kronometrix MQTT Databus (MQTT KBUS)

A1. Kronometrix SID, TID, DSID, DEVID are all MQTT KBUS concepts. They have nothing to do with MQTT and must not be part of any MQTT topic.

A2. We should keep MQTT communication separate and not force anyone to change, create or alter MQTT topics based on SID, TID, DEVICEID etc., which are Kronometrix internal concepts.

A3. MQTT KBUS must deploy its own version of the MQTT client, preferably async and non-blocking, capable of submitting to one or many topics on an MQTT broker.

A4. The MQTT KBUS must be able to receive, via MQTT, other clients' topics as soon as those clients publish something on the TCP line.

A5. The MQTT KBUS could theoretically publish some topics too, but at this stage it simply receives data from the other clients, when available.

MQTT to Kronometrix Data Mapping

B1. MQTT KBUS must have a way to define and configure the following items:

B2. The Kronometrix DSID, DEVICEID must be detected from the MQTT clients or overwritten by the MQTT KBUS itself. There can be the following cases

DTPopa commented 5 years ago

Let's discuss the timestamp. Some devices are not capable of sending a timestamp with their data; others measure data and timestamp it themselves. What I have seen on other MQTT platforms is that they store the timestamp of the data and/or the timestamp when the data was received. My proposal is to store both and, in case the device is not capable of sending a timestamp, to assign the time of receipt to the data. Please tell me if this is a valid proposal.
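A minimal sketch of the "store both" proposal, in Lua. The function name and the field names (`measured_at`, `received_at`, `device_supplied`) are assumptions made for illustration; nothing in the thread fixes them.

```lua
-- Hypothetical helper: decide which timestamps to store for one message.
-- `device_ts` is the timestamp sent by the device (nil when it cannot send one);
-- `received_ts` is the time the databus received the message.
local function resolve_timestamps(device_ts, received_ts)
    received_ts = received_ts or os.time()
    return {
        measured_at = device_ts or received_ts, -- fall back to time of receipt
        received_at = received_ts,
        device_supplied = device_ts ~= nil,     -- lets later stages tell the two cases apart
    }
end

local with_ts = resolve_timestamps(1554000000, 1554000005)
local without_ts = resolve_timestamps(nil, 1554000005)
print(with_ts.measured_at, with_ts.received_at)
print(without_ts.measured_at, without_ts.received_at)
```

Keeping a `device_supplied` flag is one possible way to remember that a stored measurement time is really the receipt time.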

sparvu commented 5 years ago

About time: yes, in short, these are the main lines and we will follow them. We first need to clarify some ground concepts before jumping to time.

irimiab commented 5 years ago

Let's clarify what's a "client": an MQTT client is an app which connects to the MQTT broker on a topic (or multiple topics) and can publish or simply wait for other clients to publish.

So our "MQTT KBUS" is an MQTT client. It simply connects to a topic (the topic can be defined with wildcards, in which case you can say it subscribes to multiple topics) and just waits for messages from other clients.
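To make the wildcard subscription concrete, here is a small Lua sketch of how a topic filter is compared against a published topic: `+` stands for exactly one level and `#` for all remaining levels. The helper names are hypothetical, and the sketch ignores the special handling of topics that start with `$`.

```lua
-- Split a topic or filter into its levels; empty levels are preserved.
local function split_levels(s)
    local levels = {}
    for level in (s .. "/"):gmatch("([^/]*)/") do
        levels[#levels + 1] = level
    end
    return levels
end

-- Return true when `topic` is covered by the subscription `filter`.
local function topic_matches(filter, topic)
    local f, t = split_levels(filter), split_levels(topic)
    for i = 1, #f do
        if f[i] == "#" then
            return true          -- multi-level wildcard: matches the rest
        end
        if t[i] == nil then
            return false         -- topic has fewer levels than the filter
        end
        if f[i] ~= "+" and f[i] ~= t[i] then
            return false         -- literal level differs
        end
    end
    return #f == #t              -- no wildcard left: lengths must agree
end

print(topic_matches("krmx/+/send_data", "krmx/client09/send_data"))
print(topic_matches("krmx/+/send_data", "krmx/a/b/send_data"))
```

In practice the broker performs this matching; the sketch only shows what "subscribing to multiple topics" through one filter means.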

The "MQTT KBUS" client receives from the broker, when a message is published, only the following information:

No other information is available: not the publisher's ID, nor its IP, nothing!

Now, when you say

For each MQTT client we generate a new DSID to be used on the K platform

what do you mean by "MQTT client"? You mean an instance of "MQTT BUS", or a sender? And, either way, how is the IP important?

sparvu commented 5 years ago

will call shortly about these. we need to clarify

sparvu commented 5 years ago

So here are some ground rules I understood from people who have been using MQTT for some time. Might help us.

In general, the payload is the way to propagate extra information between clients if needed. SSL must always be used along with authentication.

sparvu commented 5 years ago

So, to summarize:

But the real problems are these:

sparvu commented 5 years ago

The most logical and powerful way to handle this is in the broker itself, in the form of a plugin or module that handles the MQTT load. This way we have access to all clients, and we can easily convert MQTT traffic into Kronometrix traffic. But I do not know of any Lua-based MQTT broker, and none of our team members knows Erlang or is familiar with MQTT broker concepts.

On the other side, on the client, we could always have some minimum requirements: no matter the payload type (JSON, XML, etc.), we ask for a ClientID field which must contain a unique string identifying the client.

irimiab commented 5 years ago

I don't think that modifying an MQTT broker would be wise; MQTT itself is just a transport layer, and it shouldn't do anything other than distribute messages between clients.

If we want to accommodate in Kronometrix various clients, let's find out about them: how do these clients communicate with each other, what protocol, what details.

From the little research I've done regarding other "analytics" platforms (or, rather, IoT platforms) that use MQTT, this is their architecture:

Don't imagine that I studied dozens of cases 😃 I just did a little research on a couple of them (Wia, Elastic Search, Watson and a few others I can't remember now).

sparvu commented 5 years ago

Remember the topic: we are discussing how one would capture data from n MQTT clients and convert it to Kronometrix messages for analysis.

If you have studied them, then you already know that MQTT is a vast topic where you can have unlimited use cases, types of payloads, clients etc. So there cannot be a single solution that covers almost all cases unless:

A. you attack the problem within the broker itself, where you can for example extend the functionality by allowing it to convert the messages for our platform. You do not need to touch any MQTT functionality; you only add to it for our own purpose

B. you keep the logic in yet another MQTT client and use some recommendations as already mentioned above. The payload is the simplest aspect which can be turned in our favour.

C. Security is a very important topic on MQTT which requires attention from day 1.

So we just need to review and choose between A, B, C and carry on with the plan.

sparvu commented 5 years ago

Regarding the brokers: I reviewed these a bit last year. Some recommendations and good alternatives were: https://vernemq.com https://github.com/emqx/emqx

The best, according to my findings, was Emqx, which is powerful enough and has a flexible monitoring part done in VueJS.

irimiab commented 5 years ago

Now I understand why you mentioned Erlang :) Both these brokers are written in Erlang.

Emqx seems nice, indeed.

Regarding the A, B, C options, I already stated my opinion: no point in modifying the "transport" layer (this is the MQTT broker); it makes more sense to implement the logic in an MQTT client (option B). To ensure security, I also proposed to use our own broker installation that integrates a Redis AUTH database for authenticating Kronometrix users.

sparvu commented 5 years ago

We can, as simple as that, select B and dive into the payload requirements. Sounds good to me.

Regarding the broker: emqx is probably the best broker out there which supports authentication over SSL. All inside. No need for anything.

sparvu commented 5 years ago

So let's focus on B, where we process and try to identify the MQTT clients using an MQTT client. Here are some clarifications:

So based on this, our databus must first have:

That's the first phase: MQTT ClientID to DSID parsing.

irimiab commented 5 years ago

Ok. To extract an ID from the topic we can use a regular expression. But to extract it from the payload is more complicated, because the payload can be multiline, can have various formats, can even be binary. How do you see the configuration for this "client ID" extraction? (you know we use JSON for settings).

sparvu commented 5 years ago

We can always start with the topic, followed by the payload when we have a use case. We need a way to differentiate in the config, where we define how and from where we fetch the client id string.

On the payload, what is hard about searching for a string and finding its value? For the content, you just search the body of text, one line or multi-line etc. I don't see a problem with that.

irimiab commented 5 years ago

Ok. What are the action points for this?

sparvu commented 5 years ago
  1. Build a simple configuration where we can define how we shall identify the client id.

  2. Allow two options: topic | payload

  3. If configured as payload, write an error string to the logs: not implemented yet

  4. For topic, define a way to parse and find the client id using a string defined in the configuration file, and make the DS mapping using the simplest method: DSID = SHA256 (ClientID, MQTT, SID)
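The four action points can be sketched in Lua roughly as follows. This is only an illustration under stated assumptions: the settings keys, the broker URL and the SID are placeholders, the `:` separator between the hashed fields is an assumption (the thread does not say how the three values are combined), and `sha256_hex` is passed in because Lua's standard library has no SHA-256.

```lua
-- Hypothetical settings; key names follow the discussion, values are placeholders.
local settings = {
    client_id_source = "topic",                 -- "topic" | "payload"
    client_id_pattern = "^krmx/([^/]+)/send_data$",
    mqtt_url = "mqtt://broker.example:1883",
    sid = "example-sid",
}

-- Return the client id, or nil plus an error string to be logged.
local function extract_client_id(cfg, topic, payload)
    if cfg.client_id_source == "topic" then
        local id = topic:match(cfg.client_id_pattern)
        if id then return id end
        return nil, "client id not found in topic: " .. topic
    elseif cfg.client_id_source == "payload" then
        return nil, "client_id_source=payload: not implemented yet"
    end
    return nil, "unknown client_id_source: " .. tostring(cfg.client_id_source)
end

-- DSID = SHA256(ClientID, MQTT, SID). `sha256_hex` is any function mapping a
-- string to a hex digest, supplied by whichever crypto library is in use.
local function make_dsid(sha256_hex, client_id, mqtt_url, sid)
    -- Assumption: the three fields are joined with ":" before hashing.
    return sha256_hex(table.concat({ client_id, mqtt_url, sid }, ":"))
end

local id, err = extract_client_id(settings, "krmx/client09/send_data")
print(id, err)
```

Injecting the hash function keeps the sketch free of any particular crypto dependency; whatever input format is chosen must stay stable, otherwise existing DSIDs change.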

irimiab commented 5 years ago

The SID will be set in the configuration too? And by MQTT, you mean the URL of the MQTT broker?

sparvu commented 5 years ago

yes.

https://github.com/kronometrix/mqtt/issues/4#issue-424883616

MQTT to Kronometrix Data Mapping

B1. MQTT KBUS must have a way to define and configure, the following items:

irimiab commented 5 years ago

Done. This is the proposed configuration structure:

local kronometrix = {
    {
        host = "127.0.0.1",
        port = 80,
        path = "/api/private/send_data",
        sid = "9ee583c7d0a8b314c947dccfdcd922ca", -- Computer Performance
        tid = "d5e077bb7d043f5bd93391d283072e1d"
    }
}

local mqtt = {
    server = "37.187.106.16",
    topic = "krmx/+/send_data",
    client_id_source = "topic",
    client_id_regexp = "krmx/(%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x)/send_data"
}

Multiple Kronometrix destinations can be defined. To extract the client ID from the topic, you need to specify the regular expression (Lua-like) in the key client_id_regexp. The DSID is now generated using SHA256 based on the extracted client ID, the MQTT server URL and the SID.
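As an illustration of how such a pattern behaves, the sketch below applies it to a sample topic with `string.match`. Lua patterns have no counted repetition such as `{32}`, so the 32 `%x` items can also be generated with `string.rep`. The sample client ID is made up, and the `^` and `$` anchors are an addition for strictness that the configuration above does not use.

```lua
-- Build "%x" repeated 32 times instead of typing it out.
local hex32 = ("%x"):rep(32)
local client_id_regexp = "^krmx/(" .. hex32 .. ")/send_data$"

-- Made-up topic carrying a 32-digit hexadecimal client ID.
local topic = "krmx/0123456789abcdef0123456789abcdef/send_data"
local client_id = topic:match(client_id_regexp)

-- client_id holds the captured hex digits, or nil when the topic does not match.
print(client_id)
```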

sparvu commented 5 years ago

client_id_regexp = "krmx/(%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x)/send_data"

what is this ?

irimiab commented 5 years ago

That's a regular expression. It means "32 hexadecimal digits". It can be any regular expression (Lua-like).

sparvu commented 5 years ago

Let's do it like this: please include some sample examples if you plan to use technical regex expressions within configuration files. These are not selling. Nobody buys %x.

I have nothing against the use of regex, but please document it with 3-5 samples within the config ... otherwise change it to something which can be used for sales.

irimiab commented 5 years ago

Are you asking me to document Lua regular expressions? :) This is the documentation page: https://www.lua.org/pil/20.2.html A couple of examples:

client_id_regexp = "prefix/(.+)/suffix" -- all characters in the topic "path" between a prefix and a suffix
client_id_regexp = "prefix/(%w+)$" -- all alphanumeric characters in the topic "path" between a prefix and the end of the string
client_id_regexp = "prefix/(%d%d%d)$" -- three digits at the end of the topic string

If you find it too complicated and "will not sell", please propose something else.

sparvu commented 5 years ago

You could use some 2-3 examples, short not long, showing how one would understand what to look for, like here: https://gist.github.com/nerdsrescueme/1237767 - For example, the first one would look for something as simple as ClientID="client09-machine", which would allow us to fetch client09-machine

the main idea is that the default values or whatever you keep in the config by default must look concise and simple not ugly (even if that is a legit regex construct).

irimiab commented 5 years ago

So my examples above are too complex?

sparvu commented 5 years ago

keep your examples followed by a practical example. use ClientID as a main example ...
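A possible shape for such practical examples, using the ClientID case from the discussion. The patterns and sample inputs below are illustrative only; each one is meant to capture `client09-machine`.

```lua
local samples = {
    -- { description, pattern, sample input }
    { "quoted ClientID value", 'ClientID="([^"]+)"', 'ClientID="client09-machine"' },
    { "last level of the topic", "([^/]+)$", "sensors/floor2/client09-machine" },
    { "level after a fixed prefix", "^krmx/([^/]+)/", "krmx/client09-machine/send_data" },
}

-- Print what each pattern captures from its sample input.
for _, s in ipairs(samples) do
    print(s[1], s[3]:match(s[2]))
end
```

Pairing every default pattern in the config with one sample input and the value it extracts keeps the regex syntax while showing at a glance what it is for.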

irimiab commented 5 years ago

Hello. Can we make some progress on this?

irimiab commented 5 years ago

So after a long discussion about MQTT, here are some aspects as I see them:

As a bottom line, in my opinion:

Simple and straightforward.

irimiab commented 5 years ago

To be more specific, I would propose this format:

We could move the "message id" from the topic to the payload.

In the payload, timestamp can be missing, in which case the current timestamp (when the message has been received) will be used.

Arguments:

This is in line with what I've seen on other platforms and it makes sense for me like this.

irimiab commented 5 years ago

I consider my proposal to be a good starting point, quite easy to be adopted by various MQTT-able devices. When specific cases arise, we might add different functionalities to the MQTT "databus". For instance, to integrate a big client, maybe we could do something more specific for them. But as a general feature set for MQTT, I find my proposal just fine.

sparvu commented 5 years ago

ok, some first questions:

irimiab commented 5 years ago

If the 100 MQTT clients are from the same buyer, then yes, it is simpler to administer them server-side. But if the 100 MQTT clients are sold to 100 people, then we will need to make 100 configurations on the server. And we will have to make these configurations ourselves. Whereas if the configuration is on the device, we can ask the buyers to make their own configuration (of course, with proper tools we need to offer).

Regarding the TID, this is indeed a valid concern, which I thought about a little last night.

Security

In my opinion, we will need security end-to-end.

If we store the TID on the "databus", any client having access to the MQTT broker will be able to send data to Kronometrix. It's trivial to subscribe to some topics (using wildcards), see what's the protocol, then send whatever data to Kronometrix. In my opinion, this is not acceptable.

I think we need a way to prevent unauthorized clients (clients without a TID) from connecting to the MQTT broker. This is in line with what other platforms do.

irimiab commented 5 years ago

Taking a look at EMQ X, I saw it has some nice capabilities regarding ACL and authentication. We can either use authentication (via Redis or via HTTP basic auth), or we can use the ACL to prevent clients from "sniffing" other clients' tokens, then validate the token when they publish.

sparvu commented 5 years ago

Look at the big picture. You have 100 MQTT clients: one buyer, 30 buyers, whatever.

If the 100 MQTT clients are from the same buyer, then yes, it is simpler to administer them server-side. But if the 100 MQTT clients are sold to 100 people, then we will need to make 100 configurations on the server. And we will

It is hard to make modifications in 100 places, no matter whether these 100 devices or clients come from 1 buyer or 100. You literally have to make 100 modifications to change something which has nothing to do with MQTT anyway. It is not logical.

Instead, you can allow your MQTT client, part of the databus product, to subscribe to different topics and handle that in a single place, nice and easy.

Even in your example, having 100 buyers will substantially increase the risk that something gets broken when you want your 100 users to change their configs.

Don't you agree?

irimiab commented 5 years ago

Having to manually provision every new "buyer" isn't a good idea, in my opinion. The buyer will have to make his own account on Kronometrix; why shouldn't he provision his own devices?

sparvu commented 5 years ago

I did not ask about buyers or provisioning. I just ask: what do you think is easier to handle, 100 modifications or only one? To me the answer is obvious: one. And that's on the databus itself. Don't you agree?

irimiab commented 5 years ago

Yes, it's easier to handle one modification. But it's even easier to handle zero modifications (and to let the owner of the devices make these modifications).

sparvu commented 5 years ago

OK, we are coming to some consensus. So yes, it is much easier to have the config in a single place for 2, 100 or 10.000 MQTT clients. Now our goal is to establish common ground on what we are building: an MQTT Databus. That's what we are after.

I will list here again the top considerations of what is a databus and how to do it.

sparvu commented 5 years ago

Further clarifications about TID

Therefore I would suggest that the TID, like most other information, should sit and be manageable on the databus itself. Again, if the management is a concern, then we can offer a REST API for that.

sparvu commented 5 years ago

So let's review, summarise and conclude which options and path we take, and why.

irimiab commented 5 years ago

I still don't understand: how will you prevent anybody from sending data to Kronometrix via the MQTT broker?

sparvu commented 5 years ago

We will have a maximum number of DSIDs, or MQTT clients, allowed on the databus: an option which will allow us to say that no more than 200 MQTT clients are allowed. These 200 MQTT clients will then be mapped to K DSIDs and processed.

You can think, in your own time, about how we can implement this max control. The databus must display its configuration and, in the logs during startup, the number of clients allowed. We can re-use platform.json as a form of 'licensing', or whatever else you want to call it, where we configure the max clients. I can formalize the databus.json config tonight.

Then we need a cryptographic way to enforce this limit.
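A minimal Lua sketch of the max-clients control, assuming a simple in-memory registry. The names `new_registry`, `admit` and `count` are hypothetical, and the sketch deliberately leaves out both the licensing file and the cryptographic protection of the limit.

```lua
-- Hypothetical registry that admits at most `max_clients` distinct client IDs.
local function new_registry(max_clients)
    local known, count = {}, 0
    return {
        -- Returns true for known clients and for new ones while capacity remains.
        admit = function(client_id)
            if known[client_id] then return true end
            if count >= max_clients then
                return false, "client limit reached (" .. max_clients .. ")"
            end
            known[client_id] = true
            count = count + 1
            return true
        end,
        count = function() return count end,
    }
end

local max_clients = 200
local registry = new_registry(max_clients)
-- Announce the configured capacity at startup, as requested for the logs.
print("databus: max MQTT clients allowed = " .. max_clients)
print(registry.admit("client09-machine"))
```

A first-come registry like this admits whichever client IDs show up first, so on its own it does nothing to tell genuine clients from bogus ones.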

sparvu commented 5 years ago

MQTT traffic cannot reach Kronometrix without a databus. And a databus has a capacity and a cost. Like everything else in life.

sparvu commented 5 years ago

Let me know if anything is still unclear. We take them one by one. For some I can't answer how we do it technically, but I have the high-level design concepts. I would love for us to close the discussion and debates quickly and move to the low-level design and implementation.

irimiab commented 5 years ago

And what stops a malicious user from using another client's ID to send bogus data?

For example, I subscribe to the same broker to all topics. I see what's happening there, then I use an existing client ID and send bogus data. Or even flood. Or I create new bogus clients so the real clients will be rejected (due to the max clients limit).

Basically, you have no protection against "bad people" on MQTT. Anyone can render that subscription unusable, if he intends so.

sparvu commented 5 years ago

ok, some clarifications:

sparvu commented 5 years ago

and, I hope we understood what we are planning to make:

I hope this answers your questions. Let me know if anything is still unclear.

irimiab commented 5 years ago

Sure. Let's proceed as you see fit for the project.

irimiab commented 5 years ago

So what's the next step on this one?