Closed by sparvu 5 years ago
Let's discuss the timestamp. Some devices are not capable of sending a timestamp; others measure data and timestamp it themselves. What I have seen on other MQTT platforms is that they store the timestamp of the data and/or the timestamp when the data was received. My proposal is to store both and, in case the device is not capable of sending a timestamp, to assign the time of receipt to the data. Please tell me whether this is a valid proposal.
About time: yes, in short these are the main lines and we will follow them. We first need to clarify some ground concepts before jumping to time.
Let's clarify what's a "client": an MQTT client is an app which connects to the MQTT broker on a topic (or multiple topics) and can publish or simply wait for other clients to publish.
So our "MQTT KBUS" is an MQTT client. It simply connects to a topic (the topic can be defined with wildcards, in which case you can say it subscribes to multiple topics) and just waits for messages from other clients.
The "MQTT KBUS" client receives from the broker, when a message is published, only the following information:
No other information is available: not the publisher's ID, nor its IP, nothing!
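To make the wildcard subscription mentioned above concrete, here is a minimal sketch of how a topic filter is matched against a published topic, following the standard MQTT rules (`+` matches exactly one level, `#` matches all remaining levels and must come last). The function name `topic_matches` is made up for this illustration, and the special handling of topics starting with `$` is left out.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # No '#' was seen, so both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


print(topic_matches("krmx/+/send_data", "krmx/device42/send_data"))
print(topic_matches("krmx/+/send_data", "krmx/a/b/send_data"))
```

The subscriber only ever sees the topic and the payload, so anything the databus needs to know about the sender has to be carried in one of those two.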
Now, when you say
For each MQTT client we generate a new DSID to be used on the K platform
what do you mean by "MQTT client"? You mean an instance of "MQTT BUS", or a sender? And, either way, how is the IP important?
Will call shortly about these. We need to clarify:
what is a data source id in this case: MQTT client to Kronometrix DSID
what is a device id
what it means to see traffic from other MQTT clients
So here are some ground rules I understood from these guys who have been using MQTT for some time. Might help us.
we should have clear topics, no hidden agendas or weird naming conventions. topics must be clear and have no ids, key logs etc
the payload could contain a JSON where we could use a clientid as needed by our system, in our case the databus
the payload could have a simple structure containing the metrics, some other information and, if needed, a ClientID which can be a SHA512 etc
the clients must always be authenticated and authorised. it is very important to have a broker which can do these and allow SSL for secure communication from day 0
In general the payload is the way to propagate extra information between clients if needed. SSL always must be used along with authentication
So, to summarize:
on the broker - the most intelligent and powerful way would be to have the databus implemented on the broker itself, but that means we would have to build our own broker, which is neither simple nor feasible right now
on the client - the next immediate thing is to build the databus outside the broker, as we are discussing now, on an MQTT client. This would require that the body payload always meets some minimum requirements:
ClientID: "XXXX" a string which can be anything
We can't enforce a rule that the ClientID must already be a SHA256; some devices might not be capable of producing this
We can make the DSID within the databus on our side as simple as SHA256 or SHA512(ClientID, 'MQTT', 'SID')
But the real problems are these:
The only exception would be what happens if the ClientID is entirely missing in the body payload
Some clients might not want to change their payload format to JSON
Some clients might not want to change and add to their payload the ClientID
The most logical and powerful way to handle this is in the broker itself, in the form of a plugin or module to handle the MQTT load. This way we have access to all clients, and we can easily produce and convert traffic to Kronometrix from MQTT. But I do not know of any Lua-based MQTT broker, and none of our team members knows Erlang or is familiar with MQTT broker concepts.
On the other side: on the client we could always have some minimum requirements where, no matter the payload type (JSON, XML, etc.), we ask for a ClientID containing a unique string to identify the client.
I don't think that modifying an MQTT broker would be wise; the MQTT itself is just a transport layer, it shouldn't do anything else than distributing messages between clients.
If we want to accommodate in Kronometrix various clients, let's find out about them: how do these clients communicate with each other, what protocol, what details.
From the little research I've done regarding other "analytics" platforms (or, rather, IoT platforms) that use MQTT, this is their architecture:
prefix/<client_id>/last/ta
Don't imagine that I studied dozens of cases 😃 I just did a little research on a couple of them (Wia, Elastic Search, Watson and a few others I can't remember now).
Remember the topic: we are discussing how one would capture data from n MQTT clients and convert it to Kronometrix messages for analysis.
If you have studied then you already know that MQTT is a vast topic where you can have unlimited use cases, type of payloads, clients etc. So there cannot be a single solution to drive almost all cases unless:
A. you attack the problem within the broker itself, where you can for example extend the functionality by allowing it to convert the messages towards our platform. You do not need to touch any MQTT functionality, but you add to it for our own purpose
B. you keep the logic on yet another MQTT client and use some recommendations as already mentioned above. the payload is the simplest aspect which can be turned in our favour.
C. Security is a very important topic on MQTT which requires attention from day 1.
So we just need to review and choose among A, B, C and carry on with the plan.
Regarding the brokers, this I have reviewed a bit last year. Some recommendations and good alternatives were: https://vernemq.com https://github.com/emqx/emqx
The best, according to my findings, was Emqx, which is powerful enough and has a flexible monitoring part done in VueJS.
Now I understand why you mentioned Erlang :) Both these brokers are written in Erlang.
Emqx seems nice, indeed.
Regarding the A, B, C options, I already stated my opinion: no point in modifying the "transport" layer (this is the MQTT broker); it makes more sense to implement the logic in an MQTT client (option B). To ensure security, I also proposed to use our own broker installation that integrates Redis AUTH database for authenticating Kronometrix users.
We can, as simple as that, select B and dive into the payload requirements. Sounds good to me.
Regarding the broker: emqx is probably the best broker out there which supports authentication over SSL. All inside. No need for anything.
So lets focus on B. Where we process and try to identify the MQTT clients using a MQTT client. So here clarifications:
MQTT clients can have set on the topic or payload a custom string which can define the client id. There are no rules and cases might be different
There can be any string format which can define the ClientID
It can show up in the topic or the body payload
So based on this our databus first must have:
a simple way to configure and identify the clientid topic or payload
a string which should be used to detect what is the client id. Can be ClientID="xxx", MotherboardID="xxx-xxx-xxxx", MachineID etc We need somewhere to define the keyword from where we shall parse the clientid
when we know the clientid we compute the DSID
thats the first phase. MQTT Clientid to DSID parsing
Ok. To extract an ID from the topic we can use a regular expression. But to extract it from the payload it is more complicated, because the payload can be multiline, can have various formats, can even be binary. How do you see the configuration for this "client ID" extraction? (you know we use JSON for settings).
we can always start with topic followed by payload when we have a usecase. we need the way to differentiate under a config where we define how and from where we fetch the client id string.
On the payload, what is the hard part about searching for a string and finding its value? You just search the body of text, one line or multi-line etc. I don't see a problem with that.
Ok. What are the action points for this?
Build a simple configuration where we can define how we shall identify the client id.
Allow two options: topic | payload
If configured as payload, write an error string to the logs: not implemented yet
For topic, define a way to parse and find the client id using a string defined under configuration file and make the DS mapping based on the simplest method: DSID = SHA256 (ClientID, MQTT, SID)
The SID will be set in the configuration too? And by MQTT, you mean the URL of the MQTT broker?
yes.
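The action points above could be sketched roughly as follows. This is only an illustration in Python, not the databus code: the function name `extract_client_id`, the config keys and the logger name are placeholders chosen for the sketch, and the pattern is a Python `re` expression rather than a Lua pattern.

```python
import logging
import re

log = logging.getLogger("mqtt_kbus")  # hypothetical logger name


def extract_client_id(config, topic, payload):
    """Return the client ID for one message, or None if it cannot be determined."""
    source = config["client_id_source"]
    if source == "topic":
        # The first capture group of the configured pattern is the client ID.
        match = re.search(config["client_id_regexp"], topic)
        return match.group(1) if match else None
    if source == "payload":
        # Second option from the action points: accepted in config, not built yet.
        log.error("client_id_source = payload: not implemented yet")
        return None
    raise ValueError("client_id_source must be 'topic' or 'payload'")


config = {
    "client_id_source": "topic",
    "client_id_regexp": r"krmx/([^/]+)/send_data",
}
print(extract_client_id(config, "krmx/client09-machine/send_data", b""))
```

Messages for which no client ID is found return `None`, so the caller can decide whether to drop them or log them.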
https://github.com/kronometrix/mqtt/issues/4#issue-424883616
B1. MQTT KBUS must have a way to define and configure, the following items:
the MQTT broker address, IP and port number
the MQTT authentication settings, if the MQTT broker is using authentication or SSL
the Kronometrix platform, SID, TID where the MQTT data shall be published
Done. This is the proposed configuration structure:
-- Kronometrix destinations: one table per destination
local kronometrix = {
  {
    host = "127.0.0.1",
    port = 80,
    path = "/api/private/send_data",
    sid = "9ee583c7d0a8b314c947dccfdcd922ca", -- Computer Performance
    tid = "d5e077bb7d043f5bd93391d283072e1d"
  }
}

-- MQTT broker and subscription settings
local mqtt = {
  server = "37.187.106.16",
  topic = "krmx/+/send_data",      -- "+" matches exactly one topic level
  client_id_source = "topic",      -- where the client ID is read from
  client_id_regexp = "krmx/(%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x)/send_data"
}
Multiple Kronometrix destinations can be defined.
To extract the client ID from the topic, you need to specify the regular expression (Lua-like) in the key client_id_regexp
The DSID is now generated using SHA256 based on the client ID extracted, MQTT server URL and the SID.
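As a rough illustration of the DSID derivation described above, here is a Python sketch. The thread only says the DSID is a SHA256 over the client ID, the MQTT server URL and the SID; the order of the fields, the `|` separator and the hex encoding used here are assumptions, and `compute_dsid` is a made-up name.

```python
import hashlib


def compute_dsid(client_id, mqtt_url, sid):
    """Derive a DSID from the client ID, the MQTT server URL and the SID."""
    # Assumption: fields are joined with "|" so that ("ab", "c") and ("a", "bc")
    # cannot produce the same input to the hash.
    material = "|".join([client_id, mqtt_url, sid])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


dsid = compute_dsid(
    "5556789e2b06f2018859c0bc1d93bea1",  # client ID taken from the topic
    "37.187.106.16",                     # MQTT server from the config
    "9ee583c7d0a8b314c947dccfdcd922ca",  # SID from the config
)
print(dsid)
```

Because the derivation is deterministic, the same client publishing through the same broker to the same subscription always maps to the same DSID, without any per-client state on the databus.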
client_id_regexp = "krmx/(%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x)/send_data"
what is this ?
That's a regular expression. It means "32 hexadecimal digits". It can be any regular expression (Lua-like).
lets do like this: pls input some sample example if you plan to use technical regex expressions within configurations files. these are not selling. nobody buys %x
I have nothing against the use of regex but pls document with 3-5 samples within config ... otherwise change that in something which can be used for sales.
Are you asking me to document Lua regular expressions? :) This is the documentation page: https://www.lua.org/pil/20.2.html A couple of examples:
client_id_regexp = "prefix/(.+)/suffix" -- all characters in the topic "path" between a prefix and a suffix
client_id_regexp = "prefix/(%w+)$" -- all alphanumeric characters in the topic "path" between a prefix and the end of the string
client_id_regexp = "prefix/(%d%d%d)$" -- three digits at the end of the topic string
If you find it too complicated and "will not sell", please propose something else.
you could use some 2,3 examples, short not long, how one would understand what to look for. like here: https://gist.github.com/nerdsrescueme/1237767 - For example the first one would be looking for something as simple as ClientID="client09-machine" which will allow us to fetch client09-machine
the main idea is that the default values or whatever you keep in the config by default must look concise and simple not ugly (even if that is a legit regex construct).
So my examples above are too complex?
keep your examples followed by a practical example. use ClientID as a main example ...
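A practical illustration of the patterns discussed above, using a ClientID-style value as the main example. This is a Python sketch, so the patterns are written in Python `re` syntax as approximate equivalents of the Lua ones (`%w` becomes `[A-Za-z0-9]`, `%d` becomes `\d`, `%x` becomes `[0-9a-fA-F]`); the sample topics and the helper name `client_id_from_topic` are invented for the illustration.

```python
import re

# (pattern, sample topic) pairs; the first capture group is the client ID.
EXAMPLES = [
    (r"prefix/(.+)/suffix", "prefix/client09-machine/suffix"),
    (r"prefix/([A-Za-z0-9]+)$", "prefix/client09"),
    (r"prefix/(\d\d\d)$", "prefix/042"),
    (r"krmx/([0-9a-fA-F]{32})/send_data",
     "krmx/5556789e2b06f2018859c0bc1d93bea1/send_data"),
]


def client_id_from_topic(pattern, topic):
    """Return the first capture group of pattern in topic, or None if no match."""
    match = re.search(pattern, topic)
    return match.group(1) if match else None


for pattern, topic in EXAMPLES:
    print(topic, "->", client_id_from_topic(pattern, topic))
```

The first entry is probably the friendliest default for a config file: everything between a fixed prefix and a fixed suffix is taken as the client ID.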
Hello. Can we make some progress on this?
So after a long discussion about MQTT, here are some aspects as I see them:
As a bottom line, in my opinion:
Simple and straightforward.
To be more specific, I would propose this format:
/krmx/5556789e2b06f2018859c0bc1d93bea1/b6ac411d5960dabfb804f94577a3cd0f/my_dsid/my_dev/iaqd-g01
{
"timestamp":1554129158,
"ta":22.7,
"rh":52.7,
"td":-3.9,
"co2":632,
"voc":480
}
We could move the "message id" from the topic to the payload.
In the payload, timestamp can be missing, in which case the current timestamp (when the message has been received) will be used.
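The timestamp rule above could look roughly like this. It is a sketch only: the function name `parse_payload` and the field names of the returned record (`device_timestamp`, `received_timestamp`, `metrics`) are assumptions made for the illustration. It keeps both timestamps, as proposed at the start of the thread, and falls back to the time of receipt when the device sends none.

```python
import json
import time


def parse_payload(raw, received_at=None):
    """Parse a JSON payload and decide which timestamp the sample gets."""
    if received_at is None:
        received_at = int(time.time())  # time of receipt on the databus
    metrics = json.loads(raw)
    if not isinstance(metrics, dict):
        raise ValueError("payload must be a JSON object")
    device_ts = metrics.pop("timestamp", None)
    return {
        "device_timestamp": device_ts,      # None if the device sent none
        "received_timestamp": received_at,
        # Effective timestamp: the device's own if present, else time of receipt.
        "timestamp": device_ts if device_ts is not None else received_at,
        "metrics": metrics,
    }


sample = '{"timestamp":1554129158,"ta":22.7,"rh":52.7,"td":-3.9,"co2":632,"voc":480}'
print(parse_payload(sample))
print(parse_payload('{"ta":22.7}'))
```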
Arguments:
This is in line with what I've seen on other platforms and it makes sense for me like this.
I consider my proposal to be a good starting point, quite easy to be adopted by various MQTT-able devices. When specific cases arrive, we might add different functionalities to the MQTT "databus". For instance, to integrate a big client, maybe we could do something more specific for them. But as a general feature set for MQTT, I find my proposal just good.
ok, some first questions:
if you have 100 MQTT clients, what would it be easier to modify 100 client configurations, or one configuration ?
how do you plan to keep track of DSID, DEVICE ids on each MQTT client ?
if you have 100 MQTT online clients, functioning, and you add 50 new clients and remove 20, what would be easier: to have a single place where all clients meet and map to Kronometrix or for each client to manage and handle the Kronometrix identifications: DSID, DEVID, etc ?
If the 100 MQTT clients are from the same buyer, then yes, it is simpler to administer them server-side. But if the 100 MQTT clients are sold to 100 people, then we will need to make 100 configurations on the server. And we will have to make these configurations ourselves. Whereas if the configuration is on the device, we can ask the buyers to make their own configuration (of course, with proper tools we need to offer).
As regarding the TID, this is indeed a valid concern which I thought of a little bit last night.
In my opinion, we will need security end-to-end.
If we store the TID on the "databus", any client having access to the MQTT broker will be able to send data to Kronometrix. It's trivial to subscribe to some topics (using wildcards), see what's the protocol, then send whatever data to Kronometrix. In my opinion, this is not acceptable.
I think we need a way to prevent unauthorized clients (clients without a TID) from connecting to the MQTT broker. This is in line with what other platforms do.
Taking a look at EMQ X, I saw it has some nice capabilities regarding ACL and authentication. We can either use authentication (via Redis or via HTTP basic auth), or we can use the ACL to prevent the clients from "sniffing" other clients' tokens, then validate the token when they publish.
Look at the big picture. You have 100 MQTT clients, one buyer, 30 whatever buyers etc
If the 100 MQTT clients are from the same buyer, then yes, it is simpler to administer them server-side. But if the 100 MQTT clients are sold to 100 people, then we will need to make 100 configurations on the server. And we will
It is hard to make the modifications in 100 places, no matter whether these 100 devices or clients come from 1 buyer or 100. You literally have to make 100 modifications to change something which anyway has nothing to do with MQTT. It is not logical.
Instead you can allow your MQTT client, part of the databus product to subscribe to different topics and handle that in a single place nice and easy.
Even in your example, having 100 buyers will substantially increase the risk that something might get broken when you want your 100 users to change the configs.
Dont you agree ?
Having to manually provision every new "buyer" isn't a good idea, in my opinion. The buyer will have to make his own account on Kronometrix; why shouldn't he provision his own devices?
I did not ask about buyer, provisioning, I just ask: what do you think it is easier to handle 100 modifications or only one ? To me the answer is obvious. One. And thats on the databus itself. Dont you agree ?
Yes, it's easier to handle one modification. But it's even easier to handle zero modifications (and to let the owner of the devices make these modifications).
ok, we are coming to some consensus. So yes, it is much easier to have in a single place the config for 2 or 100 MQTT clients or 10.000. Now our goal is to establish common grounds on what we are building. A MQTT Databus. Thats what we are after.
I will list here again the top considerations of what is a databus and how to do it.
includes a MQTT client which can subscribe to different topics, on a certain broker
the databus might offer a broker, but this wont be anytime soon now, or calendar 2019. Maybe 2020 we can integrate a broker after somebody is financing this activity. This is a separate track.
when data arrives from one or many MQTT clients, our Kronometrix MQTT Databus MQTT client should fetch the content from the other MQTT clients and pass the payload(s) to the databus itself for Kronometrix conversion and transport towards platform analytics
the platform analytics will not speak MQTT. Nor DDS nor anything else than HTTP 1.0, and future HTTP 2.0. In fact HTTP 2.0 will be a very important part of future analytics for our platform.
so the databus is the bridge between MQTT world and Kronometrix world
to keep changes on the MQTT front (clients, etc.) to a minimum, our databus must offer several capabilities:
to map MQTT clients to Kronometrix DSIDs
could group or configure them to certain subscriptions if needed
map MQTT payloads to Kronometrix data messages
provision data
of course if required, the databus could in fact offer a REST API interface that allows this, offering support to configure via Web the MQTT clients to different Kronometrix subscriptions, etc
Further clarifications about TID
this is a very important element which must be kept at all costs secured and private
its place must be on the databus, because there it is the most secure place
we cant place the TID on the MQTT client, nor allow end users, or users to configure on their devices from two aspects: simple management, and security
we cannot guarantee that all MQTT solutions will always use a secure communication
we will not be able to offer our own broker anytime soon, not in 2019
the databus is responsible and manages:
MQTT communication, using a MQTT client, fetching the MQTT messages
Kronometrix internal operations: authentication and authorisation, DSID, the data message and provisioning
Therefore I would suggest that the TID, like most of the other information, should sit and be manageable on the databus itself. Again, if the management is a concern then we can offer a REST API for that.
So lets review, summarise and conclude what options and path we take and why.
I still don't understand: how will you prevent anybody from sending data to Kronometrix via the MQTT broker?
We will have a max DSID or MQTT clients allowed on the databus. An option which will allow us to say no more than 200 MQTT clients are allowed. These 200 MQTT clients will then be mapped to K DSIDs and processed.
You can think, in your own time, about how we can implement this max control. The databus must display its configuration and, in the logs at startup, the number of clients allowed. We can re-use the platform.json as a form of 'licensing', or whatever else you want to call it, where we configure the max clients. Tonight I can formalize the databus.json config.
then we need a crypto way to ensure this limit.
MQTT traffic cannot reach Kronometrix without a databus. And a databus has a capacity and a cost. Like everything else in life.
Let me know if anything is still unclear. We take them one by one. Some I can't answer at the technical level, but for the high-level design I have the concepts. I would love to close the discussion and debates quickly and move to the low-level design and implementation.
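The client cap described above could be sketched as a small registry. This is only an illustration of the counting logic: the class name `ClientRegistry` and its methods are invented here, and the sketch covers neither the cryptographic protection of the limit nor any defence against spoofed client IDs.

```python
class ClientRegistry:
    """Tracks which MQTT clients are mapped to a DSID, up to a fixed capacity."""

    def __init__(self, max_clients):
        self.max_clients = max_clients
        self._dsid_by_client = {}

    def admit(self, client_id, dsid):
        """Return True if messages from client_id may be processed."""
        if client_id in self._dsid_by_client:
            return True  # already mapped, does not consume a new slot
        if len(self._dsid_by_client) >= self.max_clients:
            return False  # capacity reached: new clients are rejected
        self._dsid_by_client[client_id] = dsid
        return True

    def count(self):
        """Number of clients currently mapped."""
        return len(self._dsid_by_client)


registry = ClientRegistry(max_clients=2)
print("clients allowed:", registry.max_clients)  # e.g. logged at startup
print(registry.admit("client-a", "dsid-a"))
print(registry.admit("client-b", "dsid-b"))
print(registry.admit("client-c", "dsid-c"))
```

A first-come, first-served cap like this is simple, but it means whoever publishes first occupies the slots.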
And what stops a malicious user from using another client's ID to send bogus data?
For example, I subscribe to the same broker to all topics. I see what's happening there, then I use an existing client ID and send bogus data. Or even flood. Or I create new bogus clients so the real clients will be rejected (due to the max clients limit).
Basically, you have no protection against "bad people" on MQTT. Anyone can render that subscription unusable, if he intends so.
ok, some clarifications:
we are not here to fix MQTT. We need to work with it
we need to protect our databus at all costs (max number allowed at one time, maybe some other criteria in place as protection: like clients allowed on the databus based on certain pattern, ids, etc )
the databus must have very clear configuration(s) and a capacity set at start. that capacity should act as a upper limit allowing MQTT clients to be mapped and processed through databus
and, I hope we understood what we are planning to make:
a MQTT client which subscribes to a MQTT broker
your questions and concerns apply more if we run on a broker which does not support authentication; then yes, we might receive more traffic from more clients
if the broker supports authentication and SSL, our client must support them too
I hope this answers your questions. Let me know if anything is still unclear.
Sure. Let's proceed as you see fit for the project.
So what's the next step on this one?
Core Concepts
Kronometrix MQTT Databus (MQTT KBUS)
A1. Kronometrix SID, TID, DSID, DEVID are all MQTT KBUS concepts. They have nothing to do with MQTT and should not be part of any MQTT topics
A2. We should keep apart MQTT communication and not force to change, create or alter MQTT topics based on SID, TID, DEVICEID etc. which are Kronometrix internal concepts.
A3. MQTT KBUS must deploy its own version of the MQTT client, preferably async and non-blocking, capable of subscribing to one or many topics on an MQTT broker
A4. The MQTT KBUS must be capable of receiving, via MQTT, other clients' topics as soon as the other clients publish something on the TCP line
A5. The MQTT KBUS could theoretically publish some topics too, but at this stage it simply receives data from the other clients, when available.
MQTT to Kronometrix Data Mapping
B1. MQTT KBUS must have a way to define and configure, the following items:
the MQTT broker address, IP and port number
the MQTT authentication settings, if the MQTT broker is using authentication or SSL
the Kronometrix platform, SID, TID where the MQTT data shall be published
B2. The Kronometrix DSID, DEVICEID must be detected from the MQTT clients or overwritten by the MQTT KBUS itself. There can be the following cases
For each MQTT client we generate a new DSID to be used on the K platform. This is the default mode
For each MQTT client, we can ask the IP via MQTT and based on that compute the DSID