grafana / mqtt-datasource

MQTT Datasource for Grafana allows streaming data from any MQTT broker running either locally or remotely.
Apache License 2.0

Closing Web Browser Causes DatasourceNoData Alert And Graph Data To Be Cleared (MQTT Datasource) #41

Open Drakynn opened 2 years ago

Drakynn commented 2 years ago

What happened:

During the process of setting up my new Grafana dashboard, any edit involving the mqtt-datasource topic reset all of the collected data. When I was first just editing the dashboard panels, I assumed this was working "as intended" during the edit/save cycle. When I started work on the alarms, I thought it was odd that this also reset the data.

I took my system live yesterday and noticed last night that if I close the browser window that I was simply watching the dashboard from, my data would reset and a DatasourceNoData alarm condition was raised. There was no editing or saving of panels. I repeated this several times from both of my computers with different browsers. In each case, I had the same result: data wipe and alarm generated.

After about a minute, the DatasourceNoData condition is resolved and normal data collection resumes.

What you expected to happen:

I expect to be able to close the browser without resetting all of my collected data and without receiving a no-data alarm via email. I should be able to observe the visualization without destroying the data or generating false alarms.

How to reproduce it (as minimally and precisely as possible):

I initially configured this in a VM, then moved it to a Raspberry Pi4. I followed the same steps both times with identical results.

  1. Install latest Grafana via apt
  2. Install Mosquitto MQTT broker via apt
  3. Build grafana/mqtt-datasource plugin from source
  4. Start live data collection from IoT device
  5. Create simple time series panel to start graphing data
  6. Set up an alarm threshold with No Data alarm
  7. Allow some data to collect
  8. Close web browser
  9. Wait a minute for no data alarm
  10. Re-open browser and see the previously collected data is gone

Anything else we need to know?:

Other than spurious data loss and associated alarms, all systems are working as expected.

grafana.log shows the following relevant entries, most of which are just the alarm cycle. The key entry seems to be the "stop streaming" at the moment the browser closes.

logger=context t=2022-03-17T08:38:08-0400 lvl=info msg="Request Completed" method=GET path=/ status=302 remote_addr=192.168.2.10 time_ms=0 size=29 referer=
logger=http.server t=2022-03-17T08:38:24.64-0400 lvl=info msg="Successful Login" User=admin@localhost
logger=context t=2022-03-17T08:38:25.22-0400 lvl=info msg="Request Completed" method=GET path=/api/live/ws status=0 remote_addr=192.168.2.10 time_ms=2 size=0 referer=
logger=plugin.grafana-mqtt-datasource t=2022-03-17T08:39:35.39-0400 lvl=info msg="stop streaming (context canceled)"
logger=alertmanager org=1 level=debug component=dispatcher msg="Received alert" alert=DatasourceNoData[24f10d8][active]
logger=alertmanager org=1 level=debug component=dispatcher aggrGroup="{}/{scope=\"house\"}:{}" msg=flushing alerts=[DatasourceNoData[24f10d8][active]]
logger=alertmanager org=1 level=debug component=dispatcher receiver="House Alert" integration=email[0] msg="Notify success" attempts=1
logger=alertmanager org=1 level=debug component=dispatcher aggrGroup="{}/{scope=\"house\"}:{}" msg=flushing alerts=[DatasourceNoData[24f10d8][resolved]]
logger=alertmanager org=1 level=debug component=dispatcher receiver="House Alert" integration=email[0] msg="Notify success" attempts=1

Environment: As noted, this was originally set up clean a few days ago using amd64 binaries in a VM rather than the arm64 packages for the Raspberry Pi. I'm confident the hardware platform is not the issue.

ivanahuckova commented 2 years ago

I have transferred this issue from the grafana repo as it seems related to this data source.

Drakynn commented 2 years ago

In pkg/plugin/datasource.go, RunStream() defers a call to Client.Unsubscribe, so the unsubscribe runs as soon as the function returns on ctx.Done(), which is the state when a browser disconnects from the Grafana console.

pkg/plugin/datasource.go

func (ds *MQTTDatasource) RunStream(ctx context.Context, req *backend.RunStreamRequest, sender *backend.StreamSender) error {
    ds.Client.Subscribe(req.Path)
    defer ds.Client.Unsubscribe(req.Path)    // <--  

    for {
        select {
        case <-ctx.Done():
            backend.Logger.Info("stop streaming (context canceled)")
            return nil
        case message := <-ds.Client.Stream():
            if message.Topic != req.Path {
                continue
            }
            err := ds.SendMessage(message, req, sender)
            if err != nil {
                log.DefaultLogger.Error(fmt.Sprintf("unable to send message: %s", err.Error()))
            }
        }
    }
}

pkg/mqtt/client.go

func (c *Client) Unsubscribe(t string) {
    log.DefaultLogger.Debug(fmt.Sprintf("Unsubscribing from MQTT topic: %s", t))
    c.client.Unsubscribe(t)
    c.topics.Delete(t)
}

The topic is being explicitly deleted on an unsub, and unsub is being explicitly called on a client disconnect.

I'm not sure what other effects it has, but I found that with the deferred call to Client.Unsubscribe commented out, I'm getting the behaviour I expect from Grafana.

I am able to stop viewing a panel without the data being wiped. This means it's OK to go from the dashboard to editing alarms or even close the browser window entirely without data loss.
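The effect of that deferred cleanup can be illustrated with a small self-contained sketch. This is not the plugin's code: topicStore and viewSession are hypothetical names, and the sketch assumes that a topic entry also holds the messages buffered for it, which the snippets above do not show.

```go
package main

import "sync"

// topicStore is a hypothetical stand-in for the plugin's topic cache.
// Assumption: each topic entry also holds the messages buffered for it.
type topicStore struct {
	mu     sync.Mutex
	topics map[string][]string
}

func newTopicStore() *topicStore {
	return &topicStore{topics: make(map[string][]string)}
}

// Subscribe registers a topic, keeping any messages already buffered.
func (s *topicStore) Subscribe(topic string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.topics[topic]; !ok {
		s.topics[topic] = nil
	}
}

// Append buffers a payload for a registered topic and ignores unknown topics.
func (s *topicStore) Append(topic, payload string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.topics[topic]; ok {
		s.topics[topic] = append(s.topics[topic], payload)
	}
}

// Delete drops the topic and its buffer, mirroring c.topics.Delete(t).
func (s *topicStore) Delete(topic string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.topics, topic)
}

// Len reports how many messages are buffered for a topic.
func (s *topicStore) Len(topic string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.topics[topic])
}

// viewSession simulates one browser session: subscribe, receive, disconnect.
// unsubscribeOnExit toggles the deferred cleanup discussed above.
func viewSession(s *topicStore, topic string, payloads []string, unsubscribeOnExit bool) {
	s.Subscribe(topic)
	if unsubscribeOnExit {
		defer s.Delete(topic)
	}
	for _, p := range payloads {
		s.Append(topic, p)
	}
}
```

Calling viewSession with unsubscribeOnExit set to true leaves nothing buffered for the topic once the session ends, while calling it with false keeps every payload available for the next session.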

Drakynn commented 2 years ago

I believe this also resolves issues #36 and #37, which seem to be variants of "data is lost when I close the browser".

Of course this comes with the caveat that there may be other error conditions which need to be caught separately which legitimately should unsub from the topic and delete it.

Drakynn commented 2 years ago

I have used mqtt-datasource for a couple of weeks now without the Client.Unsubscribe() call on client disconnect.

I'm enjoying being able to revisit dashboards without the data wipes. I've noticed only one minor artifact.

When returning to a dashboard that has not been displayed recently, the graph builds somewhat strangely. The MQTT data seems to be gobbled up from left to right with a series of rapid screen updates, leaving only the most recent value displayed on the far right. Refreshing the browser shows a current view of the most recent data which in my case is the last 6 hours. If the browser is left open the page updates normally.

Drakynn commented 2 years ago

Here's a link to a 15-second video showing how the panels quickly seem to "eat" the old values. The video is recorded at 1X normal speed.

First, the gauge rapidly bounces through all the values, then the graph seems to eat the old values, left to right, leaving only the most recent data point on the graph.

The browser is manually refreshed at the 11-second mark, which brings up the last 6 hours of data as it should have displayed on the original load.

https://www.dropbox.com/s/x9r1f656jjwj19u/mqtt-display-bug.mp4?dl=0

sanitariu commented 1 year ago

Is this fixed in the latest releases? I still have the same problem. Close the browser, open it again, and all graphs are zero.