wymangr / blueiris_exporter

Prometheus Exporter for Blue Iris
MIT License

[BUG] blueiris_exporter crashes when BI log file contains many move/delete log entries #8

Closed · war4peace closed this issue 1 week ago

war4peace commented 4 weeks ago

Describe the bug

I have particularly large Blue Iris log files which retain all file delete/move actions. For example, my August 2024 log file had grown to 173 MB. blueiris_exporter crashes every time I start it with a "concurrent map writes" error. After removing hundreds of thousands of lines containing the words "Delete:" and "Move:" and reducing the log file to a manageable size, the application (and, subsequently, the service) started without errors. I am not sure whether this was caused by attempting to read the whole file at once at startup or only by its size; however, this month's log file is currently 23 MB and I have unchecked logging of file moves/deletes.

EDIT: I have just noticed the Grafana "Parse Errors" panel contains a bunch of "Delete" entries which I probably missed. If Delete lines are not parsed properly, this might cause the observed crash.

To Reproduce

Steps to reproduce the behavior:

  1. You need a large Blue Iris log file
  2. Start the exporter as usual
  3. See error.

Expected behavior

Resilience for large log files.

Screenshots

Not applicable.


Additional context

This is not a request for a fix, but merely informational. The user-facing fix most likely consists of not logging file moves/deletes in Blue Iris; however, it might be a good (low-priority) idea to add code to the application so it avoids crashing when such a large log file is encountered.

Thank you for creating this great application!

war4peace commented 4 weeks ago

Log excerpt with lines that result in a parse error (GitHub formats them in a weird way; the "0" character is at the beginning of each line):

0 8/16/2024 5:11:00.353 PM Alerts Delete: over quota 100.0/100.0GB, 866.1GB free
0 8/16/2024 5:11:00.555 PM Aux 2 Delete: over quota 48/48 hrs, 28.0/100.0GB, 866.1GB free
0 8/16/2024 5:11:00.673 PM Aux 3 Delete: over quota 48/48 hrs, 24.5/100.0GB, 866.1GB free
0 8/16/2024 5:11:00.701 PM Aux 4 Delete: over quota 48/48 hrs, 28.0/100.0GB, 866.2GB free
0 8/16/2024 5:11:00.898 PM Aux 5 Delete: over quota 48/48 hrs, 20.8/100.0GB, 866.3GB free
0 8/16/2024 5:11:00.900 PM Aux 6 Delete: over quota 48/48 hrs, 4.88/100.0GB, 866.3GB free
0 8/16/2024 5:11:01.057 PM Aux 7 Delete: over quota 48/48 hrs, 4.35/100.0GB, 866.3GB free
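
For illustration only (this is not the exporter's actual parsing code), lines of this shape can be matched with a pattern along the following lines; the capture groups and field names are assumptions:

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative pattern only; the exporter's real parsing (blueirisMetrics.go)
// uses its own regexes and may treat these lines differently.
var deleteLine = regexp.MustCompile(`^\d+\s+(\S+ \S+ [AP]M)\s+(.+?)\s+Delete:\s+(.*)$`)

func main() {
	line := "0 8/16/2024 5:11:00.353 PM Alerts Delete: over quota 100.0/100.0GB, 866.1GB free"
	if m := deleteLine.FindStringSubmatch(line); m != nil {
		// m[1] = timestamp, m[2] = folder, m[3] = detail
		fmt.Printf("time=%q folder=%q detail=%q\n", m[1], m[2], m[3])
	}
}
```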

wymangr commented 3 weeks ago

Hey @war4peace, thanks for reporting! If you don't mind sharing one of your (larger) log files, it would help in debugging.

war4peace commented 3 weeks ago

Sure thing! This is for June 2024: https://drive.google.com/file/d/11anNADQUbDz-utZIn6tI6J2ZdfxK5K1Y/view?usp=sharing The log for July 2024 is even larger and contains far more entries; here's a link to that one as well: https://drive.google.com/file/d/1YPNhiC1xvWhSX4-kVo-vOvHfi-NoFJdL/view?usp=sharing

In the meantime I have disabled logging for most file deletions, but some file deletion entries remain; there is no exact setting in Blue Iris to disable those.

Please let me know if you need more info, data, or anything else from me.

Thank you very much!

war4peace commented 3 weeks ago

One more thing I should add: it seems that the application starts crashing continuously once the log size exceeds 32 MB (32768 KB). I might be wrong on this, because I hadn't paid much attention until a few minutes ago, when I saw the app crashing continuously again; I removed some more "Delete" entries from the log, and now it doesn't crash anymore. But it's worth checking, because it's August 20th and my log file is already 29763 KB - just a couple more days and it will exceed 32 MB even with trimming everything.

wymangr commented 3 weeks ago

@war4peace Thanks for the data, it's really helpful. I think there are a few small bugs causing issues; I'm going to work on those and look into a way to limit how much of the log file is parsed.

war4peace commented 3 weeks ago

Thank you so much!

wymangr commented 3 weeks ago

@war4peace I'm currently testing a potential fix. If all goes well, I'll release today or tomorrow.

war4peace commented 3 weeks ago

Excellent! Please let me know and I will test it. Will there be a new Grafana dashboard as well? I am using the Prometheus one, in case it matters.

wymangr commented 3 weeks ago

Nope, these changes shouldn't impact the dashboard at all. I added a new flag, --logoffset, that offsets into the log file before reading it. I defaulted it to the last 10 MB, so it will only read the last 10 MB of the log file instead of the whole thing. You will be able to adjust the size by passing in the flag.
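
As a rough, hypothetical sketch of what reading only the last N MB of a log can look like in Go (the file path and helper names are assumptions, not the exporter's actual implementation):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// readTail returns the lines in roughly the last offsetMB megabytes of the
// file; offsetMB <= 0 means "read the whole file".
func readTail(path string, offsetMB int64) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	seeked := false
	if offsetMB > 0 {
		info, err := f.Stat()
		if err != nil {
			return nil, err
		}
		if skip := info.Size() - offsetMB*1024*1024; skip > 0 {
			if _, err := f.Seek(skip, io.SeekStart); err != nil {
				return nil, err
			}
			seeked = true
		}
	}

	r := bufio.NewReader(f)
	if seeked {
		// Discard the (likely partial) first line at the seek point.
		if _, err := r.ReadString('\n'); err != nil && err != io.EOF {
			return nil, err
		}
	}

	var lines []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

func main() {
	lines, err := readTail(`C:\BlueIris\log\2024_08.txt`, 10) // hypothetical path, last 10 MB
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("lines read:", len(lines))
}
```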

I also squashed a couple bugs :D

war4peace commented 3 weeks ago

Thank you again. Would the "Delete: " lines be properly parsed now? In 18 hours, the dashboard has amassed 1.62K "Log Parse Errors". In the end, I guess it doesn't matter much if the exporter only reads the most recent X MB from the log.

wymangr commented 3 weeks ago

Yup, the parse errors were one of the bugs I fixed.

wymangr commented 3 weeks ago

@war4peace my tests were successful. I've created a new release. Let me know if it fixes the issues you were having.

war4peace commented 3 weeks ago

I have downloaded and installed the exporter as a service again. Startup was fine, with no errors, and I see no errors in Event Viewer for the time being. The Prometheus server receives data and the Grafana dashboard is also populated. If there are still any issues, I will let you know, but for now everything looks good.

Thank you very much once again for the quick response and fix. It's a wonderful solution for Blue Iris-related statistics!

war4peace commented 3 weeks ago

I think I found a bug. In the Blue Iris Grafana dashboard, the count of alerts increases geometrically. At the very beginning of the analysis, the counts were in the thousands, which was expected due to the initial log analysis. Now they're nearing a million for some cameras, which is clearly not right. Two screenshots taken at close timestamps (basically one update):

image

At the next update, here are the new counts:

image

I think the counts increase by the total count of the whole parsed log, rather than by just the specified period.

Below, screenshot for "last 5 minutes" filter:

image

wymangr commented 3 weeks ago

Thanks, I'll take a look. Feel free to play around with the dashboard and let me know if you get it working correctly.

wymangr commented 3 weeks ago

Hmm, interesting. It looks like my dashboard is not showing the same behavior.

Last 6 hours: image

Last 1 hour: image

Can you edit the AI Count panel and confirm the Prometheus query is:

increase(blueiris_ai_count{camera=~"$camera", type=~"$type"}[$__range])

And if it's the same, maybe try taking a look at the following queries in Explore?

increase(blueiris_ai_count[$__range])
blueiris_ai_count

war4peace commented 3 weeks ago

Original query: increase(blueiris_ai_count{camera=~"$camera", type=~"$type"}[$__range]) +1
Result: image

1st recommended query: increase(blueiris_ai_count{camera=~"$camera", type=~"$type"}[$__range])
Result: image

2nd recommended query: increase(blueiris_ai_count[$__range])
Result: image

3rd recommended query: blueiris_ai_count
Result: image

It looks like the exporter is sending the whole 10 MB of log with every update. The service and the Prometheus server were both restarted, with no change in behavior.

war4peace commented 3 weeks ago

One more thing: I switched to "Last 5 minutes" in Grafana and refreshed until a new alert came in; the counts jumped by the total number of alerts for the whole log period for that camera.

wymangr commented 3 weeks ago

Thanks for the data, I'll keep investigating. Are you running the exporter as a Windows service? If so, could you try stopping the service, opening a command prompt, and running it from there, to see if any warnings/logs are making it to the console that aren't making it to Event Viewer?

wymangr commented 3 weeks ago

Also, try adding --logoffset 0 to disable that feature, and see if that's what's causing the problems.

war4peace commented 3 weeks ago

I have done as you instructed. I'll wait a bit and let Prometheus and Grafana process data until legacy information clears up.

wymangr commented 3 weeks ago

I did just release another change. It looks like BI changed the log entry for canceled alerts from "cancelled" to "canceled", which impacted the "type" label. I'm not sure if that was part of the issue or not, but you can give that a try as well. Another thing you can try is accessing the metrics endpoint manually (http://localhost:2112/metrics) and refreshing a few times to see what data you get.

With the blueiris_ai_count metric, what it should do is count all the AI alerts from the log. Then the AI Count panel in Grafana charts the increase in that metric. So if there are 100 alerts at the beginning of the range and 103 at the end, it should show the difference (3).

What might be happening is that, because your log grows so quickly, when you use --logoffset, every count starts at a new place in the log, dropping values as it adds new ones, which causes some weird behavior in the Grafana dashboard. Another thing you can try is increasing --logoffset to around 32 (just below the size at which you were seeing it crash). I'll also try to come up with some other ideas for how I can better handle large files.

war4peace commented 3 weeks ago

It looks like the --logoffset 0 option works well. Accessing the metrics directly doesn't work: from the same host I get "connection refused", and from a different host I get "ERR_CONNECTION_TIMED_OUT". Plus, the application crashed when I tried to access the metrics directly. The error output is in the next message.

But with --logoffset 0 it seems there are no issues with the data in Grafana anymore.

war4peace commented 3 weeks ago

Error after accessing the metrics directly:

```
fatal error: concurrent map writes

goroutine 2978 [running]:
runtime.throw({0x7e85e8?, 0xc00020af50?})
	/opt/hostedtoolcache/go/1.18.8/x64/src/runtime/panic.go:992 +0x76 fp=0xc0005e8c60 sp=0xc0005e8c30 pc=0x349d16
runtime.mapassign_faststr(0x766300?, 0xc00002a0c0?, {0x7e0de8, 0x6})
	/opt/hostedtoolcache/go/1.18.8/x64/src/runtime/map_faststr.go:212 +0x39c fp=0xc0005e8cc8 sp=0xc0005e8c60 pc=0x3238dc
github.com/wymangr/blueiris_exporter/blueiris.findObject({0xc00020af50, 0x4c})
	/home/runner/work/blueiris_exporter/blueiris_exporter/blueiris/blueirisMetrics.go:348 +0x42a5 fp=0xc0005e91b8 sp=0xc0005e8cc8 pc=0x70a785
github.com/wymangr/blueiris_exporter/blueiris.BlueIris(0xc000541f80?, {0xc00021ce00, 0x2, {0x7e2cc3, 0xb}, 0x1, {0xc000246000, 0x16, 0x16}, 0x81cb70, ...}, ...)
	/home/runner/work/blueiris_exporter/blueiris_exporter/blueiris/blueirisMetrics.go:111 +0x617 fp=0xc0005e9da0 sp=0xc0005e91b8 pc=0x702377
main.CollectMetrics(0x0?, 0x353405?, {0xc00021ce00, 0x2, {0x7e2cc3, 0xb}, 0x1, {0xc000246000, 0x16, 0x16}, ...}, ...)
	/home/runner/work/blueiris_exporter/blueiris_exporter/metrics.go:115 +0x34b fp=0xc0005e9f30 sp=0xc0005e9da0 pc=0x70df6b
main.(*ExporterBlueIris).Collect.func1()
	/home/runner/work/blueiris_exporter/blueiris_exporter/blueiris_exporter.go:40 +0x89 fp=0xc0005e9fe0 sp=0xc0005e9f30 pc=0x70bc29
runtime.goexit()
	/opt/hostedtoolcache/go/1.18.8/x64/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0005e9fe8 sp=0xc0005e9fe0 pc=0x377001
created by main.(*ExporterBlueIris).Collect
	/home/runner/work/blueiris_exporter/blueiris_exporter/blueiris_exporter.go:40 +0x2aa

goroutine 1 [IO wait]: internal/poll.runtime_pollWait(0x1ea5d6c4050, 0x72) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0x23?, 0xc0006078d8?, 0x0) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.execIO(0xc000228f18, 0xc0006078e0) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:175 +0xe5 internal/poll.(FD).acceptOne(0xc000228f00, 0x2dc, {0xc00026e000?, 0x3be03f?, 0x0?}, 0x0?) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:942 +0x6d internal/poll.(FD).Accept(0xc000228f00, 0xc000607ab8) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:976 +0x1d6 net.(netFD).accept(0xc000228f00) /opt/hostedtoolcache/go/1.18.8/x64/src/net/fd_windows.go:139 +0x65 net.(TCPListener).accept(0xc0002076e0) /opt/hostedtoolcache/go/1.18.8/x64/src/net/tcpsock_posix.go:139 +0x28 net.(TCPListener).Accept(0xc0002076e0) /opt/hostedtoolcache/go/1.18.8/x64/src/net/tcpsock.go:288 +0x3d net/http.(Server).Serve(0xc000268000, {0x8a61e8, 0xc0002076e0}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:3039 +0x385 net/http.(Server).ListenAndServe(0xc000268000) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:2968 +0x7d net/http.ListenAndServe(...) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:3222 main.start({0xc0000a80aa, 0xf}, {0x7e144f, 0x8}, {0x7e077f, 0x5}, {0xb5c8e0?, 0x0?}) /home/runner/work/blueiris_exporter/blueiris_exporter/blueiris_exporter.go:80 +0x51e main.main() /home/runner/work/blueiris_exporter/blueiris_exporter/blueiris_exporter.go:183 +0x1369

goroutine 117 [select]: github.com/prometheus/client_golang/prometheus.(Registry).Gather(0xc00021e280) /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:520 +0x9a5 github.com/prometheus/client_golang/prometheus.(noTransactionGatherer).Gather(0x8?) /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:1042 +0x22 github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1({0x8a6398, 0xc00031a0e0}, 0xc00022d500) /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/promhttp/http.go:135 +0xfe net/http.HandlerFunc.ServeHTTP(0x1ea5d6c3f60?, {0x8a6398?, 0xc00031a0e0?}, 0x31d925?) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:2084 +0x2f net/http.(ServeMux).ServeHTTP(0x0?, {0x8a6398, 0xc00031a0e0}, 0xc00022d500) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:2462 +0x149 net/http.serverHandler.ServeHTTP({0xc000586090?}, {0x8a6398, 0xc00031a0e0}, 0xc00022d500) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:2916 +0x43b net/http.(conn).serve(0xc0000a2000, {0x8a68b0, 0xc000219ad0}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:1966 +0x5d7 created by net/http.(*Server).Serve /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:3071 +0x4db

goroutine 2912 [semacquire]: sync.runtime_Semacquire(0x0?) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/sema.go:56 +0x25 sync.(WaitGroup).Wait(0xc0002c3f90?) /opt/hostedtoolcache/go/1.18.8/x64/src/sync/waitgroup.go:136 +0x52 github.com/prometheus/client_golang/prometheus.(Registry).Gather.func2() /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:470 +0x2f created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:469 +0x5e5

goroutine 2911 [semacquire]: sync.runtime_Semacquire(0xc00024c028?) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/sema.go:56 +0x25 sync.(WaitGroup).Wait(0xc0002cbdf0?) /opt/hostedtoolcache/go/1.18.8/x64/src/sync/waitgroup.go:136 +0x52 main.(ExporterBlueIris).Collect(0xc0002263e0, 0xc0004e3aa0) /home/runner/work/blueiris_exporter/blueiris_exporter/blueiris_exporter.go:44 +0x2b9 github.com/prometheus/client_golang/prometheus.(Registry).Gather.func1() /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:453 +0xfb created by github.com/prometheus/client_golang/prometheus.(Registry).Gather /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:464 +0x565

goroutine 2979 [IO wait]: internal/poll.runtime_pollWait(0x1ea5d6c3d80, 0x72) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0x4?, 0x1ea5d74850a?, 0x0) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.execIO(0xc00064a018, 0x81db10) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:175 +0xe5 internal/poll.(FD).Read(0xc00064a000, {0xc0004c1000, 0x1000, 0x1000}) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:441 +0x25f net.(netFD).Read(0xc00064a000, {0xc0004c1000?, 0xc00017d6a0?, 0x5886c5?}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc00019c020, {0xc0004c1000?, 0x1?, 0x89fec4?}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/net.go:183 +0x45 net/http.(connReader).Read(0xc0006681b0, {0xc0004c1000, 0x1000, 0x1000}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:780 +0x16d bufio.(Reader).fill(0xc0004e3140) /opt/hostedtoolcache/go/1.18.8/x64/src/bufio/bufio.go:106 +0x103 bufio.(Reader).ReadSlice(0xc0004e3140, 0x5?) /opt/hostedtoolcache/go/1.18.8/x64/src/bufio/bufio.go:371 +0x2f bufio.(Reader).ReadLine(0xc0004e3140) /opt/hostedtoolcache/go/1.18.8/x64/src/bufio/bufio.go:400 +0x27 net/textproto.(Reader).readLineSlice(0xc0005860c0) /opt/hostedtoolcache/go/1.18.8/x64/src/net/textproto/reader.go:57 +0x99 net/textproto.(Reader).ReadLine(...) /opt/hostedtoolcache/go/1.18.8/x64/src/net/textproto/reader.go:38 net/http.readRequest(0xc00019c020?) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/request.go:1029 +0x79 net/http.(conn).readRequest(0xc000510140, {0x8a6808, 0xc00050a0c0}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:988 +0x24a net/http.(conn).serve(0xc000510140, {0x8a68b0, 0xc000219ad0}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:1891 +0x32b created by net/http.(*Server).Serve /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:3071 +0x4db

goroutine 2974 [IO wait]: internal/poll.runtime_pollWait(0x1ea5d6c3e70, 0x72) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0xc00021cd90?, 0xc0005406c0?, 0x0) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.execIO(0xc0002dc518, 0x81db10) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:175 +0xe5 internal/poll.(FD).Read(0xc0002dc500, {0xc000586041, 0x1, 0x1}) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:441 +0x25f net.(netFD).Read(0xc0002dc500, {0xc000586041?, 0x7e2cc3?, 0xb?}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc0000ce5b8, {0xc000586041?, 0xc00020c0a8?, 0xc00021cd90?}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/net.go:183 +0x45 net/http.(connReader).backgroundRead(0xc000586030) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:672 +0x3f created by net/http.(connReader).startBackgroundRead /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:668 +0xca

goroutine 2916 [runnable]: regexp.(inputString).step(0xc0005b0080?, 0x29?) /opt/hostedtoolcache/go/1.18.8/x64/src/regexp/regexp.go:389 +0x8d regexp.(Regexp).tryBacktrack(0xc0000a3860, 0xc0005b0000, {0x8a7758?, 0xc0005b0080}, 0x217c28?, 0xc000217b90?) /opt/hostedtoolcache/go/1.18.8/x64/src/regexp/backtrack.go:209 +0x9f3 regexp.(Regexp).backtrack(0xc0000a3860, {0x0, 0x0, 0x0}, {0xc0000cc4b0, 0x2e}, 0x0, 0xc000382c40?, {0xc000382c90, 0x0, ...}) /opt/hostedtoolcache/go/1.18.8/x64/src/regexp/backtrack.go:353 +0x325 regexp.(Regexp).doExecute(0xc000439490?, {0x0?, 0x0}, {0x0, 0x0, 0x0}, {0xc0000cc4b0, 0x2e}, 0x749b20?, 0x18, ...) /opt/hostedtoolcache/go/1.18.8/x64/src/regexp/exec.go:535 +0x272 regexp.(Regexp).FindStringSubmatch(0xc0000a3860, {0xc0000cc4b0, 0x2e}) /opt/hostedtoolcache/go/1.18.8/x64/src/regexp/regexp.go:1043 +0x8f github.com/wymangr/blueiris_exporter/blueiris.findObject({0xc000243880, 0x3b}) /home/runner/work/blueiris_exporter/blueiris_exporter/blueiris/blueirisMetrics.go:372 +0x76c github.com/wymangr/blueiris_exporter/blueiris.BlueIris(0xc0004e3aa0?, {0xc00021ce00, 0x2, {0x7e2cc3, 0xb}, 0x1, {0xc000246000, 0x16, 0x16}, 0x81cb70, ...}, ...) /home/runner/work/blueiris_exporter/blueiris_exporter/blueiris/blueirisMetrics.go:111 +0x617 main.CollectMetrics(0x0?, 0x353405?, {0xc00021ce00, 0x2, {0x7e2cc3, 0xb}, 0x1, {0xc000246000, 0x16, 0x16}, ...}, ...) /home/runner/work/blueiris_exporter/blueiris_exporter/metrics.go:115 +0x34b created by main.(ExporterBlueIris).Collect /home/runner/work/blueiris_exporter/blueiris_exporter/blueiris_exporter.go:40 +0x2aa

goroutine 2910 [IO wait]: internal/poll.runtime_pollWait(0x1ea5d6c3f60, 0x72) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/netpoll.go:302 +0x89 internal/poll.(pollDesc).wait(0x3159dd?, 0xc0004c9f64?, 0x0) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_poll_runtime.go:83 +0x32 internal/poll.execIO(0xc0000d0c98, 0x81db10) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:175 +0xe5 internal/poll.(FD).Read(0xc0000d0c80, {0xc0005860a1, 0x1, 0x1}) /opt/hostedtoolcache/go/1.18.8/x64/src/internal/poll/fd_windows.go:441 +0x25f net.(netFD).Read(0xc0000d0c80, {0xc0005860a1?, 0xc00022e498?, 0x0?}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/fd_posix.go:55 +0x29 net.(conn).Read(0xc00020c010, {0xc0005860a1?, 0xc0000aa110?, 0x0?}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/net.go:183 +0x45 net/http.(connReader).backgroundRead(0xc000586090) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:672 +0x3f created by net/http.(connReader).startBackgroundRead /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:668 +0xca

goroutine 2909 [select]: github.com/prometheus/client_golang/prometheus.(Registry).Gather(0xc00021e280) /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:520 +0x9a5 github.com/prometheus/client_golang/prometheus.(noTransactionGatherer).Gather(0x8?) /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:1042 +0x22 github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1({0x8a6398, 0xc00031a000}, 0xc0005f4300) /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/promhttp/http.go:135 +0xfe net/http.HandlerFunc.ServeHTTP(0x1ea5d6c3e70?, {0x8a6398?, 0xc00031a000?}, 0x31d925?) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:2084 +0x2f net/http.(ServeMux).ServeHTTP(0x0?, {0x8a6398, 0xc00031a000}, 0xc0005f4300) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:2462 +0x149 net/http.serverHandler.ServeHTTP({0xc000586030?}, {0x8a6398, 0xc00031a000}, 0xc0005f4300) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:2916 +0x43b net/http.(conn).serve(0xc000656be0, {0x8a68b0, 0xc000219ad0}) /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:1966 +0x5d7 created by net/http.(*Server).Serve /opt/hostedtoolcache/go/1.18.8/x64/src/net/http/server.go:3071 +0x4db

goroutine 2976 [semacquire]: sync.runtime_Semacquire(0x0?) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/sema.go:56 +0x25 sync.(WaitGroup).Wait(0xc000109b00?) /opt/hostedtoolcache/go/1.18.8/x64/src/sync/waitgroup.go:136 +0x52 github.com/prometheus/client_golang/prometheus.(Registry).Gather.func2() /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:470 +0x2f created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:469 +0x5e5

goroutine 2977 [semacquire]: sync.runtime_Semacquire(0xc00024c3c0?) /opt/hostedtoolcache/go/1.18.8/x64/src/runtime/sema.go:56 +0x25 sync.(WaitGroup).Wait(0xc0004e9df0?) /opt/hostedtoolcache/go/1.18.8/x64/src/sync/waitgroup.go:136 +0x52 main.(ExporterBlueIris).Collect(0xc0002263e0, 0xc000541f80) /home/runner/work/blueiris_exporter/blueiris_exporter/blueiris_exporter.go:44 +0x2b9 github.com/prometheus/client_golang/prometheus.(Registry).Gather.func1() /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:453 +0xfb created by github.com/prometheus/client_golang/prometheus.(Registry).Gather /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.13.1/prometheus/registry.go:545 +0xbab
```

wymangr commented 3 weeks ago

I guess that makes some sense: --logoffset 0 basically disables the offset, so it's back to the way it was before the update. I'll keep playing around and hopefully I'll be able to duplicate it and/or come up with a fix.

war4peace commented 3 weeks ago

Thank you so much! So far the results are promising with no offset. I will later run the application as a service and report back if I find any issues or experience crashes.

wymangr commented 3 weeks ago

I was able to duplicate it by setting my offset lower than my log file size. What I think is going on is that the blueiris_ai_count metric was counting the total number in the whole file (in my case, 11469 for my front door camera). Then, with the update, it only counts the number in the bottom x MB of the file (in my case, I set it to 1 MB and it only had 499 for my front door camera). For some reason the increase range function goes up whenever the count goes down, as seen here:

image

I'll keep playing around with it and let you know when I figure it out.

war4peace commented 3 weeks ago

Thank you, once again!

wymangr commented 3 weeks ago

So, I found out what causes your original crash issue. It happens when the exporter gets called a second time before it has finished the first call. With the large log file it takes longer to execute, so if Prometheus is configured to call it every 5 seconds but it takes 10 seconds to execute, the second scrape of the endpoint crashes it. That's also why it crashed when you called the endpoint manually: it was in the middle of executing for Prometheus at the same time.

That all being said, I'm working on some more fixes for it and will update when I have them.
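
For context, here is a minimal, self-contained illustration of that failure mode and one common way to guard against it with a sync.Mutex; this is a sketch of the general pattern, not the exporter's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// counts stands in for the shared map updated while parsing the log; the
// real structures in blueirisMetrics.go differ.
var (
	counts = make(map[string]int)
	mu     sync.Mutex // serializes scrapes that would otherwise overlap
)

// collect simulates one scrape: it re-reads the log and updates the map.
// Without the mutex, two overlapping scrapes write the map concurrently,
// which the Go runtime aborts with "fatal error: concurrent map writes".
func collect(lines []string) {
	mu.Lock()
	defer mu.Unlock()
	for _, l := range lines {
		counts[l]++
	}
}

func main() {
	lines := []string{"alert", "canceled", "alert"}

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two "scrapes" arriving at the same time
		wg.Add(1)
		go func() {
			defer wg.Done()
			collect(lines)
		}()
	}
	wg.Wait()
	fmt.Println(counts)
}
```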

war4peace commented 3 weeks ago

I might have spoken too soon about stability... Some more testing and troubleshooting from my side: the log file size is 29430 KB right now. Last night I saw that the application had crashed again. I turned Prometheus off and tried starting the application with --logoffset 0. The application started and did not crash. Then I changed the Prometheus scrape interval to 15 seconds and started Prometheus. As soon as Prometheus starts, the application crashes immediately with the "concurrent map writes" error. I can provide the entire error stack if you need it. The crash can be replicated (it happens every time I start the application and then Prometheus). I then stopped Prometheus again and started the application with --logoffset 1. After starting Prometheus, the application no longer crashes; it remains up and stable, but the values in the Grafana dashboard for AI counts pile up again, as reported earlier. Scraping the endpoint takes around 100 milliseconds, as reported by Prometheus.

I can now access the exporter endpoint; I will add data from it below. Another test I performed was refreshing the exporter URL faster and faster, and I was able to crash the application if I refreshed quickly enough (around twice per second, with Prometheus running) - but that's not likely to become a systemic issue, since Prometheus now scrapes every 15 seconds.

My guess is that the multiplication in AI alert counts is caused by the blueiris_ai_count section, which contains values from the last 1 MB of log, and these values are added in the dashboard every time the count is increased by one. For example, the "Gate - Alert" value was at one point 234, and after one more alert it increased to 465, and so on. I hope this helps in troubleshooting the root cause.

# HELP blueiris_ai_count Count of Blue Iris AI analysis
# TYPE blueiris_ai_count gauge
blueiris_ai_count{camera="Court180",type="alert"} 57
blueiris_ai_count{camera="Court180",type="canceled"} 1
blueiris_ai_count{camera="Garaj",type="alert"} 60
blueiris_ai_count{camera="Garaj",type="canceled"} 22
blueiris_ai_count{camera="Gate",type="alert"} 221
blueiris_ai_count{camera="Gate",type="canceled"} 727
blueiris_ai_count{camera="Roof",type="canceled"} 465
blueiris_ai_count{camera="Street4K",type="alert"} 458
blueiris_ai_count{camera="Street4K",type="canceled"} 321
# HELP blueiris_ai_duration Duration of Blue Iris AI analysis
# TYPE blueiris_ai_duration gauge
blueiris_ai_duration{camera="Court180",detail="87",object="person",type="alert"} 377
blueiris_ai_duration{camera="Court180",detail="nothing found",object="canceled",type="canceled"} 288
blueiris_ai_duration{camera="Garaj",detail="93",object="person",type="alert"} 164
blueiris_ai_duration{camera="Garaj",detail="occupied",object="canceled",type="canceled"} 484
blueiris_ai_duration{camera="Gate",detail="83",object="car",type="alert"} 377
blueiris_ai_duration{camera="Gate",detail="nothing found",object="canceled",type="canceled"} 763
blueiris_ai_duration{camera="Roof",detail="nothing found",object="canceled",type="canceled"} 287
blueiris_ai_duration{camera="Street4K",detail="90",object="car",type="alert"} 588
blueiris_ai_duration{camera="Street4K",detail="nothing found",object="canceled",type="canceled"} 656
# HELP blueiris_ai_error Count of AI error log lines
# TYPE blueiris_ai_error gauge
blueiris_ai_error 0
# HELP blueiris_ai_notresponding Count of AI not responding errors in current logfile
# TYPE blueiris_ai_notresponding gauge
blueiris_ai_notresponding 6
# HELP blueiris_ai_restarted Times BlueIris restarted Deepstack
# TYPE blueiris_ai_restarted gauge
blueiris_ai_restarted 0
# HELP blueiris_ai_servererror Count of AI server not responding errors in current logfile
# TYPE blueiris_ai_servererror gauge
blueiris_ai_servererror 0
# HELP blueiris_ai_started Count of AI has been started log lines
# TYPE blueiris_ai_started gauge
blueiris_ai_started 0
# HELP blueiris_ai_starting Count of AI is being started log lines
# TYPE blueiris_ai_starting gauge
blueiris_ai_starting 0
# HELP blueiris_ai_timeout Count of AI timeouts in current logfile
# TYPE blueiris_ai_timeout gauge
blueiris_ai_timeout 0
# HELP blueiris_camera_status Status of each camera. 0=up, 1=down
# TYPE blueiris_camera_status gauge
blueiris_camera_status{camera="Court180",detail="object"} 0
blueiris_camera_status{camera="Garaj",detail="object"} 0
blueiris_camera_status{camera="Gate",detail="object"} 0
blueiris_camera_status{camera="Roof",detail="trigger"} 0
blueiris_camera_status{camera="Street4K",detail="object"} 0
# HELP blueiris_collector_duration_seconds Collector time duration.
# TYPE blueiris_collector_duration_seconds gauge
blueiris_collector_duration_seconds{collector="BlueIris"} 0.1008115
# HELP blueiris_exporter_errors blueiris_exporter errors
# TYPE blueiris_exporter_errors counter
blueiris_exporter_errors{function="BlueIris"} 0
# HELP blueiris_folder_disk_free Free space of the disk the folder is using in bytes
# TYPE blueiris_folder_disk_free gauge
blueiris_folder_disk_free{folder="Alerts"} 8.704e+11
blueiris_folder_disk_free{folder="Aux 2"} 8.704e+11
blueiris_folder_disk_free{folder="Aux 3"} 8.705e+11
blueiris_folder_disk_free{folder="Aux 4"} 8.705e+11
blueiris_folder_disk_free{folder="Aux 5"} 8.706e+11
blueiris_folder_disk_free{folder="Aux 6"} 8.706e+11
blueiris_folder_disk_free{folder="Aux 7"} 8.706e+11
blueiris_folder_disk_free{folder="Stored"} 8.858e+11
# HELP blueiris_folder_used Percentage of folder bytes used based on limit
# TYPE blueiris_folder_used gauge
blueiris_folder_used{folder="Alerts"} 100
blueiris_folder_used{folder="Aux 2"} 27.3
blueiris_folder_used{folder="Aux 3"} 23.5
blueiris_folder_used{folder="Aux 4"} 29.099999999999998
blueiris_folder_used{folder="Aux 5"} 20
blueiris_folder_used{folder="Aux 6"} 4.84
blueiris_folder_used{folder="Aux 7"} 4.2299999999999995
blueiris_folder_used{folder="Stored"} 100
# HELP blueiris_hours_used Percentage of folder hours used based on limit
# TYPE blueiris_hours_used gauge
blueiris_hours_used{folder="Alerts"} 0
blueiris_hours_used{folder="Aux 2"} 100
blueiris_hours_used{folder="Aux 3"} 100
blueiris_hours_used{folder="Aux 4"} 100
blueiris_hours_used{folder="Aux 5"} 100
blueiris_hours_used{folder="Aux 6"} 100
blueiris_hours_used{folder="Aux 7"} 100
blueiris_hours_used{folder="Stored"} 0
# HELP blueiris_logerror Count of unique errors in the logs
# TYPE blueiris_logerror gauge
blueiris_logerror{error=""} 0
# HELP blueiris_logerror_total Count all errors in the logs
# TYPE blueiris_logerror_total gauge
blueiris_logerror_total 0
# HELP blueiris_logwarning Count of unique warnings in the logs
# TYPE blueiris_logwarning gauge
blueiris_logwarning{warning=""} 0
# HELP blueiris_logwarning_total Count all warnings in the logs
# TYPE blueiris_logwarning_total gauge
blueiris_logwarning_total 0
# HELP blueiris_parse_errors Count of unique errors parsing log lines
# TYPE blueiris_parse_errors gauge
blueiris_parse_errors{line=""} 0
# HELP blueiris_parse_errors_total Count of all the errors parsing log lines
# TYPE blueiris_parse_errors_total gauge
blueiris_parse_errors_total 0
# HELP blueiris_profile Count of activation of profiles
# TYPE blueiris_profile gauge
blueiris_profile{profile="Night"} 1
# HELP blueiris_triggers Count of triggers
# TYPE blueiris_triggers gauge
blueiris_triggers{camera="Court180"} 30
blueiris_triggers{camera="Garaj"} 44
blueiris_triggers{camera="Gate"} 947
blueiris_triggers{camera="Roof"} 787
blueiris_triggers{camera="Street4K"} 613
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0.001543
go_gc_duration_seconds_sum 0.0557541
go_gc_duration_seconds_count 1628
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.18.8"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.376136e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.445008272e+09
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 6708
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 3.7370541e+07
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 3.180512e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.376136e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 9.601024e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.392064e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 15145
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.22592e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.1993088e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.7244052310918489e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 3.7385686e+07
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 9344
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16352
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 137360
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 212160
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.856188e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 589824
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 589824
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.7854832e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 12

wymangr commented 3 weeks ago

Thanks, that's really helpful information! I've fixed the root cause of the problem (concurrent map writes), which was crashing the application if it was called while it was already executing. I've also done a couple more things that should help speed things up. First, I set it to do an initial log file read on startup. Second, I set it up to keep track of the last log line it read and start calculating metrics after that last read line.

What this should do is perform the one long read at startup, store the counters in memory, and remember the last line. Then when Prometheus scrapes it, it will be much quicker: it doesn't need to read the whole file, just what was appended since the last read, and because the counters are now stored in memory, it just increments them rather than recalculating from the whole file.
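
A minimal sketch of that incremental approach, under the assumption that it boils down to remembering a byte offset and only parsing newly appended lines on each scrape (the names and log path below are hypothetical, not the exporter's real implementation):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// tailState remembers how far into the log file we have already read.
type tailState struct {
	offset int64
	counts map[string]int
}

func newTailState() *tailState {
	return &tailState{counts: make(map[string]int)}
}

// scrape parses only the lines appended since the previous call, updates the
// in-memory counters, and records the new offset for the next call.
func (s *tailState) scrape(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if _, err := f.Seek(s.offset, io.SeekStart); err != nil {
		return err
	}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		s.counts[classify(sc.Text())]++ // classify stands in for the real per-metric parsing
	}
	if err := sc.Err(); err != nil {
		return err
	}
	pos, err := f.Seek(0, io.SeekCurrent) // the scanner has read to EOF, so this is the file size
	if err != nil {
		return err
	}
	s.offset = pos
	return nil
}

// classify is a placeholder for the exporter's real regex-based parsing.
func classify(line string) string {
	return "line"
}

func main() {
	st := newTailState()
	logPath := `C:\BlueIris\log\2024_08.txt` // hypothetical path
	for i := 0; i < 2; i++ {                 // simulate two consecutive scrapes
		if err := st.scrape(logPath); err != nil {
			fmt.Println("scrape error:", err)
			return
		}
		fmt.Println("lines seen so far:", st.counts["line"], "offset:", st.offset)
	}
}
```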

Doing this makes --logoffset obsolete, so I'll be removing that as well. I'm working through it and testing now. I'll let you know when it's ready for you to test. Thanks for being patient with me on this :)

war4peace commented 3 weeks ago

No worries, I am glad we are collaborating successfully. It's an awesome solution for my needs; take your time.

wymangr commented 2 weeks ago

blueiris_exporter-amd64.zip

Here is the version I'm currently testing, if you want to give it a try. If all goes well over the weekend, I'll release it.

I've also made a few minor tweaks to the dashboard here: grafana_dashboard.json

war4peace commented 2 weeks ago

I didn't have time yesterday, but this morning I downloaded and started the new release, along with the new Grafana dashboard.

I will continue to monitor and provide feedback as the dashboard gets populated with data.

Thank you once more for your work!

war4peace commented 2 weeks ago

I found one inconsistency in the Grafana dashboard, unfortunately. The counts for AI, for any given period, are multiplied for some reason.

Here's a screenshot showing the behavior. As you can notice, there is one canceled alert, yet the counts differ (19 entries versus one).

image

In the next screenshot, there are three entries in the table on the right, but the counts and the entries below are, again, different. The AI Analysis graph and table are consistent with each other, and the AI count and duration average graphs are consistent with each other, but inconsistent with the other two sections mentioned above.

image

I think this is caused by the same log line being interpreted repeatedly, and in some cases a sum is being used instead of a count?

war4peace commented 2 weeks ago

A third screenshot, for the last 15 minutes. 14 table entries, 14 dots in the AI Analysis graph. Around 100 entries in the other two sections.

image

wymangr commented 2 weeks ago

Could you try this grafana dashboard?

grafana_dashboard.json

war4peace commented 2 weeks ago

Ah, this looks much better; I think we're spot on now. As usual, I will continue to monitor and report back as various combinations of alerts and occurrences happen. Thank you!

wymangr commented 2 weeks ago

Ahh, I see why the old query (with blueiris_ai_count) was not what we expected. If you are using multiple AI models to scan the alerts:

image

and both of those models detect an object, then Blue Iris logs both:

0   8/26/2024 10:39:22.547 AM   DW                      AI: [ipcam-general] person:94% [497,17 565,235] 364ms
0   8/26/2024 10:39:22.549 AM   DW                      AI: [ipcam-combined] person:91% [496,19 565,236] 364ms

So, the exporter is doing what it was designed to do for that metric: counting the number of those log lines. I think I'm going to leave it like that (for now), as I'm now using the blueiris_ai_duration_distinct metric for the "AI Count" panel and that metric isn't being used anywhere else. It might be useful to know how many times the AI succeeded or failed to detect an object, per model.

war4peace commented 2 weeks ago

I think that's automatic, since I am not selecting any specific model.

image

In CodeProject.AI, there are indeed two models appearing, but I haven't chosen them specifically. Furthermore, the License Plate model should never yield any result (license plates are just not big enough in the images for it to be successful).

image

wymangr commented 2 weeks ago

I just took another look at your log file. It looks like Blue Iris also writes a separate log line for each object found, so if there is motion and the AI finds multiple people/cars, you will get multiple results.

0   7/3/2024 7:13:49.374 PM Gate                    AI: [Objects] car:95% [1055,5 2076,472] 207ms
0   7/3/2024 7:13:49.374 PM Gate                    AI: [Objects] person:80% [1519,138 1641,248] 207ms
0   7/3/2024 7:13:49.374 PM Gate                    AI: [Objects] car:67% [1258,0 1979,112] 207ms

0   7/3/2024 7:15:49.368 PM Street4K                AI: [Objects] person:88% [559,497 765,1103] 1773ms
0   7/3/2024 7:15:49.368 PM Street4K                AI: [ipcam-general] person:78% [568,494 771,1108] 1773ms

I wonder if that metric would be better as an object-count metric where ipcam-general and Objects are values of a model label.
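
If that direction is ever pursued, a per-model object count could look roughly like the following with prometheus/client_golang; the metric and label names here are hypothetical and not something the exporter currently exposes (a GaugeVec is used to match the gauge style of the existing metrics):

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// aiObjectCount is a hypothetical metric: one time series per
// camera / model / object combination, so duplicate "AI:" lines from
// different models ([Objects], [ipcam-general], ...) stay distinguishable.
var aiObjectCount = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "blueiris_ai_object_count",
		Help: "Count of objects detected by AI, per camera, model and object",
	},
	[]string{"camera", "model", "object"},
)

func init() {
	prometheus.MustRegister(aiObjectCount)
}

// recordDetection would be called once per parsed "AI:" log line.
func recordDetection(camera, model, object string) {
	aiObjectCount.WithLabelValues(camera, model, object).Inc()
}

func main() {
	recordDetection("Gate", "Objects", "car")
	recordDetection("Gate", "ipcam-general", "person")
}
```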

war4peace commented 2 weeks ago

Ah, that opens Pandora's box... or a can of worms, in a more modern setting. Looking at the dashboard (so... working backwards), what I would like to see is a count of cars and people for a given period of time. Let's take the table below as a simplified example.

image

Three cars were detected during that period (it's 30 minutes, if that matters). I would like to see a dashboard graph where the total count of cars/people is displayed, something like "during the last $PERIOD, there were 3 cars and 0 people detected". Detection time (xxx ms) and confidence levels are nice to have for debugging, but they're not the primary goal as far as monitoring goes.

For example, if I wake up one morning and my neighbour says "there was a shady person last night, around 2 AM, walking up and down on the street, but I'm not sure for how long he was around", I would like to be able to pick a period, say, between midnight and 3 AM and see how many people were detected during that period of time, then filter by "people only" and get all the timestamps where a person was detected, so that I could go into Blue Iris and filter properly by camera.

I can check manually in Blue Iris and look at footage, but if I need statistics (how many times a person was detected during those three hours), it would be awesome if the dashboard could provide that information.

wymangr commented 2 weeks ago

I think we can get close to that with the metrics we have by adding an object variable. See this updated dashboard and let me know what you think.

grafana_dashboard.json

war4peace commented 2 weeks ago

I am a newbie at Grafana panels... but I am learning, slowly. I would like to have something similar to this, where "AI duration" is replaced with a count for the given period; for example, "Gate - alert - vehicle" should show "15" instead of "286.000 ms". Does that make sense?

image

wymangr commented 2 weeks ago

I think what you are looking for is the same as the "AI Count" panel, except the AI Count panel is a Stat visualization and you are using a Gauge. All you should need to do is change the sum to a count. Also, if you want it to filter on object as well, you can add object to the count by clause:

count by (camera, type, object) (blueiris_ai_duration_distinct{camera=~"$camera", type=~"$type", object=~"$object"})

wymangr commented 2 weeks ago

You can also adjust the "Object" dropdown to select "person", then drag a block of time across the "AI Analysis" panel that you want to look at; this will adjust the "AI Count" panel, showing you the count of people found in that time range.

There is also an option you can enable on the "AI Analysis" panel to show the count in the legend.

image

image

wymangr commented 2 weeks ago

I haven't seen any issues with the new version of the exporter. I've released the changes:

https://github.com/wymangr/blueiris_exporter/releases/tag/v1.3.2

Let me know if all is good on your end and I'll close out this bug.

war4peace commented 1 week ago

Apologies, I was awfully busy. Yes, everything is working smoothly now, no crashes or any issues whatsoever.

Thank you very much for your hard work! Appreciate it!

wymangr commented 1 week ago

No problem. If you have any other Prometheus or Grafana questions, feel free to reach out on Discord (wymangr) and I'll try to help.