opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

[Bug] 22.1 Insight menu graphs are not updating / re-rendering on some filtering changes #5579

Closed. jaxjexjox closed this issue 2 years ago.

jaxjexjox commented 2 years ago


Describe the bug

As discussed here: https://forum.opnsense.org/index.php?topic=26887.0;topicseen

Changing the date range or the interface, or ticking reverse lookup, no longer updates the graphs. They updated fine in 21.x but do not in 22.1. I tried 3 different browsers with the same results.

To Reproduce

Steps to reproduce the behavior:

  1. Go to Reporting: Insight.
  2. Change the range to 30 / 60 days etc., change the interface to LAN, and tick reverse lookup.
  3. Note that the pie graph isn't updating.

Expected behavior

Working graphs.

Happy to provide more information and/or video footage if necessary. Screenshots are included here: https://forum.opnsense.org/index.php?topic=26887.0;topicseen

AdSchellevis commented 2 years ago

Best check on the machine itself whether reverse lookups work from the console (try `host <ipaddress>`); the chart returns addresses when the local lookup wasn't possible. The most common problem is related to local resolve issues (on my end, by the way, resolving seems to work without issues in the pie chart).
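The fallback described above (show the raw address when no name can be resolved) can be illustrated with a small Python sketch. This is not the code the Insight page uses; the function name `hostname_or_address` and the injectable `resolver` parameter are hypothetical, added so the fallback can be exercised without a live DNS server.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr: ip -> (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def hostname_or_address(ip: str, resolver: Resolver = socket.gethostbyaddr) -> str:
    """Return the reverse-DNS name for ip, or ip itself if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses,
        # so any failed lookup falls back to showing the raw address.
        return ip
```

With the default resolver, `hostname_or_address("192.168.0.6")` goes through the system resolver (which, unlike `host`, also consults the hosts file), so if it keeps returning the bare address, local name resolution is the first thing to look at.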

jaxjexjox commented 2 years ago

Note, to be clear: this issue occurs even without ticking reverse lookup. The data (not packets) graph is particularly bad; it regularly fails to update after only a few clicks adjusting the time range or interface.

jaxjexjox commented 2 years ago

OK, I just wanted to be certain I wasn't leading you astray: it definitely occurs even without touching reverse lookup.

AdSchellevis commented 2 years ago

I don't think anything changed between 21.7 and 22.1 in this area. You can always inspect the browser console to see whether any issues are reported; they might provide some insight into what's going on.

kulikov-a commented 2 years ago

Hi! Once I caught an error in d3.js that stopped the JS on the page from working, possibly due to data corruption in one of the date periods. Unfortunately, I immediately reset the data (Reporting: Settings) and could no longer reproduce the error.

jaxjexjox commented 2 years ago

> I don't think anything changed between 21.7 and 22.1 in this area, you can always inspect the browser console to see if there are issues reported, they might provide some insights on what's going on.

I have updated to 22.1.1 and the problem persists. I am unsure how to inspect the browser console data (OK, I kinda have a rough idea but wouldn't know where to look).

> Hi once I caught an error in d3js that stopped the js work on the page due to possible data corruption in one of the date periods. unfortunately I immediately reset the data (Reporting: Settings) and could no longer repeat the error

Do you have any suggestions on how I could fix this? Is it possible to extract ALL the data in its entirety and pop it into Excel? This may be a very minor but genuine bug that should be fixed.

UPDATE as I type this: I firmly believe @kulikov-a is onto something. I've had this bug for a couple of weeks now. With 2 hours, 1 day, 7 days or 14 days it works flawlessly; with 30 days, bam, it breaks. Look, I can delete my data, but that would be frustrating, as one of the key reasons I switched to this wonderful software is better visibility of my usage and its trends.

I am willing to share my data if it can be pulled and we can identify the issue.

fichtner commented 2 years ago

Just open the dev console in your favourite browser to see the JS error, please.

jaxjexjox commented 2 years ago

> Just open dev console in your favourite browser to see JS error please.

Based on @kulikov-a's testing and mine, it seems very clear there's corrupt data somewhere more than 14 days back. Can I remove a row from the database? Should we investigate to try and solve this?

jaxjexjox commented 2 years ago

Well, let me know. I'm happy to export my CSVs to anyone trusted here and provide a debug report. I'm positive @kulikov-a is correct, given that the graphs are flawless for ranges under 30 days. There's clearly bunk data.

If providing all my exported CSVs and a debug log helps you

A. identify the issue with the data, and
B. stop this from occurring for others in the future,

then I'd be happy to help. I understand it's a low-priority kind of job, but something has written data poorly here (or recorded/measured it incorrectly).

jaxjexjox commented 2 years ago

> Just open dev console in your favourite browser to see JS error please.

Got it, actually. Thank you!

    d3.min.js?v=d7a544c33409d20d:4 Uncaught TypeError: Cannot read properties of undefined (reading '1')
        at n (d3.min.js?v=d7a544c33409d20d:4:26038)
        at SVGGElement.<anonymous> (nv.d3.min.js?v=d7a544c33409d20d:10:6414)
        at d3.min.js?v=d7a544c33409d20d:5:13340
        at Y (d3.min.js?v=d7a544c33409d20d:1:4505)
        at Array.Yl.each (d3.min.js?v=d7a544c33409d20d:5:13304)
        at Array.b (nv.d3.min.js?v=d7a544c33409d20d:10:5994)
        at Array.Co.call (d3.min.js?v=d7a544c33409d20d:3:15178)
        at SVGSVGElement.<anonymous> (nv.d3.min.js?v=d7a544c33409d20d:10:15563)
        at d3.min.js?v=d7a544c33409d20d:3:15103
        at Y (d3.min.js?v=d7a544c33409d20d:1:4505)
    n @ d3.min.js?v=d7a544c33409d20d:4
    (anonymous) @ nv.d3.min.js?v=d7a544c33409d20d:10
    (anonymous) @ d3.min.js?v=d7a544c33409d20d:5
    Y @ d3.min.js?v=d7a544c33409d20d:1
    Yl.each @ d3.min.js?v=d7a544c33409d20d:5
    b @ nv.d3.min.js?v=d7a544c33409d20d:10
    Co.call @ d3.min.js?v=d7a544c33409d20d:3
    (anonymous) @ nv.d3.min.js?v=d7a544c33409d20d:10
    (anonymous) @ d3.min.js?v=d7a544c33409d20d:3
    Y @ d3.min.js?v=d7a544c33409d20d:1
    Co.each @ d3.min.js?v=d7a544c33409d20d:3
    b @ nv.d3.min.js?v=d7a544c33409d20d:10
    Co.call @ d3.min.js?v=d7a544c33409d20d:3
    (anonymous) @ networkinsight:1401
    c @ nv.d3.min.js?v=d7a544c33409d20d:3
    setTimeout (async)
    c @ nv.d3.min.js?v=d7a544c33409d20d:3
    setTimeout (async)
    a.render @ nv.d3.min.js?v=d7a544c33409d20d:3
    a.addGraph @ nv.d3.min.js?v=d7a544c33409d20d:3
    (anonymous) @ networkinsight:1353
    each @ jquery-3.5.1.min.js:2
    (anonymous) @ networkinsight:1346
    complete @ opnsense.js?v=d7a544c33409d20d:242
    c @ jquery-3.5.1.min.js:2
    fireWith @ jquery-3.5.1.min.js:2
    l @ jquery-3.5.1.min.js:2
    (anonymous) @ jquery-3.5.1.min.js:2
    load (async)
    send @ jquery-3.5.1.min.js:2
    ajax @ jquery-3.5.1.min.js:2
    ajaxGet @ opnsense.js?v=d7a544c33409d20d:234
    chart_interface_totals @ networkinsight:1345
    (anonymous) @ networkinsight:1729
    dispatch @ jquery-3.5.1.min.js:2
    v.handle @ jquery-3.5.1.min.js:2
    F.fn.triggerNative @ bootstrap-select.min.js?v=d7a544c33409d20d:8
    (anonymous) @ bootstrap-select.min.js?v=d7a544c33409d20d:8
    dispatch @ jquery-3.5.1.min.js:2
    v.handle @ jquery-3.5.1.min.js:2

Screenshot: https://i.imgur.com/XIN3CNr.png

That error ONLY pops up when switching from 14 to 30 days.

kulikov-a commented 2 years ago

Looks like nvd3 requires data arrays of equal length for stacked graphs. If the data for some interface is shorter, nvd3 will throw a TypeError. I would just add zeros in such cases.
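A minimal sketch of the zero-padding idea, written in Python for illustration only: the real chart code lives in networkinsight.volt (JavaScript), and the patches referenced in this thread (3df5b54, aff6657) may do it differently. The data shape (interface name mapped to a list of `(timestamp, value)` pairs) and the function name `pad_missing_with_zeros` are assumptions. Every interface gets a point at every timestamp any interface has, with 0 where it had no data, so all series end up the same length.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: interface name -> list of (unix_timestamp, value) points.
Series = Dict[str, List[Tuple[int, float]]]


def pad_missing_with_zeros(series: Series) -> Series:
    """Give every interface a point at every timestamp seen in any interface.

    Timestamps an interface has no data for get a value of 0, so all series
    end up with the same length and the same x-values, which is what a
    stacked chart expects.
    """
    # Union of all timestamps across all interfaces, in ascending order.
    all_timestamps = sorted({ts for points in series.values() for ts, _ in points})
    padded: Series = {}
    for name, points in series.items():
        by_timestamp = dict(points)  # last value wins if a timestamp repeats
        padded[name] = [(ts, by_timestamp.get(ts, 0)) for ts in all_timestamps]
    return padded


if __name__ == "__main__":
    data = {
        "em0": [(1640476800, 120.0), (1640563200, 80.0)],
        "ovpn1": [(1640563200, 5.0)],  # interface added later: one point missing
    }
    for name, points in pad_missing_with_zeros(data).items():
        print(name, points)
```

Padding on the union of timestamps (rather than, say, truncating to the shortest series) keeps all recorded traffic visible and only fills in the periods where an interface had nothing recorded.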

jaxjexjox commented 2 years ago

Hi Guys,

Some friends have helped me with this issue and I think we've found the cause. A fix, if possible, would be nice.

Alternatively, could I hand-fix my data?

To restate, the fault ONLY KICKS IN if the graph date range exceeds 14 days. These all work: https://i.imgur.com/C5VXAk5.png

Out of 546,000 rows, only one of them exhibits this issue: https://i.imgur.com/bnsiukG.png

So I would love to either delete those 2 rows or, at the very least, have the developers try to stop this from occurring for others, if they have any idea how it occurred.

You will note this was the day I updated from 21.7 to 22.1

jaxjexjox commented 2 years ago

> Looks like nvd3 requires equal data arrays length for stacked graphs. If data length for some interface is shorter then nvd3 will throw type error. I would just add zeros in such cases.

See my reply above. Thanks to you, I've investigated deeper and likely even found the faulty rows in the database, possibly caused by the OPNsense upgrade process. I have no idea whether I can edit the DB. Thanks for the help!

jaxjexjox commented 2 years ago

I can confirm that the feature "Repair Netflow data" does not fix the damaged data and the problem persists.

kulikov-a commented 2 years ago

@jaxjexjox can you try `opnsense-patch -a kulikov-a 3df5b54`, please? https://github.com/kulikov-a/core/commit/3df5b54f4184e07a510ff926eaae00b1d78fb7de I still think the cause is possibly missing traffic data for one of the interfaces in the selected range (interface not present or down?).

jaxjexjox commented 2 years ago

MANY hours of work with friends later.

sqlite3 src_addr_details_086400.sqlite

I have

`sqlite> Select * from timeserie where packets is '0'
   ...> ;
2021-12-26 00:00:00|1640476800|em0|in|192.168.0.6|172.67.75.68|443|6|0|0
2021-12-26 00:00:00|1640476800|pppoe0|out|172.67.75.68|192.168.0.6|443|6|0|0
2021-12-26 00:00:00|1640476800|pppoe0|in|172.67.75.68|192.168.0.6|443|6|0|0
2021-12-26 00:00:00|1640476800|em0|out|192.168.0.6|172.67.75.68|443|6|0|0

sqlite> Delete from timeserie where packets is '0'`

(Plus another batch earlier.) I have deleted ALL the broken rows (the total was actually 6).

https://i.imgur.com/lz6RI18.png

I have rebooted. Sadly, while the bad data is gone, my graphs still will not update beyond the 14-day limit :( I even exported again to confirm the "bad rows" are gone.
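For anyone who wants to inspect the same rows without an interactive sqlite3 session, here is a minimal sketch using Python's standard `sqlite3` module. The table `timeserie` and column `packets` are the ones queried above; the function names are hypothetical. Work on a copy of the database file, and keep in mind that, as reported here, deleting these rows alone did not bring the 30-day graphs back.

```python
import sqlite3

# Table and column names as queried in this thread (timeserie, packets).
ZERO_PACKET_ROWS = "FROM timeserie WHERE packets = 0"


def count_zero_packet_rows(conn: sqlite3.Connection) -> int:
    """Count rows that recorded zero packets (a read-only dry run)."""
    (count,) = conn.execute(f"SELECT COUNT(*) {ZERO_PACKET_ROWS}").fetchone()
    return count


def delete_zero_packet_rows(conn: sqlite3.Connection) -> int:
    """Delete zero-packet rows in one transaction; return how many were removed."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(f"DELETE {ZERO_PACKET_ROWS}")
    return cursor.rowcount
```

For example, open a copy of src_addr_details_086400.sqlite with `sqlite3.connect(...)`, call `count_zero_packet_rows` first, and only run `delete_zero_packet_rows` once the count matches what the manual `Select` showed.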

jaxjexjox commented 2 years ago

opnsense-patch -a kulikov-a 3df5b54

`root@OPNsense:/var/netflow # opnsense-patch -a kulikov-a 3df5b54
Fetched 3df5b54 via https://github.com/kulikov-a/core
Hmm... Looks like a unified diff to me...
The text leading up to this was:

From 3df5b54f4184e07a510ff926eaae00b1d78fb7de Mon Sep 17 00:00:00 2001
From: kulikov-a 36099472+kulikov-a@users.noreply.github.com
Date: Thu, 17 Feb 2022 11:16:24 +0300
Subject: [PATCH] nvd3 stacked graphs missed data fix
---
.../OPNsense/Diagnostics/networkinsight.volt | 23 +++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/src/opnsense/mvc/app/views/OPNsense/Diagnostics/networkinsight.volt b/src/opnsense/mvc/app/views/OPNsense/Diagnostics/networkinsight.volt
index 01e55ed025..6b0ca17d8c 100644
--- a/src/opnsense/mvc/app/views/OPNsense/Diagnostics/networkinsight.volt
+++ b/src/opnsense/mvc/app/views/OPNsense/Diagnostics/networkinsight.volt

Patching file opnsense/mvc/app/views/OPNsense/Diagnostics/networkinsight.volt using Plan A...
Hunk #1 succeeded at 180.
done
All patches have been applied successfully. Have a nice day.
root@OPNsense:/var/netflow # `

Every fibre of my amateur being says this is an unwise thing to do, but you have an OPNsense badge on your profile... Rebooting for good luck too.

jaxjexjox commented 2 years ago

Between my deletion of broken rows and your patch, my graphs are now behaving.

I would like to make a proposal to the developers (as a newbie, with not a lot of free time!). This was a fun experiment, but I'd hate for someone else to have to do this.

Proposal:

Reporting: Settings: "Repair Netflow Data"

This really should find instances where packets are 0 (https://i.imgur.com/lz6RI18.png). This is a real mess data-wise and should be pruned from the database.

As part of the cleaning/repair process, those entries should be deleted from the database as 'bunk data'.

P.S. Just to be clear, I really love the product and thank you all for the hard work, but the 'repair tool' ran for 5 hours on a quad-core machine with only 2 months of data AND didn't identify the rows with busted data :) I highly recommend that it nuke dodgy data.

Thanks so much - I like my graphs being back!

AdSchellevis commented 2 years ago

> Looks like nvd3 requires equal data arrays length for stacked graphs. If data length for some interface is shorter then nvd3 will throw type error. I would just add zeros in such cases.

@kulikov-a thanks for the fine analysis of the issue, this https://github.com/opnsense/core/commit/aff6657a3b2cb274b7dd663466eb3a0339a26885 would likely fix it.

opnsense-patch aff6657

should do the trick

jaxjexjox commented 2 years ago

I gotta be clear here though, guys: I deleted the bad data before applying the patch and might not have tested properly, so removing that bad data may have helped.

Please note the screenshots that captured clearly bad data in the DB. We should consider either not writing the bad data in the first place (somehow) or cleaning dodgy data out of the database from time to time.

(and again, thank you all)

AdSchellevis commented 2 years ago

If we can't reproduce it for now, it's probably best to close the issue and reopen it if it returns. If I understand @kulikov-a correctly, it's not necessarily bad data, just missing data (which could logically also happen when adding new interfaces at some point in time). Making sure the data returned by the endpoint is "symmetric" is likely the most practical fix.
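To tell missing data apart from bad data, one option is to list, per interface, the timestamps at which other interfaces have points but it has none. A small Python sketch, assuming the same kind of data shape as a stacked chart consumes (interface name mapped to `(timestamp, value)` pairs); the shape and the name `interfaces_with_gaps` are hypothetical, not taken from the OPNsense endpoint.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: interface name -> list of (unix_timestamp, value) points.
Series = Dict[str, List[Tuple[int, float]]]


def interfaces_with_gaps(series: Series) -> Dict[str, List[int]]:
    """Map each interface to the timestamps it is missing.

    A timestamp counts as missing if at least one other interface has a
    point there. Interfaces without gaps are left out of the result.
    """
    all_timestamps = {ts for points in series.values() for ts, _ in points}
    gaps: Dict[str, List[int]] = {}
    for name, points in series.items():
        missing = sorted(all_timestamps - {ts for ts, _ in points})
        if missing:
            gaps[name] = missing
    return gaps
```

An interface added partway through a 30-day range would show up here with the timestamps from before it existed, which matches the "missing, not bad" explanation rather than corruption.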

kulikov-a commented 2 years ago

@AdSchellevis Hi! https://github.com/opnsense/core/commit/aff6657a3b2cb274b7dd663466eb3a0339a26885 works well, thanks! (Tested with "bad" data: one of the ovpn interfaces has fewer data points than the other interfaces when choosing a 30-day range.)

> it's not necessarily bad data, just missing (which could logically also happen when adding new interfaces at some point in time).

Exactly :)

@jaxjexjox if you decide to test https://github.com/opnsense/core/commit/aff6657a3b2cb274b7dd663466eb3a0339a26885, don't forget to revert my patch (they do not conflict, but they overlap).

> I did delete the bad data

Frankly, the data doesn't look bad IMHO. It looks more like some kind of flaw when exporting the SQLite data or opening the CSV in Excel (where could the extra columns come from?). I'll be very surprised if Ad's patch doesn't work for your data.

jaxjexjox commented 2 years ago

> @jaxjexjox if you decide to test aff6657 dont forget to revert my patch (they do not conflict but overlap each other)

How do I revert your patch? I take it that it would be smart to do so before the next release (22.1.2?)?

kulikov-a commented 2 years ago

> How do I revert your patch?

Run `opnsense-patch -a kulikov-a 3df5b54` again (no reboot needed, just refresh the page).