phaag / nfsen

Legacy NfSen code

use of '-c' did limit search, not output #13

Closed dgeo closed 1 year ago

dgeo commented 1 year ago

The "Limit to X Flows" option in the detail view limits the search to 20 flows by default and doesn't allow searching more than 10000 flows, which is not much. I suppose the original intention was to limit the output, not the search (which is bounded elsewhere by date/time).

phaag commented 1 year ago

Thanks for the pull request.

thezoggy commented 1 year ago

from what i understand, -c limits the number of flows to process while -n only shows x results of the flows processed.

so for "list flows" wouldn't you want to use -c, since you want it to just stop getting flows once it hits the requested number? while for "Stat TopN" you would want -n, since it needs to process all flows and then only show you the # of results asked for.

i would think making this change would give 'list flows' unneeded overhead: it would get all the flows just to show x of them, when before it would abort once that number was reached and show the results.
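
A toy model of the distinction described above, in Python. It is not nfdump's implementation: the flow records and the names `limit_processed` and `limit_output` are hypothetical, and the semantics of the two limits are taken from this thread's description rather than from the nfdump source.

```python
from itertools import islice

def limit_processed(flows, matches, count):
    """Model of '-c' as described here: read only the first `count` flows, then filter."""
    return [f for f in islice(flows, count) if matches(f)]

def limit_output(flows, matches, count):
    """Model of an output limit: filter the flows, then keep the first `count` matches."""
    return list(islice((f for f in flows if matches(f)), count))

# Hypothetical capture: 100 flows, only the one at position 50 goes to 10.0.0.9.
flows = [{"seq": i, "dst": "10.0.0.9" if i == 50 else "10.0.0.1"} for i in range(100)]
to_target = lambda f: f["dst"] == "10.0.0.9"

print(limit_processed(flows, to_target, 10))  # stops before reaching the match
print(limit_output(flows, to_target, 10))     # scans past the first 10 flows and finds it
```

The first call is cheap because it touches only 10 records; the second pays for a longer scan but cannot miss a match inside the window.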

dgeo commented 1 year ago

I understand, but here we have many thousands of flows per hour, and the proposed limit (max 10k, from memory) doesn't allow searching more than 15 minutes, without knowing where the search stopped. We use the 'list flows' page to search exhaustively for connections to or from given criteria within a defined time window: the -c limit is very quick but totally inaccurate. On our hardware (a not-so-powerful server) the -c limit makes no real difference between 10 and 10000: it's very quick but misses most matches! We limit the search by the time window: a search over more than 3 hours is slow, but we prefer that to inaccurate results ;) My 2 cents.
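
The arithmetic behind that 15-minute figure can be sketched as follows. The rate of 40,000 flows per hour is an assumption chosen to be consistent with the numbers above, not a measured value, and `window_covered_minutes` is a hypothetical helper.

```python
def window_covered_minutes(flow_limit, flows_per_hour):
    """Minutes of capture that fit within `flow_limit` flows at a steady rate."""
    return 60 * flow_limit / flows_per_hour

ASSUMED_FLOWS_PER_HOUR = 40_000  # assumption, not a measured rate

for limit in (1000, 10000):
    minutes = window_covered_minutes(limit, ASSUMED_FLOWS_PER_HOUR)
    print(f"limit {limit}: about {minutes:g} minutes of traffic")
```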

thezoggy commented 1 year ago

right but you are fundamentally changing what the command is doing by changing to use the other option?

say you set a 24 hour timeframe, and within that timeframe there are 20 million flows. the system returning the first 1000 flows is a bit different from saying get all flows, then return the top 1000.

phaag commented 1 year ago

@thezoggy - you are basically right. Option reverted.

dgeo commented 1 year ago

OK, so nfsen doesn't allow you to search more than 10000 flows with an nfdump filter, and by default it will show no match in the time window you selected, because it searched only the first bytes of the nfdump captures (the first 10 flows!). I'll keep this patch here, and I think this behaviour was not intended in the first place. Please just try to search for a connection to a specific IP over more than 5 minutes: you won't find it through the nfsen interface unless you run the nfdump command without -c (or the IP appears in the first 10/100/1000/10000 flows of the defined time window). There is no way to search without a limit within nfsen.

thezoggy commented 1 year ago

Anytime I've needed to grab large amounts of flows I've just done it via the CLI, where I can do things like output to CSV format and not have the limits of nfsend/the web server restrict how much it can do before it times out, or hit the upper limits set to keep it from going awry.
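
A sketch of that command-line route: assembling an nfdump invocation with no `-c` limit and CSV output. The directory, time window, and filter are placeholders, and `build_nfdump_argv` is a hypothetical helper. `-R`, `-t`, and `-o csv` are nfdump options, but check `nfdump(1)` for your version before relying on them. The command is only built and printed here, not run.

```python
import shlex

def build_nfdump_argv(capture_dir, time_window, flow_filter):
    """Assemble an nfdump command that scans the whole window and prints CSV."""
    return ["nfdump", "-R", capture_dir, "-t", time_window, "-o", "csv", flow_filter]

argv = build_nfdump_argv(
    "/var/nfsen/profiles-data/live/router1",    # placeholder path
    "2023/05/19.00:00:00-2023/05/20.00:00:00",  # placeholder time window
    "dst ip 192.0.2.10",                        # placeholder filter
)
print(shlex.join(argv))  # shell-quoted form, ready to paste into a terminal
```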

phaag commented 1 year ago

Let me explain this: first you select the time slot in 5-minute intervals, then you list the flows in that interval. If 10000 is not enough to investigate, you can easily add more options in details.php line 23, $ListOptions, so you can add 1000000 if you like. Whether this is useful is up to you and your workflow.

dgeo commented 1 year ago

I understand this. My main workflow is to react to alerts defined by an nfdump filter, and I usually use a large "time window" in the interface and search for all matching flows in the days before (sometimes weeks).

I know I can use the command line (and I do this too), but this is not the case for all my colleagues...

It used to work this way out of the box until some years ago, when some change occurred and I had to apply this patch to get correct results (sorry, I did not report it at the time for lack of time, so I can't remember when this behaviour changed for me).

If I find time I'll try to find the change that broke this some years ago. I thought this change just reverted to the old behaviour that worked for us.

If you apply a filter to just one 5-minute dump, I understand this limit for big environments.

Here we have less than 1 TB of dumps for a year, which may help when searching larger time windows.

Anyway, thank you Peter for maintaining this great tool!

Geoffroy.

On 20 May 2023 07:31:21 UTC, Peter Haag @.***> wrote:

Let me explain this: first you select the time slot in 5-minute intervals, then you list the flows in that interval. If 10000 is not enough to investigate, you can easily add more options in details.php line 23, $ListOptions, so you can add 1000000 if you like. Whether this is useful is up to you and your workflow.

-- geoffroy desvernay (mobile)