projectdiscovery / nuclei

Nuclei is a fast, customizable vulnerability scanner powered by the global security community and built on a simple YAML-based DSL, enabling collaboration to tackle trending vulnerabilities on the internet. It helps you find vulnerabilities in your applications, APIs, networks, DNS, and cloud configurations.
https://docs.projectdiscovery.io/tools/nuclei
MIT License

OOM killed when using the nuclei SDK with the standard templates #4756

Closed stan-threatmate closed 8 months ago

stan-threatmate commented 9 months ago

Nuclei version:

3.1.8

Current Behavior:

I run the nuclei SDK as part of a binary deployed in a Linux (Alpine) container. I use the standard templates. I have tried memory limits of 8GB and 16GB, and in both cases the process gets OOM killed. Here are the settings I specify:

Rate Limit:             150 per second
Exclude Severity        []string{"info", "low"}
Template Concurrency    25
Host Concurrency        100
Scan Strategy           "template-spray"
Network Timeout         10
Network Retries         2
Disable Host Errors     true
Max Host Errors         15000
Probe Non Http Targets  0
Enable Code Templates   0
Stats                   true

I tried this with 115 and 380 hosts, and both runs have memory issues. What is causing the high memory utilization? I am saving the results from the nuclei scan in a list. Could the results be so large that they fill up the memory?

I run nuclei like this:

    ne, err := nuclei.NewNucleiEngine(opts...)
    if err != nil {
        return err
    }
    defer ne.Close()

    // Load the live hosts and every non-excluded template before executing.
    ne.LoadTargets(liveHosts, n.ProbeNonHttpTargets)
    err = ne.LoadAllTemplates()
    if err != nil {
        return err
    }

    // Every finding reported by the callback is kept in an in-memory slice.
    var results []*NucleiResult
    err = ne.ExecuteWithCallback(func(event *output.ResultEvent) {
        // Convert output.ResultEvent into NucleiResult ...
        res := &NucleiResult{...}
        results = append(results, res)
    })

Expected Behavior:

The nuclei SDK should be able to handle scanning hosts with the above settings. It would be great to have an example of SDK settings that match the default nuclei CLI scan settings.

What would be the equivalent settings for the SDK?

nuclei -u example.com
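
For illustration, here is a rough SDK equivalent of that one-liner, using only the engine calls already shown in the snippet above (with no options passed, NewNucleiEngine presumably falls back to the library defaults; worth verifying against your SDK version):

    // Sketch: a default-settings SDK run against a single target.
    ne, err := nuclei.NewNucleiEngine() // no options -> library defaults (assumed)
    if err != nil {
        return err
    }
    defer ne.Close()

    ne.LoadTargets([]string{"https://example.com"}, false)
    if err := ne.LoadAllTemplates(); err != nil {
        return err
    }
    return ne.ExecuteWithCallback(func(event *output.ResultEvent) {
        // handle each finding here
    })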

Additionally, what settings in the SDK control memory utilization? It would be good to document those as well.

Steps To Reproduce:

Use the above settings to set up a scan and watch it consume more and more memory over time. It is easier to reproduce with 115 (or more) websites.

Anything else:

AgoraSecurity commented 9 months ago

Have you tried reproducing with the latest version: https://github.com/projectdiscovery/nuclei/releases/tag/v3.1.10 ?

stan-threatmate commented 9 months ago

I looked at the changelog but I don't see any memory improvements. I can give it a try. Do the SDK settings look good to you? Am I missing something obvious?

Also this is what it looks like in terms of memory utilization:

[attached image: memory utilization graph of the 8GB container]

You can clearly see when it was killed. This is for an 8GB container.

tarunKoyalwar commented 9 months ago

@stan-threatmate, there was a minor change related to JS pooling and upgrades in other PD dependencies, so please try with the latest version, or even the dev branch if required.

1) Memory usage/consumption directly correlates with concurrency and other options. Last time I ran against 1.2k targets with default concurrency (i.e. template concurrency 25, host concurrency 25). Can you try running from the SDK with this config?

2) When there are more than 100 targets I would always recommend using the host-spray scan strategy; it is more efficient in many ways.

3) Can you include pprof (https://pkg.go.dev/net/http/pprof#hdr-Usage_examples) in your code and share profiles for the inflection points? (ex: in the above graph that would be one profile around 2-3PM and a second profile around 3:30PM) [<- these are the interesting/required profile locations for the above graph, but you will have to choose based on resource usage and dump the profiles manually from the CLI using go tool pprof]
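
For reference, exposing pprof from the scanning binary only needs a few lines of standard library; here is a minimal sketch (the listen address is arbitrary, and runScan is a placeholder for the SDK code shown earlier in this issue):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    )

    func main() {
        go func() {
            // Heap profiles can then be pulled at the inflection points with:
            //   go tool pprof http://localhost:6060/debug/pprof/heap
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        runScan() // placeholder for the nuclei SDK scan
    }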

stan-threatmate commented 9 months ago

I've been running it all day today with the latest version v3.1.10 but I see the same issues. I also added GOMEMLIMIT=500MiB and GOGC=20 but still ran out of memory, even though the GC started working pretty hard to clear it. I am about to instrument memory profiling and see if I can get some meaningful data.
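
For reference, the same limits can also be set from code instead of the environment; runtime/debug is standard library (SetMemoryLimit requires Go 1.19+):

    import "runtime/debug"

    func init() {
        // Equivalent of GOGC=20 and GOMEMLIMIT=500MiB, set programmatically.
        debug.SetGCPercent(20)
        debug.SetMemoryLimit(500 << 20) // 500 MiB in bytes
    }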

Also your suggestions in the comment above contradict this document which I used to set the above options: https://github.com/projectdiscovery/nuclei-docs/blob/main/docs/nuclei/get-started.md

Users should select a scan strategy based on the number of targets. Each strategy has its own pros & cons.

  • When targets < 1000, template-spray should be used. This strategy is slightly faster than host-spray but uses more RAM and does not optimally reuse connections.
  • When targets > 1000, host-spray should be used. This strategy uses less RAM than template-spray and reuses HTTP connections, along with some minor improvements that are crucial when mass scanning.

Concurrency & Bulk-Size

Whatever the scan strategy is, -concurrency and -bulk-size are crucial for tuning any type of scan. While tuning these parameters, the following points should be noted.

  • If scan-strategy is template-spray: -concurrency < -bulk-size (Ex: -concurrency 10 -bulk-size 200)
  • If scan-strategy is host-spray: -concurrency > -bulk-size (Ex: -concurrency 200 -bulk-size 10)

Can you please provide a recommendation on which settings affect memory consumption the most and which settings affect the speed of execution? For example, I've noticed the rate limit option doesn't really play much of a role in the SDK, as reported by the stats which print the RPS. I assume the RPS is the requests per second as defined by the rate limit?

I'll do some runs with your suggestion: 25 template and host concurrency. I wish there was a way to understand the system resource utilization based on the settings so we can plan for it based on the number of hosts.

stan-threatmate commented 9 months ago

Here is a pprof from a successful run on a smaller scale:

Showing nodes accounting for 421.32MB, 91.26% of 461.68MB total
Dropped 861 nodes (cum <= 2.31MB)
Showing top 50 nodes out of 159
      flat  flat%   sum%        cum   cum%
   64.17MB 13.90% 13.90%    64.17MB 13.90%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/generators.MergeMaps (inline)
   61.90MB 13.41% 27.31%   113.17MB 24.51%  fmt.Errorf
   50.85MB 11.01% 38.32%    51.71MB 11.20%  github.com/projectdiscovery/utils/errors.(*enrichedError).captureStack (inline)
   29.22MB  6.33% 44.65%    29.22MB  6.33%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).responseToDSLMap
   23.67MB  5.13% 49.78%    23.67MB  5.13%  runtime.malg
   19.24MB  4.17% 53.94%    78.68MB 17.04%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateRawRequest
   16.50MB  3.57% 57.52%    16.51MB  3.58%  reflect.New
   15.02MB  3.25% 60.77%    15.02MB  3.25%  github.com/projectdiscovery/utils/maps.(*OrderedMap[go.shape.string,go.shape.[]string]).Set (inline)
   13.11MB  2.84% 63.61%    13.11MB  2.84%  net/http.NewRequestWithContext
   12.07MB  2.61% 66.22%    13.08MB  2.83%  github.com/yl2chen/cidranger.newPrefixTree
      12MB  2.60% 68.83%    12.01MB  2.60%  github.com/syndtr/goleveldb/leveldb/memdb.New
   10.24MB  2.22% 71.04%    10.24MB  2.22%  gopkg.in/yaml%2ev2.(*parser).scalar
    8.12MB  1.76% 72.80%    30.34MB  6.57%  github.com/projectdiscovery/utils/url.ParseURL
       7MB  1.52% 74.32%     8.30MB  1.80%  github.com/projectdiscovery/utils/reader.NewReusableReadCloser
    6.71MB  1.45% 75.77%     6.71MB  1.45%  regexp/syntax.(*compiler).inst (inline)
    6.64MB  1.44% 77.21%     6.64MB  1.44%  strings.(*Builder).grow
    5.93MB  1.28% 78.50%     5.93MB  1.28%  bytes.growSlice
    5.30MB  1.15% 79.64%    29.26MB  6.34%  github.com/projectdiscovery/nuclei/v3/pkg/parsers.ParseTemplate
    5.18MB  1.12% 80.77%    43.54MB  9.43%  github.com/projectdiscovery/nuclei/v3/pkg/templates.parseTemplate
    4.91MB  1.06% 81.83%     4.91MB  1.06%  bytes.(*Buffer).String (inline)
    4.14MB   0.9% 82.73%     4.47MB  0.97%  github.com/ulule/deepcopier.getTagOptions
    3.67MB   0.8% 83.52%     3.67MB   0.8%  reflect.mapassign_faststr0
    3.39MB  0.74% 84.26%    10.92MB  2.37%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/httpclientpool.wrappedGet
    3.38MB  0.73% 84.99%     7.92MB  1.72%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.generateVariables
    3.25MB   0.7% 85.69%     3.25MB   0.7%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.GenerateDNSVariables
    2.97MB  0.64% 86.34%     2.97MB  0.64%  github.com/projectdiscovery/retryablehttp-go.DefaultReusePooledTransport
    2.82MB  0.61% 86.95%     2.82MB  0.61%  github.com/projectdiscovery/ratelimit.(*Limiter).Take
    2.78MB   0.6% 87.55%     3.74MB  0.81%  github.com/projectdiscovery/nuclei/v3/pkg/model/types/stringslice.(*StringSlice).UnmarshalYAML

Note on the second line how much memory fmt.Errorf accounts for. I expect a ton of errors, as shown by the stats:

[0:17:35] | Templates: 3891 | Hosts: 8 | RPS: 141 | Matched: 2 | Errors: 144915 | Requests: 149322/159840 (93%)

Also, this stats line keeps being printed after nuclei has finished; it shows 93% and never stops printing.

stan-threatmate commented 9 months ago

A profile of a more intense run:

top50
Showing nodes accounting for 1118.57MB, 92.07% of 1214.85MB total
Dropped 1016 nodes (cum <= 6.07MB)
Showing top 50 nodes out of 139
      flat  flat%   sum%        cum   cum%
  356.81MB 29.37% 29.37%   356.81MB 29.37%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/generators.MergeMaps (inline)
   95.77MB  7.88% 37.25%    95.77MB  7.88%  runtime.malg
   75.52MB  6.22% 43.47%   313.64MB 25.82%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateRawRequest
   68.30MB  5.62% 49.09%    68.38MB  5.63%  net/http.NewRequestWithContext
   58.39MB  4.81% 53.90%    58.39MB  4.81%  github.com/projectdiscovery/utils/maps.(*OrderedMap[go.shape.string,go.shape.[]string]).Set (inline)
   57.80MB  4.76% 58.66%    63.10MB  5.19%  github.com/ulule/deepcopier.getTagOptions
   42.65MB  3.51% 62.17%   135.82MB 11.18%  github.com/projectdiscovery/utils/url.ParseURL
   29.94MB  2.46% 64.63%    48.56MB  4.00%  fmt.Errorf
   28.74MB  2.37% 67.00%    28.74MB  2.37%  net/textproto.MIMEHeader.Set (inline)
   27.06MB  2.23% 69.22%    32.54MB  2.68%  github.com/projectdiscovery/utils/reader.NewReusableReadCloser
   19.45MB  1.60% 70.83%    19.45MB  1.60%  bytes.(*Buffer).String (inline)
   18.97MB  1.56% 72.39%    18.99MB  1.56%  strings.(*Builder).grow
   18.63MB  1.53% 73.92%    45.56MB  3.75%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.generateVariables
   17.99MB  1.48% 75.40%    18.06MB  1.49%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.GenerateDNSVariables
   17.13MB  1.41% 76.81%    17.15MB  1.41%  reflect.New
   17.02MB  1.40% 78.21%    21.78MB  1.79%  github.com/projectdiscovery/utils/errors.(*enrichedError).captureStack (inline)
   13.82MB  1.14% 79.35%    13.82MB  1.14%  github.com/projectdiscovery/ratelimit.(*Limiter).Take
   13.58MB  1.12% 80.47%    13.60MB  1.12%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).responseToDSLMap
   12.81MB  1.05% 81.52%    13.82MB  1.14%  github.com/yl2chen/cidranger.newPrefixTree
   12.65MB  1.04% 82.56%   116.63MB  9.60%  github.com/projectdiscovery/retryablehttp-go.NewRequestFromURLWithContext
      12MB  0.99% 83.55%    12.01MB  0.99%  github.com/syndtr/goleveldb/leveldb/memdb.New
   11.89MB  0.98% 84.53%    82.14MB  6.76%  github.com/projectdiscovery/utils/url.absoluteURLParser
   11.31MB  0.93% 85.46%    11.31MB  0.93%  github.com/projectdiscovery/utils/maps.NewOrderedMap[go.shape.string,go.shape.[]string] (inline)
   10.92MB   0.9% 86.36%    11.39MB  0.94%  github.com/projectdiscovery/utils/url.NewOrderedParams (inline)
   10.28MB  0.85% 87.21%    10.28MB  0.85%  gopkg.in/yaml%2ev2.(*parser).scalar
    7.35MB  0.61% 87.81%   731.95MB 60.25%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).Make
    7.33MB   0.6% 88.41%     7.33MB   0.6%  bytes.growSlice
    7.03MB  0.58% 88.99%     7.03MB  0.58%  regexp/syntax.(*compiler).inst (inline)
    5.40MB  0.44% 89.44%    29.48MB  2.43%  github.com/projectdiscovery/nuclei/v3/pkg/parsers.ParseTemplate
    5.31MB  0.44% 89.88%    55.44MB  4.56%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateHttpRequest
    5.08MB  0.42% 90.29%    43.94MB  3.62%  github.com/projectdiscovery/nuclei/v3/pkg/templates.parseTemplate
    4.61MB  0.38% 90.67%    30.72MB  2.53%  fmt.Sprintf
    3.31MB  0.27% 90.95%    11.10MB  0.91%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/httpclientpool.wrappedGet
    2.81MB  0.23% 91.18%    21.36MB  1.76%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/raw.readRawRequest
    2.77MB  0.23% 91.40%     6.43MB  0.53%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/replacer.Replace
    2.56MB  0.21% 91.62%    74.73MB  6.15%  github.com/projectdiscovery/utils/url.(*OrderedParams).Decode
    1.03MB 0.085% 91.70%    85.59MB  7.05%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).executeRequest
    0.87MB 0.072% 91.77%     8.20MB  0.67%  bytes.(*Buffer).grow
    0.67MB 0.055% 91.83%     9.21MB  0.76%  regexp.compile
    0.60MB 0.049% 91.88%   841.32MB 69.25%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec/generic.(*Generic).ExecuteWithResults
    0.58MB 0.047% 91.92%     7.34MB   0.6%  github.com/projectdiscovery/retryablehttp-go.NewClient
    0.51MB 0.042% 91.97%    72.52MB  5.97%  net/http.(*Transport).dialConn
    0.50MB 0.041% 92.01%    20.95MB  1.72%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).Compile
    0.29MB 0.024% 92.03%   841.64MB 69.28%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec.(*TemplateExecuter).Execute
    0.12MB  0.01% 92.04%    69.88MB  5.75%  github.com/projectdiscovery/fastdialer/fastdialer.(*Dialer).DialTLS
    0.11MB 0.0092% 92.05%    45.04MB  3.71%  github.com/projectdiscovery/nuclei/v3/pkg/templates.Parse
    0.09MB 0.0075% 92.06%    65.17MB  5.36%  github.com/projectdiscovery/fastdialer/fastdialer.AsZTLSConfig
    0.08MB 0.0068% 92.06%   809.25MB 66.61%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).executeParallelHTTP
    0.07MB 0.0055% 92.07%     6.47MB  0.53%  net/http.(*persistConn).writeLoop
    0.06MB 0.0051% 92.07%    69.73MB  5.74%  github.com/projectdiscovery/nuclei/v3/pkg/catalog/loader.(*Store).LoadTemplatesWithTags

stan-threatmate commented 9 months ago

And this run uses what you suggested, 25 template/host concurrency:

(pprof) top50
Showing nodes accounting for 1191.23MB, 92.63% of 1285.96MB total
Dropped 960 nodes (cum <= 6.43MB)
Showing top 50 nodes out of 136
      flat  flat%   sum%        cum   cum%
  220.54MB 17.15% 17.15%   403.46MB 31.37%  fmt.Errorf
  182.10MB 14.16% 31.31%   187.27MB 14.56%  github.com/projectdiscovery/utils/errors.(*enrichedError).captureStack (inline)
  177.04MB 13.77% 45.08%   177.04MB 13.77%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/generators.MergeMaps (inline)
  104.67MB  8.14% 53.22%   104.82MB  8.15%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).responseToDSLMap
   74.08MB  5.76% 58.98%    81.28MB  6.32%  github.com/ulule/deepcopier.getTagOptions
   70.54MB  5.49% 64.46%    70.54MB  5.49%  runtime.malg
   50.89MB  3.96% 68.42%   208.12MB 16.18%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateRawRequest
   40.33MB  3.14% 71.56%    40.33MB  3.14%  github.com/projectdiscovery/utils/maps.(*OrderedMap[go.shape.string,go.shape.[]string]).Set (inline)
   34.22MB  2.66% 74.22%    34.22MB  2.66%  net/http.NewRequestWithContext
   22.66MB  1.76% 75.98%    22.66MB  1.76%  bytes.growSlice
   21.43MB  1.67% 77.65%    80.65MB  6.27%  github.com/projectdiscovery/utils/url.ParseURL
   18.23MB  1.42% 79.06%    21.67MB  1.69%  github.com/projectdiscovery/utils/reader.NewReusableReadCloser
   17.32MB  1.35% 80.41%    17.32MB  1.35%  bytes.(*Buffer).String (inline)
   17.03MB  1.32% 81.74%    17.03MB  1.32%  reflect.New
   14.07MB  1.09% 82.83%    14.07MB  1.09%  strings.(*Builder).grow
   12.20MB  0.95% 83.78%    13.44MB  1.05%  github.com/yl2chen/cidranger.newPrefixTree
      12MB  0.93% 84.71%    12.02MB  0.93%  github.com/syndtr/goleveldb/leveldb/memdb.New
   10.28MB   0.8% 85.51%    10.28MB   0.8%  gopkg.in/yaml%2ev2.(*parser).scalar
    9.84MB  0.77% 86.28%   201.71MB 15.69%  fmt.Sprintf
    9.06MB   0.7% 86.98%    21.64MB  1.68%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.generateVariables
    9.02MB   0.7% 87.68%     9.02MB   0.7%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.GenerateDNSVariables
    7.67MB   0.6% 88.28%   558.04MB 43.39%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).executeRequest
    7.21MB  0.56% 88.84%     7.21MB  0.56%  strings.genSplit
    6.75MB  0.52% 89.36%     6.75MB  0.52%  github.com/projectdiscovery/ratelimit.(*Limiter).Take
    6.75MB  0.52% 89.89%     6.75MB  0.52%  regexp/syntax.(*compiler).inst (inline)
    5.39MB  0.42% 90.31%    63.35MB  4.93%  github.com/projectdiscovery/retryablehttp-go.NewRequestFromURLWithContext
    5.25MB  0.41% 90.72%    53.95MB  4.20%  github.com/projectdiscovery/utils/url.absoluteURLParser
    5.20MB   0.4% 91.12%    29.08MB  2.26%  github.com/projectdiscovery/nuclei/v3/pkg/parsers.ParseTemplate
    4.90MB  0.38% 91.50%       43MB  3.34%  github.com/projectdiscovery/nuclei/v3/pkg/templates.parseTemplate
    3.44MB  0.27% 91.77%   370.45MB 28.81%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).Make
    3.27MB  0.25% 92.02%     7.54MB  0.59%  net/http.(*Client).do.func2
    3.13MB  0.24% 92.27%    10.94MB  0.85%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/httpclientpool.wrappedGet
    1.68MB  0.13% 92.40%    48.86MB  3.80%  github.com/projectdiscovery/utils/url.(*OrderedParams).Decode
    0.55MB 0.043% 92.44%     8.64MB  0.67%  regexp.compile
    0.54MB 0.042% 92.48%     7.26MB  0.56%  github.com/projectdiscovery/retryablehttp-go.NewClient
    0.52MB 0.041% 92.52%   100.17MB  7.79%  net/http.(*Transport).dialConn
    0.52MB  0.04% 92.56%    20.74MB  1.61%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).Compile
    0.19MB 0.015% 92.58%    22.85MB  1.78%  bytes.(*Buffer).grow
    0.13MB  0.01% 92.59%    83.98MB  6.53%  github.com/projectdiscovery/fastdialer/fastdialer.AsZTLSConfig
    0.12MB 0.0096% 92.60%    97.36MB  7.57%  github.com/projectdiscovery/fastdialer/fastdialer.(*Dialer).DialTLS
    0.08MB 0.0064% 92.60%   431.91MB 33.59%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec/generic.(*Generic).ExecuteWithResults
    0.08MB 0.0061% 92.61%    44.34MB  3.45%  github.com/projectdiscovery/nuclei/v3/pkg/templates.Parse
    0.06MB 0.0049% 92.62%    68.69MB  5.34%  github.com/projectdiscovery/nuclei/v3/pkg/catalog/loader.(*Store).LoadTemplatesWithTags
    0.05MB 0.0037% 92.62%   194.20MB 15.10%  github.com/projectdiscovery/utils/errors.(*enrichedError).Error
    0.04MB 0.0034% 92.62%    21.30MB  1.66%  net/http.(*persistConn).writeLoop
    0.04MB 0.0032% 92.63%    17.75MB  1.38%  gopkg.in/yaml%2ev2.(*decoder).sequence
    0.04MB 0.0027% 92.63%   431.95MB 33.59%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec.(*TemplateExecuter).Execute
    0.03MB 0.0021% 92.63%   196.37MB 15.27%  net/url.(*Error).Error
    0.02MB 0.0015% 92.63%   425.83MB 33.11%  github.com/projectdiscovery/retryablehttp-go.(*Client).Do
    0.01MB 0.00092% 92.63%    97.23MB  7.56%  github.com/projectdiscovery/fastdialer/fastdialer.(*Dialer).dial

stan-threatmate commented 9 months ago

Actually your proposal of 25 concurrent hosts/templates worked on one of my test setups. I set up a memory-constrained container with 2048MB RAM and aggressive GC settings: GOMEMLIMIT=500MiB and GOGC=20. When the scan reached 35%, the RAM increased suddenly and the GC was trying really hard to free the memory. It got right up to 2GB and stayed there for a bit. I thought it would be OOM killed, but it managed to keep pace with the allocs/frees, so it didn't get killed and memory went back down to sustainable levels.

Then, around 75% complete, it shot up again, this time staying at 2GB for a very long time with the CPU really hurting at 1500% trying to free all this memory. It succeeded and the scan completed in the end.

My theory is that some templates allocate a ton of memory, and if the concurrency settings are above a certain threshold, the allocation rate surpasses the GC's ability to free memory, which ultimately leads to an OOM kill. The only saving grace would be a good amount of free memory and/or fast CPUs that help the GC free memory faster. But we really need guidance on the performance characteristics of nuclei: what are the RAM and CPU requirements for X hosts and Y templates, etc.
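
For anyone who wants to watch this happen, here is a small stdlib-only helper I would use to log heap usage and GC activity next to the scan output (my own illustrative snippet, not part of nuclei; imports: log, runtime, time):

    // Logs live heap, cumulative allocations, and GC count every 10 seconds,
    // so allocation-rate spikes show up alongside the nuclei stats lines.
    func logMemStats() {
        var m runtime.MemStats
        for range time.Tick(10 * time.Second) {
            runtime.ReadMemStats(&m)
            log.Printf("heap=%dMiB totalAlloc=%dMiB numGC=%d",
                m.HeapAlloc>>20, m.TotalAlloc>>20, m.NumGC)
        }
    }
    // start it with: go logMemStats()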

Is there a way to know which templates use the most memory? Can we measure the CPU/memory usage of individual templates? That would be a very good metric to have. If I want to speed things up, I'd like to run the efficient templates faster but slow down on the memory-heavy ones so we don't run out of memory. Some of the big allocations are around the raw requests and responses.

stan-threatmate commented 8 months ago

Another update: using 25/25 for host/template concurrency when scanning 360 targets still resulted in an OOM kill, but it ran for a significantly longer time. I will set the garbage collection to aggressive values and try again: GOGC=20 GOMEMLIMIT=500MiB

tarunKoyalwar commented 8 months ago

@stan-threatmate, although tuning the GC is helpful in production, it might not help while debugging memory leaks, so I would recommend just trying with normal options. As you already know, Go does not immediately release memory; it gradually releases it in the hope of reusing it instead of allocating again and again.

That is why tuning the GC aggressively would only cause more CPU utilization without any direct benefit (especially in this case).

Based on your suggestion, I have just added more docs on how nuclei consumes resources and all the factors involved: https://docs.projectdiscovery.io/tools/nuclei/mass-scanning-cli

From the profile details you have shared, it looks like these are not the actual inflection points.

Showing nodes accounting for 1191.23MB, 92.63% of 1285.96MB total <- heap memory is 1.2GB

Also, looking at the above profile data, I can only tell that the top functions using heap as shown in the profiles are expected. generators.MergeMaps, generateRawRequest, etc. hold raw response data in maps, and given the concurrency I think this much is expected. Since this data is obtained from the targets being scanned, it is difficult to estimate how much data is being held at any moment.

If you think it is related to a particular set of templates, I would suggest splitting the templates and running separate scans.

^ This is one of the effective strategies I used when I fixed memory leaks recently in v3.1.8-10.

An alternative strategy is to continuously capture heap snapshots and track nuclei process memory (using memstats, or manually via a bash script using the PID). Subtracting the profile from a normal period from the one at a sudden spike using -diff_base will pinpoint the function responsible.
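
A stdlib-only sketch of the snapshot part (my own helper, not nuclei code), which makes the -diff_base comparison straightforward afterwards (imports: fmt, os, runtime, runtime/pprof, time):

    // Writes a numbered heap profile into dir at every interval. Any two
    // snapshots can later be compared with:
    //   go tool pprof -diff_base heap-0001.pprof heap-0009.pprof
    func dumpHeapEvery(dir string, interval time.Duration) {
        for i := 1; ; i++ {
            time.Sleep(interval)
            f, err := os.Create(fmt.Sprintf("%s/heap-%04d.pprof", dir, i))
            if err != nil {
                continue
            }
            runtime.GC() // run a collection first so the snapshot reflects live objects
            pprof.WriteHeapProfile(f)
            f.Close()
        }
    }
    // start it with: go dumpHeapEvery("/tmp/profiles", 30*time.Second)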

I will try to reproduce this using CLI with 800 targets.

Finally, if you want to customize any execution behaviour or use your own logic, I would suggest taking a look at the core package, which contains the logic for how targets x templates are executed.

tarunKoyalwar commented 8 months ago

By the way, the nuclei-docs repo is deprecated; the latest docs are available at https://docs.projectdiscovery.io/introduction

stan-threatmate commented 8 months ago

I'll continue debugging this, but here is a run on a 16GB container with 8 CPUs, scanning 358 hosts with 25/25 host/template concurrency and max host errors set to 15000:

[attached images: memory and CPU utilization graphs]

You can clearly see that some event causes runaway memory utilization at the end of the scan. At this point the nuclei stats showed 43% completion, but I am not sure how trustworthy that percentage is.

You can see the GC working really hard throughout the scan to keep the memory low.

stan-threatmate commented 8 months ago

@tarunKoyalwar thank you for looking into this issue!

I have a question about -max-host-error. I want to use nuclei to try all templates on a host. If I understand this correctly we need to set the mhe to a large number in order to not stop the scan prematurely, right?

Also, about the -response-size-read option: do templates care about this value, and if I set it to 0 to save memory, would that hurt how templates work?

About the -rate-limit option - I haven't seen it make any difference at least according to the nuclei stats. Is the RPS metric reported by the stats controlled by this option?

stan-threatmate commented 8 months ago

Update: I am scanning 47 hosts with the following settings and I still get OOM killed on a 16GB RAM, 8 CPU container:

I suspect a template or group of templates that spikes and causes large memory allocations, because memory usage is stable until an inflection point where things spike.

[attached image: memory utilization graph]

The steep climb is what makes me believe it is a very specific template or related templates that cause this.

lebik commented 8 months ago

I have the same problem. After reverting to v2.9.15 everything works well, so I think the problem is with one of the 118 templates that are not supported in v2.9.15.

[attached image: memory utilization graph]

stan-threatmate commented 8 months ago

I can confirm that it has to be one of the critical templates. Here are two scans: the first uses only the critical severity templates; the second uses everything except the critical severity templates:

[attached image: memory utilization comparison of the two scans]

We can see that when we don't run the critical severity templates, memory usage is minimal.

tarunKoyalwar commented 8 months ago

@stan-threatmate FYI, we were able to reproduce this some time ago and are working on locating and fixing the code responsible.

stan-threatmate commented 8 months ago

@tarunKoyalwar thank you!

Mzack9999 commented 8 months ago

The issue will be fixed as part of https://github.com/projectdiscovery/nuclei/issues/4800

tarunKoyalwar commented 8 months ago

@stan-threatmate, can you try running the scan from the SDK with these 4 templates disabled? You can use -et (and its corresponding option in the SDK) to exclude these templates (see the sketch after the list):

http/cves/2019/CVE-2019-17382.yaml
http/cves/2023/CVE-2023-24489.yaml
http/fuzzing/header-command-injection.yaml
http/fuzzing/wordpress-weak-credentials.yaml
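
For SDK users, here is a rough sketch of what the exclusion might look like; the TemplateFilters/ExcludeIDs names and the assumption that template IDs match the YAML file names are both things to verify against the SDK version in use:

    // Sketch only: field names are assumptions based on the v3 SDK surface.
    opts = append(opts, nuclei.WithTemplateFilters(nuclei.TemplateFilters{
        ExcludeIDs: []string{ // assumed to match the file names above
            "CVE-2019-17382",
            "CVE-2023-24489",
            "header-command-injection",
            "wordpress-weak-credentials",
        },
    }))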

stan-threatmate commented 8 months ago

@tarunKoyalwar I removed the templates you mentioned and my first test scan finished successfully. Memory looks great. Next I am running a large scan over 400 hosts, but it will take 15h to complete, so I'll report tomorrow. I also used more aggressive settings:

[attached image: scan settings]

stan-threatmate commented 8 months ago

Removing the 5 templates allowed us to scan about 400 hosts with no problem on a 16GB container with 8 CPUs.

[attached image: memory utilization graph]

Mzack9999 commented 8 months ago

@stan-threatmate The issue is high parallelism in bruteforce templates, which causes a lot of buffer allocations for reading HTTP responses (up to the default 10MB). To mitigate the issue, a generic memory monitor mechanism has been implemented in https://github.com/projectdiscovery/nuclei/pull/4833 (when global RAM occupation is above 75%, the parallelism is decreased to 5). I was able to complete multiple runs without the scan being killed on an 8GB system.

stan-threatmate commented 8 months ago

@Mzack9999 thank you! How is the RAM limit determined? Is it based on the free memory or the total memory? Can we configure the limits (75% and 5 threads) in the SDK?

Update: I looked at your changes and added some comments.

Second update: Another mechanism you could use is a rate limit on memory allocations per second. If 10MB buffers can be allocated, we could limit buffer allocations to 50 per second, for 500MB of RAM per second. Ideally this would be configurable.
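
To illustrate the idea, here is a rough stdlib-only sketch of such a limiter; the names and numbers are mine, not anything nuclei ships (imports: time):

    // Hypothetical allocation limiter: at most 50 large-buffer allocations per
    // second, i.e. roughly 500MB/s when each buffer can be up to 10MB.
    allocTokens := make(chan struct{}, 50)
    go func() {
        for range time.Tick(time.Second / 50) {
            select {
            case allocTokens <- struct{}{}: // refill one slot every 20ms
            default: // bucket already full
            }
        }
    }()

    newResponseBuffer := func(size int) []byte {
        <-allocTokens // block until an allocation slot is available
        return make([]byte, 0, size)
    }
    _ = newResponseBuffer // use wherever response buffers are allocated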