Closed vzamanillo closed 4 years ago
First results after some rework. I have excluded github because it takes a long time to finish, but it increases memory consumption by only about 5MB and keeps it constant until finished.
Hey @vzamanillo, we didn't focus on memory profiling in the past because subfinder is not something we run all the time; mostly it's a one-time run before you start with your target. But it is definitely one of the things to improve to make it more mature.
Apart from the memory consumption improvement, do you also notice an improvement in overall run time (as we can see in the above PoC)? Is it a result of the linting work on your side, or a small improvement because of better memory management?
Hi @bauthard, there is no significant improvement in overall run time. There is in some cases, but the difference is not that important; in fact, for sources with large response data, such as commoncrawl or waybackarchive, it is a few milliseconds slower because the content of the responses is iterated line by line instead of putting everything in memory and processing the data.
These memory consumption improvements are not in the branch of pull request #278; they are changes that I have made on top of that branch, and I have them ready to merge once #278 lands. (I think it is not the time to introduce them in #278, both to avoid increasing the cost of the review and because the scope of these changes is different from the changes we are talking about.)
Step by step guide to profile Golang CPU / memory.

Add the `profile` package to the `main.go` imports:
```go
import (
	"context"

	"github.com/pkg/profile"
	"github.com/projectdiscovery/gologger"
	"github.com/projectdiscovery/subfinder/pkg/runner"
)

func main() {
	defer profile.Start().Stop() // CPU profiling (default)
	// defer profile.Start(profile.MemProfile).Stop() // Memory profiling
	// ...
}
```
Run `main.go`:

```shell
# go run main.go -d uber.com -sources alienvault
```
After it finishes you will see the following message:

```
2020/07/27 13:46:24 profile: cpu profiling disabled, /tmp/profile978571390/cpu.pprof
```
Run `pprof` and inspect the results (it will open a new browser window):

```shell
go tool pprof -http=:8080 /tmp/profile093511175/cpu.pprof
```
freeCodeCamp `pprof` guide: https://www.freecodecamp.org/news/how-i-investigated-memory-leaks-in-go-using-pprof-on-a-large-codebase-4bec4325e192/
While doing some memory profiles with `pprof` I've discovered that some sources increase the memory footprint of subfinder excessively, e.g. waybackarchive.

This is because the size of the results is very large and we are using `ioutil.ReadAll(pagesResp.Body)`. After some changes to read the response stream using `bufio.NewReader(pagesResp.Body)` instead, the memory consumption is drastically reduced.

It happens in other sources too, especially in those that return `json` and no decoder is used to process it; instead, all the content is put in memory with `ioutil.ReadAll(pagesResp.Body)` and `subdomainExtractor` is used with `regexp` to match subdomains (e.g. threatminer, threatcrowd...).

It would be nice to avoid using `ioutil.ReadAll(pagesResp.Body)` as much as possible and to check the rest of the sources so that the `json` responses are processed correctly. We could do it after merging #278 or we could introduce the changes directly in that branch.