tierpod / dmarc-report-converter

Convert dmarc reports from xml to human-readable formats
MIT License
237 stars 25 forks source link

Add concurrent DNS lookup support #48

Closed moorereason closed 5 months ago

moorereason commented 6 months ago

Add a limited worker pool to do concurrent DNS lookups. Add a lookup_limit config parameter to control pool size (default=50).

I'm using github.com/sourcegraph/conc to simplify the goroutine management. Since conc uses generics, I updated the minimum Go version in go.mod to 1.18.

Updates #10

tierpod commented 5 months ago

Hi @moorereason!

Thank you for PR, this feature will significantly improve execution time for cases with many ip addresses in reports. I haven't heard about conc before, looks great.

I only have 1 concern. You moved DNS lookup function from package pkg/dmarc/dmarc.go to command line tool cmd/dmart-report-converter/files.go. I would prefer to keep it in first file, because it could be useful to have such feature builtin into package. It will require adding argument lookupLimit to functions signatures that have lookupAddr argument ( Report.Parse, Report.ReadParse*), but it's OK.

Could you please do this? I completely understand if you don't have time for this. In that case, let me now, and I'll try to do this using your code as reference.

moorereason commented 5 months ago

@tierpod, I've refactored this code and pushed up another commit.

Two reasons why I didn't do it this way to begin with:

  1. I didn't want to break the dmarc package API, but that doesn't seem to be a concern. 😁
  2. By having the concurrency in the dmarc package, we end up dealing with a single report at a time instead of gathering all reports and running them all through a lookup pool at once.

Either way, this is much faster than the existing code.

I can squash this PR into a single commit before you merge if you want.

tierpod commented 5 months ago

Amazing! You did a great job, appreciate it.