moov-io / ach

ACH implements a reader, writer, and validator for Automated Clearing House (ACH) files. The HTTP server is available as a Docker image, and the library is available as a Go package.
https://moov-io.github.io/ach/
Apache License 2.0

feat: add a File iterator #1304

Closed · adamdecaf closed this 10 months ago

adamdecaf commented 10 months ago

This iterator lets us process files entry by entry, which should significantly reduce memory usage.

adamdecaf commented 10 months ago

cc @wadearnold

adamdecaf commented 10 months ago

Combining #1305 with this change gives a noticeable improvement over reading files fully into memory.

The Basic_ benchmarks call ach.ReadFile(..) and the Iterator_ benchmarks call NextEntry() until they run out of entries.

```
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
Benchmark
Benchmark/Basic_TRANSACTIONS100223143314.TXT
Benchmark/Basic_TRANSACTIONS100223143314.TXT-16                  100     154561285 ns/op    13297808 B/op     559303 allocs/op
Benchmark/Basic_TRANSACTIONS101023162208.TXT
Benchmark/Basic_TRANSACTIONS101023162208.TXT-16                  134     132010532 ns/op     9418912 B/op     482160 allocs/op
Benchmark/Iterator_TRANSACTIONS100223143314.TXT
Benchmark/Iterator_TRANSACTIONS100223143314.TXT-16               145     122932698 ns/op    14206015 B/op     448108 allocs/op
Benchmark/Iterator_TRANSACTIONS101023162208.TXT
Benchmark/Iterator_TRANSACTIONS101023162208.TXT-16               250      71073532 ns/op     8219774 B/op     259784 allocs/op
```
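
To make the comparison easier to read, here is a minimal sketch that turns the benchmark figures above into relative changes. The `result` type and `pctChange` helper are hypothetical names made up for this illustration and are not part of moov-io/ach. The numbers are copied from the output above.

```go
package main

import "fmt"

// result holds the per-op figures from one benchmark line (hypothetical type).
type result struct {
	nsPerOp, bytesPerOp, allocsPerOp float64
}

// pctChange returns the relative change from before to after, in percent.
// A negative value means the "after" run used less.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Figures copied from the benchmark output in this thread.
	files := []struct {
		name            string
		basic, iterator result
	}{
		{"TRANSACTIONS100223143314.TXT",
			result{154561285, 13297808, 559303},
			result{122932698, 14206015, 448108}},
		{"TRANSACTIONS101023162208.TXT",
			result{132010532, 9418912, 482160},
			result{71073532, 8219774, 259784}},
	}
	for _, f := range files {
		fmt.Printf("%s: time %+.1f%%, bytes %+.1f%%, allocs %+.1f%%\n",
			f.name,
			pctChange(f.basic.nsPerOp, f.iterator.nsPerOp),
			pctChange(f.basic.bytesPerOp, f.iterator.bytesPerOp),
			pctChange(f.basic.allocsPerOp, f.iterator.allocsPerOp))
	}
}
```

With these inputs, time per op and allocations per op drop for both files, while bytes per op drops for the second file and rises slightly for the first.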
adamdecaf commented 10 months ago

The iterator does find EntryDetail records that are outside of a batch, which is a benefit over ReadFile. I have a patch that makes the iterator skip blank lines, but I still need to test it with blank lines in the middle of a file. That should be easy to do tomorrow.

Patch

```diff
@@ -84,6 +84,10 @@ func (i *Iterator) NextEntry() (*BatchHeader, *EntryDetail, error) {
 		i.scanner.Scan()
 		line = i.scanner.Text()
 		i.reader.lineNum++
+		if line == "" {
+			// TODO(adam): Can the reader and iterator handle newlines in the middle of a file?
+			return nil, nil, nil
+		}
 	}
 
 	if err := i.reader.readLine(line); err != nil {
@@ -102,6 +106,7 @@ func (i *Iterator) NextEntry() (*BatchHeader, *EntryDetail, error) {
 	for {
 		if i.scanner.Scan() {
 			foundLine := i.scanner.Text()
+			i.reader.lineNum++
 			if foundLine == "" {
 				break
 			}
```
codecov-commenter commented 10 months ago

Codecov Report

Merging #1304 (7a3e13f) into master (c35e195) will decrease coverage by 0.08%. The diff coverage is 83.18%.

Current head 7a3e13f differs from the pull request's most recent head c869459. Consider uploading reports for commit c869459 to get more accurate results.


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1304      +/-   ##
==========================================
- Coverage   88.48%   88.41%   -0.08%
==========================================
  Files          73       74       +1
  Lines        7089     7170      +81
==========================================
+ Hits         6273     6339      +66
- Misses        480      492      +12
- Partials      336      339       +3
```