moov-io / ach

ACH implements a reader, writer, and validator for Automated Clearing House (ACH) files. The HTTP server is available as a Docker image, and the library is available as a Go package.
https://moov-io.github.io/ach/
Apache License 2.0

Reader cannot consume large, single-line ACH files #1381

Closed: campbellr closed this issue 4 months ago

campbellr commented 4 months ago

ACH Version

master

What were you trying to do?

We sometimes receive large ACH files where:

  1. the individual 94-byte records are not separated by newlines
  2. the entire file exceeds bufio.MaxScanTokenSize (64 KiB)

ach.Reader runs into problems parsing the file because it uses a bufio.Scanner that splits on newlines and has a 64 KiB maximum buffer/token size.

Because ach.Reader.Read doesn't check the result of Scanner.Scan() (or Scanner.Err()), it attempts to parse an arbitrary 64 KiB chunk of data as an ACH record (or series of records) and fails.

What did you expect to see?

ach.Reader should be able to consume arbitrarily large ACH files, with or without newline-delimited records (at least until we run out of memory).

I think a custom SplitFunc for the Scanner, instead of the default ScanLines, would be more reliable: one that splits on 94-byte record boundaries and/or newlines, so it also handles "broken" lines that aren't padded to 94 characters.

What did you see?

ach.Reader fails to properly parse the file with some cryptic message about

How can we reproduce the problem?

Attempt to read an ACH file that:

  1. Is not newline-delimited
  2. Exceeds 64 KiB (bufio.MaxScanTokenSize)

adamdecaf commented 4 months ago

I've opened https://github.com/moov-io/ach/pull/1388 to fix this bug. Can you try out the branch on your file and see if parsing works?

campbellr commented 4 months ago

Yes, that MR seems to do the trick, thanks for the quick fix!

adamdecaf commented 4 months ago

Fix released in https://github.com/moov-io/ach/releases/tag/v1.35.1