Closed adamdecaf closed 1 month ago
Hello @adamdecaf,
Just got to know about moov and its OSS community. I must say this is great and beautiful. I am golang engineer and recently looking into perfomance engineering. I will love to help take a look at issue and investigate and give a report on how it can be carried out if given the opportunity to.
Looking forward to hearing from you.
Thank you Peter.
Sounds good! Let me know if you need anything or have questions. Do you have a place to start?
@adamdecaf
Yes, I do. Will try to familiarize myself with the codebase and investigate properly. If i do have questions, i will relate them to you.
Thank you!
@adamdecaf
- Is it faster to progressively merge files (from disk/io.Reader,
*ach.File
instances, or another input?)
I have some questions regarding this statement. Do you mean I should verify which is more efficient between merging files directly from the disk as they are being read and merging the files after they have been read and converted to *ach.File
instance before merging.
There are a few options available. We currently need the files in memory (as an []*File
) which is inefficient because each file has to be read from disk (or created in memory) before calling MergeFiles
. There's a use-case in achgateway where we are reading files from disk and merging them, so a function to merge them faster in that path would be desired.
There may be faster ways to merge []*File
as well, which would be an improvement. We need to keep the existing MergeFiles
function for backwards compatibility.
I've linked a few PR's in the description which add benchmarks or previous performance improvements.
Yea! Thanks for that
Hello @adamdecaf
Not sure I was able to pull this off. But i will love to learn more about this codebase in general and probably pick up smaller tasks so i can get used to it because I could not really understand some parts of the code itself.
The only recommedation i have got is why we are using []*file and not []file because the latter has more memory perks than the former.
Thank you. If there are any smaller tasks i could take on that could get me used to the codebase, I will be really happy to pick it up.
Why would []File
have better memory perks than []*File
? Passing around an array of pointers is lighter-weight than passing around the structs.
I think that it is a general rule of thumb to try as much as possible to reduce the use of pointers in a program because it thereby reduces the amount of heap allocations done. It is not always so actually but it is so common in golang programs
-- compiler generates a lot of checks when using pointer which makes programs a bit slower -- pointers often have poor locality of reference -- it gives the garbage collector more work
Just some opinions regarding that
We'd need to compare benchmarks between MergeFiles with and without pointers. Typically pointers to larger structs are more efficient and faster. See https://medium.com/@meeusdylan/when-to-use-pointers-in-go-44c15fe04eac
Yes
Hm, Looking at it now, trying generate becnchmark reports for both cases. Quite a deep change in to the codebase. Not sure if we are ready to risk that at this point or i can go ahead.
You can make whatever changes on a branch to compare pointers vs no pointers.
Oh okay perfect
- Is it faster to progressively merge files (from disk/io.Reader,
*ach.File
instances, or another input?)
@adamdecaf
By the way just to note here that it has this very approach proven that this is will improve the memory performance but reduce the cpu time significantly which i believe is not what we desire.
here some resources I found regarding that
One possible mitigation would be to move the vectors to disk, but loading them from disk at search time can add [significant latency](https://colin-scott.github.io/personal_website/research/interactive_latency.html), especially on slow disks. Instead, we chose to compress our vectors with int8 quantization.
]We have a big usecase in achgateway
to merge files on disk. We are currently reading files in groups to merge which has much better memory pressure, but takes more cpu time. If you can create a better method of merging files on disk that would be very helpful!
Lots of fixes were released in https://github.com/moov-io/ach/releases/tag/v1.37.0
The Moov ACH library has a few ways of merging ACH files together, which can be useful to minimize the number of files sent over the network and cost from vendors, Fed, or other providers.
The performance and utility of MergeFiles can be improved and this issue sets out to discuss a few options.
MergeFiles(files []*ach.File) ([]*ach.File, error)
*ach.File
instances, or another input?)Previous issues: https://github.com/moov-io/ach/pull/1041, https://github.com/moov-io/ach/pull/1276, https://github.com/moov-io/ach/pull/1282