minio / blake2b-simd

Fast hashing using pure Go implementation of BLAKE2b with SIMD instructions
Apache License 2.0

User interface issue, kind of... #25

Closed: goarchit closed this issue 6 years ago

goarchit commented 7 years ago

Really happy with the code and its performance. I have a user interface issue that I admit isn't really the code's problem, but I hope I can get some assistance.

The code I'm working on reads files into memory as a [32][32 MB] array (1 GB at a time). I need to be able to hash a single member of that array (string(block[i][:]) works) and the array as a whole (unknown how).

Because of the quantity of data, all related code works via pointers and does not require me to ever copy any portion of that array.

I REALLY REALLY do not want to build a 1 GB string from the array, for both memory-usage and computation reasons. Unfortunately string(block[:][:]) is not legal. Any suggestions for how to reference block[32][32 MB] as an argument to io.WriteString(hash, ??)? Is there a different io function I should be calling?

I'd sincerely appreciate any comments on this "out of the box" question.

harshavardhana commented 7 years ago

You shouldn't use the memory directly; you should stream it using a Reader/Writer combination and io.Copy().
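For illustration, here is a minimal sketch of that streaming approach, assuming block is declared as a true [32][32 << 20]byte array (32 rows of 32 MB); the function and constant names are illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/minio/blake2b-simd"
)

const (
	rows    = 32
	rowSize = 32 << 20 // 32 MB per row, 1 GB total
)

// hashBlock streams every row of the 2D array through the hash via
// io.Copy, without ever building a single 1 GB buffer or string.
func hashBlock(block *[rows][rowSize]byte) ([]byte, error) {
	h, err := blake2b.New(nil) // nil config uses the package defaults
	if err != nil {
		return nil, err
	}
	readers := make([]io.Reader, rows)
	for i := range block {
		readers[i] = bytes.NewReader(block[i][:]) // a view, not a copy
	}
	if _, err := io.Copy(h, io.MultiReader(readers...)); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

func main() {
	block := new([rows][rowSize]byte) // zero-filled 1 GB example
	sum, err := hashBlock(block)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", sum)
}
```

io.MultiReader chains the 32 row readers into one logical stream, so io.Copy feeds the whole gigabyte through the hash in small internal chunks.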

fwessels commented 7 years ago

For inspiration you can look at https://github.com/s3git/bt2sum/

goarchit commented 7 years ago

Again, my apologies for my beginner understanding of Golang, but given that hash, err := blake2b.New(nil) returns a Writer, what is wrong with using io.WriteString(hash, somestring)? The writer gets fed the same data as if it were fed from a Reader via io.Copy, does it not? I could, of course, do an io.Copy(hash, strings.NewReader(something)) if there is a reason to do so, but I would still suffer from the lack of a way to convert a [32][32 MB]byte array into the required string([1 GB]byte) parameter.

My target application is designed for multi-gigabyte files, for which I already have 1 GB chunks in memory, stored as a [32][32 MB]byte array. It seems inefficient to have to read all that data a second time, or to have to allocate another gigabyte of memory to create a reformatted copy (e.g. a [1 GB]byte array).

I'm presuming the blake2b writer needs to receive all of its data at once, and everything I want to feed it is stored in the right order in memory in the [32][32 MB] array.

The ancient C programmer in me would just grab the starting address of the array and build the three-word slice structure by hand, but I'm not aware of a way to do that in Go that would satisfy the compiler's type checking.

Perhaps the easiest approach, if possibly the ugliest/riskiest long term, is to do an unsafe pointer conversion and just map the [32][32 MB] array as a [1 GB] array?
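One point worth noting about the presumption above: a hash.Hash does not need all of its data at once; it keeps running state across Write calls. A sketch under the same assumption (block is a true [32][32 << 20]byte array), hashing each row and the whole block without copying or unsafe:

```go
// Sketch only; package and function names are illustrative.
package blockhash

import "github.com/minio/blake2b-simd"

const (
	rows    = 32
	rowSize = 32 << 20 // 32 MB
)

// hashWhole writes the rows into the hasher one at a time; the hash state
// accumulates across Write calls, so no 1 GB copy is ever made.
func hashWhole(block *[rows][rowSize]byte) ([]byte, error) {
	h, err := blake2b.New(nil)
	if err != nil {
		return nil, err
	}
	for i := range block {
		if _, err := h.Write(block[i][:]); err != nil {
			return nil, err
		}
	}
	return h.Sum(nil), nil
}

// hashRow hashes a single 32 MB row on its own, again without copying.
func hashRow(row *[rowSize]byte) ([]byte, error) {
	h, err := blake2b.New(nil)
	if err != nil {
		return nil, err
	}
	if _, err := h.Write(row[:]); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}
```

A per-row digest is then just hashRow(&block[i]), and the whole-array digest never materializes a 1 GB string or buffer.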

harshavardhana commented 7 years ago

@goarchit can you paste a snippet? We can help you simplify it.

goarchit commented 7 years ago

I just realized I likely have a bigger problem. I'm not sure 2D arrays are stored without some additional data, since they are arrays of arrays, not true multidimensional arrays (i.e. [x][y]byte, not [x,y]byte). Go really doesn't have multidimensional arrays from what I've read.

I had hopes with this snippet:

    log.Console("block[0]=",string(block[0][0]))
    p := unsafe.Pointer(&block[0][0])
    str := *(*string)(p)
    log.Console("str=",str)

The first log entry prints out what was expected (a sanity check), but the second dies with panic: runtime error: growslice: cap out of range.
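For what it's worth, that panic is expected: *(*string)(p) tells the runtime to treat the first bytes of the block data itself as a string header (a pointer plus a length), so the resulting string has a garbage pointer and length. If the unsafe route is really wanted, the usual trick is sketched below (rows/rowSize are illustrative names); it relies on a true [32][32 MB]byte array being one contiguous region of memory:

```go
// Sketch only; not a drop-in from the thread.
package blockhash

import "unsafe"

const (
	rows     = 32
	rowSize  = 32 << 20
	totalLen = rows * rowSize // 1 GB
)

// asBytes reinterprets the contiguous backing memory of the true 2D array
// as one []byte with no copy. This is only valid because [rows][rowSize]byte
// is a single contiguous allocation; it would NOT work for a slice of slices.
func asBytes(block *[rows][rowSize]byte) []byte {
	return (*[totalLen]byte)(unsafe.Pointer(&block[0][0]))[:]
}
```

On Go 1.17 and later, unsafe.Slice(&block[0][0], rows*rowSize) expresses the same conversion more directly; either way it avoids fabricating a bogus string header.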

I've also come across some web posts about 2D arrays having excessive overhead, so perhaps I'll be better off just using a 1D 1 GB array and slicing it as needed. That is well within my means.

I'm interested in any ideas anyone might have, and I want to thank you all for being kind about this out-of-the-box exchange.

ps: Link to 2D vs 1D array article: http://stackoverflow.com/questions/30154087/go-unexpected-performance-when-accessing-an-array-through-slice-of-slices-2d-s
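For reference, a minimal sketch of the flat 1D layout mentioned above (constant names are illustrative): one 1 GB buffer, with 32 MB sub-slices standing in for the rows, so both per-chunk and whole-buffer hashing are a plain Write with no copies.

```go
package main

import (
	"fmt"

	"github.com/minio/blake2b-simd"
)

const (
	chunkSize = 32 << 20              // 32 MB per chunk
	numChunks = 32                    // 32 chunks
	totalSize = numChunks * chunkSize // 1 GB
)

func main() {
	buf := make([]byte, totalSize) // one flat 1 GB buffer

	// Hash a single 32 MB chunk: sub-slicing shares the backing array,
	// so no data is copied.
	h, err := blake2b.New(nil)
	if err != nil {
		panic(err)
	}
	h.Write(buf[5*chunkSize : 6*chunkSize])
	fmt.Printf("chunk 5: %x\n", h.Sum(nil))

	// Hash the whole 1 GB by writing the flat buffer directly.
	whole, err := blake2b.New(nil)
	if err != nil {
		panic(err)
	}
	whole.Write(buf)
	fmt.Printf("whole:   %x\n", whole.Sum(nil))
}
```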

kannappanr commented 6 years ago

Closing this issue as stale.