ronomon / deduplication

Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in JavaScript. Ships with extensive tests, a fuzz test, and a benchmark.
MIT License

Would a Promise-based example be useful? #3

Closed: nlfiedler closed this issue 5 years ago

nlfiedler commented 5 years ago

The demo is all well and good, but for an application, something based on Promises seems more suitable. I offer this bit of code that is working for me:

const fs = require('fs')
const dedupe = require('@ronomon/deduplication')

function findFileChunks (infile, average) {
  const fd = fs.openSync(infile, 'r')
  const minimum = Math.round(average / 2)
  const maximum = average * 2
  const source = Buffer.alloc(maximum * 2)
  const target = Buffer.alloc(dedupe.targetSize(minimum, source.length))

  return new Promise((resolve, reject) => {
    let flags = 0
    const close = (error) => {
      fs.closeSync(fd)
      if (error) {
        // force the loop to exit
        flags = 1
        reject(error)
      }
    }

    let chunks = []
    let fileOffset = 0
    let chunkOffset = 0
    let sourceStart = 0

    while (flags === 0) {
      const length = source.length - sourceStart
      const bytesRead = fs.readSync(fd, source, sourceStart, length, fileOffset)
      fileOffset += bytesRead
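      // a short read means end of file; flags = 1 tells deduplicate
      // that this is the last block: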
      flags = (bytesRead < length) ? 1 : 0
      const sourceSize = sourceStart + bytesRead
      try {
        dedupe.deduplicate(average, minimum, maximum, source, 0, sourceSize, target, 0, flags,
          (error, sourceOffset, targetOffset) => {
            // n.b. the library throws the error, so this is always undefined
            if (error) {
              close(error)
              return
            }
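            // each record written to target is a 32-byte chunk hash
            // followed by a 4-byte big-endian chunk size: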
            let offset = 0
            while (offset < targetOffset) {
              // copy the hash out of target, since target is reused and
              // overwritten by the next deduplicate call:
              const hash = Buffer.from(target.slice(offset, offset + 32))
              offset += 32
              const size = target.readUInt32BE(offset)
              offset += 4
              chunks.push({ hash, offset: chunkOffset, size })
              chunkOffset += size
            }
            // Anything remaining in the source buffer should be moved to the
            // beginning of the source buffer, and become the sourceStart for the
            // next read so that we do not read data we have already read:
            sourceStart = sourceSize - sourceOffset
            if (sourceStart > 0) {
              source.copy(source, 0, sourceOffset, sourceOffset + sourceStart)
            }
            if (flags !== 0) {
              // the last block has finished processing
              close()
              resolve(chunks)
            }
          }
        )
      } catch (err) {
        close(err)
      }
    }
  })
}

For demo purposes, this would need a .then() and some console logging to match the current demo. Let me know if this is useful and I can create a proper pull request. Or not, and feel free to close this.
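
For example, a minimal driver might look something like this (the file name and the 64 KiB average chunk size are placeholders):

findFileChunks('test.file', 65536).then((chunks) => {
  for (const chunk of chunks) {
    console.log(chunk.hash.toString('hex'), chunk.offset, chunk.size)
  }
}).catch((error) => {
  console.error(error)
})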

jorangreef commented 5 years ago

Thanks for the suggestion @nlfiedler, but I would prefer the module to stick with classic vanilla JavaScript.

I've never been a fan of Promises. I don't find the style as clean as pure callback error handling. Promises also add runtime overhead.
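
For what it's worth, here is roughly what the same chunking looks like in pure callback style. This is a minimal sketch that reads the whole file at once (so it is only suitable for small files), and the findFileChunksCallback name is just for illustration:

const fs = require('fs')
const dedupe = require('@ronomon/deduplication')

function findFileChunksCallback (infile, average, end) {
  const minimum = Math.round(average / 2)
  const maximum = average * 2
  fs.readFile(infile, (error, source) => {
    if (error) return end(error)
    const target = Buffer.alloc(dedupe.targetSize(minimum, source.length))
    try {
      // flags = 1: this is the last (and only) call for this source
      dedupe.deduplicate(average, minimum, maximum, source, 0, source.length,
        target, 0, 1,
        (error, sourceOffset, targetOffset) => {
          if (error) return end(error)
          const chunks = []
          let offset = 0
          let chunkOffset = 0
          while (offset < targetOffset) {
            // copy each hash out of the reusable target buffer
            const hash = Buffer.from(target.slice(offset, offset + 32))
            offset += 32
            const size = target.readUInt32BE(offset)
            offset += 4
            chunks.push({ hash, offset: chunkOffset, size })
            chunkOffset += size
          }
          end(undefined, chunks)
        })
    } catch (error) {
      end(error)
    }
  })
}

findFileChunksCallback('test.file', 65536, (error, chunks) => {
  if (error) throw error
  console.log(chunks.length + ' chunks')
})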