typicode / steno

Super fast async file writer with atomic write ⚡
MIT License

Are benchmarks realistic? #21

Closed codenomnom closed 3 years ago

codenomnom commented 3 years ago

Hi there 👋 First of all - thanks for all your great work, I appreciate it!

I just wanted to ask something I can't wrap my head around. It seems steno is way way faster than regular fs. But then I just saw it is internally using the fs itself, and it got me wondering - well how is that possible?

And I'm actually thinking the benchmark is comparing apples to oranges. From benchmark.ts:

console.time(fsLabel)
// To avoid race condition issues, we need to wait
// between writes when using fs only
for (let i = 0; i < 1000; i++) {
  await writeFile(fsFile, `${data}${i}`)
}
console.timeEnd(fsLabel)

console.time(stenoLabel)
// Steno can be run in parallel
await Promise.all(
  [...Array(1000).keys()].map((_, i) => steno.write(`${data}${i}`)),
)
console.timeEnd(stenoLabel)

What I see here is that the first one actually does write into a file a thousand times. It waits for each write to finish and then runs the next one.

The second benchmark runs all writes in parallel (as stated). That's the first issue that makes me think the comparison isn't fair. But the second one is even more important, and it comes from the lock implementation (index.ts):

async write(data: string): Promise<void> {
  return this.#locked ? this.#add(data) : this.#write(data)
}

The way this locking mechanism works is that it "saves" the data for later (#add) if a write is currently in progress. When the first write is done, it checks whether there's any pending data to be written to the file, and if so, makes the call:

if (this.#nextData !== null) {
  const nextData = this.#nextData
  this.#nextData = null
  await this.write(nextData)
}

Since #nextData is a single property that gets overwritten (as stated in the code), there is only ever one pending piece of data to write, so in the benchmark you make a second file write call and that's the end of the chain of writes.

What I'm saying is that, from what I can see, you compare a thousand writes to a file against just two writes to a file. The thousand writes are of course way, way slower 😃
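The "two writes" observation can be checked with a minimal sketch of the same coalescing idea. This is not steno's actual code: CoalescingWriter, WriteFn and fakeWrite are hypothetical names, the file write is replaced by an in-memory stand-in that counts calls, and error propagation to queued callers is left out for brevity.

```typescript
// Hypothetical stand-in for a real file write (assumption, not steno's API).
type WriteFn = (data: string) => Promise<void>

class CoalescingWriter {
  private locked = false
  private nextData: string | null = null
  private waiters: Array<() => void> = []
  private writeFn: WriteFn

  constructor(writeFn: WriteFn) {
    this.writeFn = writeFn
  }

  write(data: string): Promise<void> {
    return this.locked ? this.add(data) : this.doWrite(data)
  }

  // While a write is in flight, keep only the newest payload.
  private add(data: string): Promise<void> {
    this.nextData = data
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve)
    })
  }

  private async doWrite(data: string): Promise<void> {
    this.locked = true
    try {
      await this.writeFn(data)
    } finally {
      this.locked = false
    }
    // Flush the single pending payload, then release everyone who queued it.
    if (this.nextData !== null) {
      const next = this.nextData
      const waiters = this.waiters
      this.nextData = null
      this.waiters = []
      await this.doWrite(next)
      waiters.forEach((resolve) => resolve())
    }
  }
}

// Issue 1000 concurrent writes and report how many reached the fake "file".
async function demo(): Promise<{ calls: number; last: string }> {
  let calls = 0
  let last = ""
  const fakeWrite: WriteFn = async (data) => {
    calls++
    await new Promise<void>((resolve) => setTimeout(resolve, 1))
    last = data
  }
  const writer = new CoalescingWriter(fakeWrite)
  await Promise.all(
    Array.from({ length: 1000 }, (_, i) => writer.write(`data${i}`)),
  )
  return { calls, last }
}
```

Under these assumptions, demo() resolves with calls equal to 2 and last equal to "data999": the first payload is written immediately, and the 999 calls that arrive while it is in flight collapse into one final write.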

I'm in no way pointing fingers here. I'm just trying to figure out whether I've missed something or whether the claim is simply wrong.

Thanks again, good luck!

typicode commented 3 years ago

Hi @codenomnom,

No worries. Good points and questions :) I'll try to explain the reasoning.

For some background, steno was created for JSON Server. One of the main use cases is probably writing to the same file in a server context.

With fs only, there are two basic approaches, each with a drawback:

  1. Use fs.writeFile, which is async. Multiple POST requests at the same time won't block the server, but concurrent writes can race and the data in the file can be uncertain.
  2. Use fs.writeFileSync. Multiple POST requests block each other, but in the end the data in the file is guaranteed to correspond to the requests received.

Option 2 is the best one in terms of data correctness.
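The uncertainty in option 1 can be illustrated with a deliberately simplified in-memory model. It does not use the real fs module: fakeWriteFile, fileContent and the latencies are hypothetical stand-ins, chosen only to show that when overlapping async writes finish out of order, the file may end up holding an older request's data.

```typescript
// Simplified model (assumption): the "file" is a variable, and each write
// lands after its own latency instead of going through fs.
let fileContent = ""

function fakeWriteFile(data: string, latencyMs: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => {
      fileContent = data
      resolve()
    }, latencyMs)
  })
}

// Two overlapping writes: "first" is requested before "second",
// but it takes longer to land, so it overwrites the newer data.
async function unorderedWrites(): Promise<string> {
  await Promise.all([
    fakeWriteFile("first", 30),
    fakeWriteFile("second", 5),
  ])
  return fileContent
}
```

In this model unorderedWrites() resolves with "first", even though "second" was the last request. Awaiting each write before starting the next one, or writing synchronously, removes the reordering at the cost of waiting.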

With steno, I wanted to be able to write asynchronously while guaranteeing that the data in the file is correct. So there's this lock trick which skips intermediate writes: when steno is busy and new data arrives, only the latest data is kept and written once the current write finishes.

The benchmark tries to represent this case. So it writes 1000 times to the same file with both approaches and the file should have the correct data in the end.

Maybe there's a better benchmark, but it's the closest I've found to the use case.

As described in the README, steno should provide an improvement when writing to the same file often and/or concurrently.

But I agree, it's not faster than fs.writeFile(Sync) at all if you just write every now and then :)