typicode / lowdb

Simple and fast JSON database
MIT License
21.32k stars 918 forks source link

data loss if process hangs/crashes on same event tick #27

Closed kurttheviking closed 9 years ago

kurttheviking commented 9 years ago

this may be related to https://github.com/typicode/lowdb/issues/23

in short, we've seen various data loss conditions occur which seem to be related to the async nature of the writes. the easiest way to simulate this is:

var low = require('lowdb');
var db = low('db.json');
var collection = db('kaiju');
collection.push({name: 'godzilla'});

...wait a moment...

collection.push({name: 'mothra'}); process.exit();

the content of db.json would then be:

{
  "kaiju": [
    {
      "name": "godzilla"
    }
  ]
}

I like this project because it's a quick hack for a lot of use cases so I am curious what the recommended approach is when a crash/hang is detected. Is there a synchronous save() call which can be attempted if needed?

typicode commented 9 years ago

Hi @kurttheviking ,

I must say you choose good examples :+1: (I like Godzilla movies too :). In short, there's no synchronous save() right now.

Just curious what's the nature of the crash/hang? Is it a SIGINT, SIGTERM, ... unhandledException, ...? Is it the program that calls process.exit() by itself?

Because if you can catch it, you can write something like:

var low = require('lowdb');
var db = low('db.json');
var collection = db('kaiju');

function run() {
  collection.push({name: 'godzilla'});
  collection.push({name: 'mothra'});
  JSON.parse('') // will crash
}

try {
  run()
} catch(e) {
  console.error(e)
}

It's not perfect though but all the data should be written.

For the short story, at first LowDB was synchronous and I thought making it async would be better. But in the long run, it's quite complicated to make it right in every aspects.

I'm planning to publish a version with synchronous support and enhance the asynchronous version.

I'm also very curious regarding your use-case/project as it would help me understand better the way people use LowDB. Is asynchronous very important to you or synchronous would be ok?

josselinchevalay commented 9 years ago

hi, I think this issue wants to highlight is that if given the partielement are written (asynchronous cf) programe it stop.

For my part I think is a good thing asynchronous:

but in some cases have the opportunity to:

collection.save () or collection.persit () would be a plus.

in fact I think the idea behind it is to allow the choice to start the backup process flat file

Then I just cheated me

kurttheviking commented 9 years ago

@typicode ha, seems like we're both fans :+1:

In some cases it seems to be an uncaught exception but in the particular example i was running into something else running sync was blocking the event loop and crashing out (i have since made that component async as well)

Generally, the async case works best for the things we use it for. To simplify one of our use cases, we have split tests conducted on a per-server basis but didn't have a proper db to use for persistance. so, lowdb captures the results as they occur and we pull that file down occasionally for analysis.

I ended up forking the repo and playing around with constructing the db with low.sync but that's not really ideal because every write is made synchronous. I think a better implementation would keep .save as usual but also provide .saveSync (or something similar) which writes the changes synchronously for the rare cases in which that is desirable. as @josselinchevalay said above, it's really about choice for the particular use case.

if you'd like, i can work on a PR that provides that functionality for this project. up to you how you want to handle it.

typicode commented 9 years ago

Just to let you know, I've worked on it also yesterday so the code have changed a little. But the idea is the same, you either start LowDB in sync or async mode.

Going back to the subject, I don't see how sync and async could be combined.

Let's imagine LowDB is started asynchronously and it crashes

> db state has changed to { a: 1 }
starts writing { a: 1 } to file.json
> db state has changed to { b: 2 }
ends writing { a: 1 } to file.json
starts writing { b: 2 } to file.json
> db state has changed to { c: 3 }
CRASH!
synchronously write { c: 3 } to file.json

I think it will end up as a race condition and file.json will contain { b: 2 } or { c: 3 }.

The only way you could save data synchronously and be sure it won't be overwritten by the asynchronous writing would be to write to another file.

But honestly, if you have some other ideas or approach, I'd really like to hear about them.

kurttheviking commented 9 years ago

I've never worked on a database project so I don't have any experience with how this class of problems is "typically" solved. Instead, I'll approach this purely from the application perspective.

Synchronous writes, by definition, block the event loop; thus the state cannot change while the write is in progress. At the same time, if I recall correctly, even sync writes (via the fs module) are not strictly guaranteed beneath the node application. So, I agree, the db state is necessarily unknown in the case you mentioned. Perhaps the default async write callback (or post-Promise resolution, I'm not familiar with how you currently handle that part) could execute some type of state reconciliation.

In any event, I think providing a syncSave would allow the consuming application some flexibility when deciding how to manage the write (e.g. calling syncSave(); process.exit(); thereby halting the event loop manually).

typicode commented 9 years ago

Me neither.

However from my experience with LowDB, when I tried to make "clever" things with async, it ended up with unexpected issues. So I don't really want to go to something like state reconciliation.

But, I was thinking about something else that would give you more flexibility. By starting LowDB in memory, you get fast operations and you can control the way you want data to be written.

var db = low()
db.object = data

db('post').push({ title: 'foo' }) // won't be persisted automatically

// ...

fs.writeFileSync('db.json', JSON.stringify(db.object))
process.exit()

Maybe I'll make it more convenient though like you suggested.

Does the code above solve your issue?

kurttheviking commented 9 years ago

@typicode yeah that's a fair point; and yes, providing a convenience wrapper over the fs.writeFileSync as described would indeed do the trick

typicode commented 9 years ago

As suggested, I've added db.saveSync. I think you were right, in the end it will give more freedom.

Now you can do things like:

var db = low()
db.save(filename) // or db.saveSync(filename)

// Which is the same as

var db = low(filename, { autosave: false })
db.save() // or db.saveSync()

// You can also save to another file

db.save('copy.json')

It's available in 0.6.0

kurttheviking commented 9 years ago

great thanks!