goraft / raft

UNMAINTAINED: A Go implementation of the Raft distributed consensus protocol.
MIT License

server loads old snapshots #235

Open bcwaldon opened 10 years ago

bcwaldon commented 10 years ago

I have the following snapshot files:

1007_79443860.ss
101073_272975458.ss
103852_275330228.ss
11166_165394546.ss
1136_85946380.ss
12466_171438525.ss
1336_94077876.ss
22734_195564716.ss
3111_127284971.ss
3276_130008173.ss
3440_132650609.ss
579_44459558.ss
59456_237259277.ss
651_50350829.ss
68876_245043430.ss
8128_151960757.ss
88589_262075601.ss
9787_159118266.ss

On startup, the server picks the "latest" snapshot as a starting point. It determines the "latest" snapshot by sorting all the snapshot file names lexicographically (as they are listed above) and choosing the last one [0]. The file names have the format <lastTerm>_<lastIndex>.ss. The intention is to pick the snapshot from the highest term, falling back to the highest index when two snapshots were created within the same term. Because the names are compared as strings rather than as numbers, the last entry above (9787_159118266.ss) has neither the highest term nor the highest index; the entry that does is 103852_275330228.ss.

[0] https://github.com/goraft/raft/blob/510993e76b2444b66f2092eba7c30580e7426040/server.go#L1339
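
To make the sort-order problem concrete, here is a minimal Go sketch. It is not the code from server.go; the function names (parseSnapshotName, latestLexical, latestNumeric) are hypothetical and only illustrate the difference between comparing file names as strings and comparing the parsed term and index as numbers.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseSnapshotName splits "<lastTerm>_<lastIndex>.ss" into its numeric parts.
// Hypothetical helper, not part of goraft/raft.
func parseSnapshotName(name string) (term, index uint64, err error) {
	base := strings.TrimSuffix(name, ".ss")
	parts := strings.SplitN(base, "_", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected snapshot name %q", name)
	}
	if term, err = strconv.ParseUint(parts[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if index, err = strconv.ParseUint(parts[1], 10, 64); err != nil {
		return 0, 0, err
	}
	return term, index, nil
}

// latestLexical mimics the behavior described above: sort the names as
// strings and take the last one.
func latestLexical(names []string) string {
	if len(names) == 0 {
		return ""
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[len(sorted)-1]
}

// latestNumeric picks the highest term, then the highest index within that
// term. Names that do not parse are skipped.
func latestNumeric(names []string) (string, bool) {
	var best string
	var bestTerm, bestIndex uint64
	found := false
	for _, n := range names {
		term, index, err := parseSnapshotName(n)
		if err != nil {
			continue
		}
		if !found || term > bestTerm || (term == bestTerm && index > bestIndex) {
			best, bestTerm, bestIndex, found = n, term, index, true
		}
	}
	return best, found
}

func main() {
	names := []string{
		"1007_79443860.ss",
		"101073_272975458.ss",
		"103852_275330228.ss",
		"88589_262075601.ss",
		"9787_159118266.ss",
	}
	fmt.Println("lexical:", latestLexical(names)) // lexical: 9787_159118266.ss
	if n, ok := latestNumeric(names); ok {
		fmt.Println("numeric:", n) // numeric: 103852_275330228.ss
	}
}
```

Parsing the numbers instead of relying on the string order also works for files that already exist on disk, since it does not depend on how the names were formatted.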

philips commented 10 years ago

This is a somewhat dangerous bug if people have been using rsync or something similar to back up their snapshot directory. In the common case, though, there is only ever one snapshot file on disk, and it is managed by go-raft.

otoolep commented 10 years ago

Wow. Something as simple as this should fix it.

https://github.com/otoolep/raft/commit/ffe3da06a7a2a55c809724510392733b3f10a859

bcwaldon commented 10 years ago

@otoolep You actually need to pad both fields, as you need to be able to sort by the last index in the event that the term appears more than once.
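
A small Go sketch of why padding only the term is not enough. The names termPaddedName, fullyPaddedName, and lastSorted are hypothetical, and the width of 20 digits is an assumption (it is the length of the largest uint64 in decimal), not necessarily what the linked commit or PR uses.

```go
package main

import (
	"fmt"
	"sort"
)

// termPaddedName zero-pads only the term. Hypothetical helper for illustration.
func termPaddedName(term, index uint64) string {
	return fmt.Sprintf("%020d_%d.ss", term, index)
}

// fullyPaddedName zero-pads both fields to 20 digits (assumed width), so that
// string order matches numeric order on (term, index).
func fullyPaddedName(term, index uint64) string {
	return fmt.Sprintf("%020d_%020d.ss", term, index)
}

// lastSorted returns the last name in string order, or "" if there are none.
func lastSorted(names []string) string {
	if len(names) == 0 {
		return ""
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[len(sorted)-1]
}

func main() {
	// Two snapshots share term 103852; the one at index 275330228 is newer.
	termOnly := []string{
		termPaddedName(9787, 159118266),
		termPaddedName(103852, 9),
		termPaddedName(103852, 275330228),
	}
	both := []string{
		fullyPaddedName(9787, 159118266),
		fullyPaddedName(103852, 9),
		fullyPaddedName(103852, 275330228),
	}
	// With only the term padded, index 9 sorts after index 275330228.
	fmt.Println("term padded:", lastSorted(termOnly))
	// With both fields padded, the snapshot at index 275330228 sorts last.
	fmt.Println("both padded:", lastSorted(both))
}
```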

bcwaldon commented 10 years ago

...and we need to deal with snapshots that are already out there.
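
One possible way to handle existing files, sketched in Go under stated assumptions: parse each legacy name numerically and compute the zero-padded name it would be renamed to. The helpers canonicalName and planRenames are hypothetical, the 20-digit width is an assumption, and the actual rename (for example with os.Rename) is left to the caller.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// canonicalName converts "<lastTerm>_<lastIndex>.ss" (padded or not) into a
// name with both fields zero-padded to 20 digits. Hypothetical helper.
func canonicalName(name string) (string, error) {
	if !strings.HasSuffix(name, ".ss") {
		return "", fmt.Errorf("not a snapshot file: %q", name)
	}
	parts := strings.SplitN(strings.TrimSuffix(name, ".ss"), "_", 2)
	if len(parts) != 2 {
		return "", fmt.Errorf("unexpected snapshot name %q", name)
	}
	term, err := strconv.ParseUint(parts[0], 10, 64)
	if err != nil {
		return "", err
	}
	index, err := strconv.ParseUint(parts[1], 10, 64)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%020d_%020d.ss", term, index), nil
}

// planRenames maps each legacy name to its padded name. Names that are
// already padded, or that do not parse, are left out of the plan.
func planRenames(names []string) map[string]string {
	plan := make(map[string]string)
	for _, n := range names {
		padded, err := canonicalName(n)
		if err != nil || padded == n {
			continue
		}
		plan[n] = padded
	}
	return plan
}

func main() {
	plan := planRenames([]string{"9787_159118266.ss", "103852_275330228.ss", "notes.txt"})
	for from, to := range plan {
		fmt.Printf("%s -> %s\n", from, to)
	}
}
```

Because already-padded names map to themselves and are skipped, running the plan twice is harmless.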

otoolep commented 10 years ago

Check the PR -- it pads both fields.

bcwaldon commented 10 years ago

Ah, I didn't realize there was a PR.

https://github.com/goraft/raft/pull/237