my8bird / node-leveldb

NodeJS bindings to levelDB - a fast and lightweight key/value database library
http://code.google.com/p/leveldb/
BSD 2-Clause "Simplified" License
63 stars 12 forks source link

random empty keys when iterating over a database #47

Closed pconstr closed 12 years ago

pconstr commented 12 years ago

With the latest 0.7.1 from npm I'm seing random empty keys when iterating over a database created by an earlier version of node-leveldb on node.js 0.6.x

I think it's unlikely that the database is corrupted, as it is successfully read by the earlier node-leveldb version and by a C++ program using leveldb directly.

The c++ program iterates over 269298 records and doesn't get any empty keys. Same for leveldb 0.5.5 on node.js 0.6.11, with somewhat different code

leveldb 0.7.1 on node 0.8.8 gets 269298 records as well, but between 2 and 7 of them have a empty key. The exact number depends on the run and on what I do while iterating.

It is perfectly possible I'm doing something really stupid so below's the code, unfortunately I can't share the db. With this code - which just iterates - I get 4 empty keys all the time and always in the same positions: 1661, 6138, 22158, 114237 but that changes just by moving console.log statements around

#!/usr/bin/env node

'use strict'

var leveldb = require('leveldb');

leveldb.open('./encodingDict', {create_if_missing:true}, function(err, db) {
  if(err) {
    return cb(err);
  }

  readEncodingDict(db, function(err) {
    if(err) {
      return cb(err);
    }
    console.log('read OK');
  });
});

var encodingDict = {};

var count = 0;
function readEncodingDict(db, cb) {
  db.iterator(function(err, it) {
    if(err) {
      return cb(err);
    }

    it.first(function(err) {
      if(err) {
        return cb(err);
      }

      function goGet() {
        if(!it.valid()) {
          console.log('read', count, 'records');
                return cb(null);
        }

        it.key(function(err, key) {
          if(err) {
            return cb(err);
          }

          ++count;

          if(key === '') {
            console.log('empty key on position', count);
          }

          it.next(function(err) {
            if(err) {
              return cb(err);
            }
            process.nextTick(goGet);
          });
        });
      }

      goGet();
    });
  });
}
justmoon commented 12 years ago

Can you put the following inside of the if (key == '') condition:

console.log('---' + it._key.toString('ascii') + '---' + it._key.toString('hex') + '---');

Basically I want to find out if the problem affects only the callback parameters or the values that are stored on the iterator object as well.

pconstr commented 12 years ago
---------
---------
---------
---------
---------
---------

it's all empty

justmoon commented 12 years ago

Ok, this is looking like an issue that I've run into a couple of time in other modules. When you create a SlowBuffer in C++ and pass it back to JavaScript, sometimes it incorrectly reports length zero. If somebody has the actual explanation of why this happens, please do let me know. In the meantime the way I've fixed it in other modules is to wrap the SlowBuffer in an actual Buffer.

Can you give this branch a try and see if it fixes the problem?

https://github.com/justmoon/node-leveldb/tree/correctfastbuf

pconstr commented 12 years ago

Just tried https://github.com/justmoon/node-leveldb/tree/correctfastbuf - the problem goes away, thanks.

Isn't SlowBuffer for internal use? Backing for Buffer that we're not supposed to expose to Javascript programs? http://nodejs.org/api/buffer.html#buffer_class_slowbuffer

justmoon commented 12 years ago

Just tried https://github.com/justmoon/node-leveldb/tree/correctfastbuf - the problem goes away, thanks.

Ok, pushed the fix to master. Closing issue. If somebody has a way to still return a SlowBuffer (which may be faster, although I don't know if it's enough to matter or even be measurable), please open a pull req.

Isn't SlowBuffer for internal use? Backing for Buffer that we're not supposed to expose to Javascript programs?

Yes and no. The random number generator in the official API also returns a SlowBuffer. And SlowBuffer are now funcationally more or less equivalent. The performance improvement in the name only applies to the allocation, so I do believe it would be more correct to return SlowBuffers from C++ modules, if it wasn't for this bug.