josephg / node-foundationdb

Modern Node.js FoundationDB bindings

getRangeAllStartsWith & Tuple Key #32

Closed ballwood closed 5 years ago

ballwood commented 5 years ago

Apologies if this is a silly or simple question. I'm quite new to FoundationDB and am testing it out with this Node.js library for speed. I'm using tuple keys like the example on npm and creating an index on teacher name.

const create = async (id, teacher) => {
  await db.doTransaction(async tn => {
    await tn.set(['class', id], {
      id,
      teacher
    });
    await tn.set(['class_teacher', teacher], id);
  });
};

I want to search this index like I would a startsWith query, so I figured using the following would work:

tn.getRangeAllStartsWith(['class_teacher', 'a']);

When I run this I always get empty results, even if something in the index starts with a.

However, if I use the following I get my results back:

tn.getRangeAll(['class_teacher', 'a'], ['class_teacher', 'b']);

Is there something I'm misunderstanding or is this a bug?

josephg commented 5 years ago

Btw in your code you don't need to use await when calling tn.set(). It shouldn't make any real difference, but it might affect benchmarking results a touch.
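As a minimal sketch of that suggestion, here is the function from the question with the inner awaits dropped. The name createClass and the explicit db parameter are introduced here for illustration; it assumes db is an already opened database configured with tuple keys and a value encoding that accepts objects, as in the question.

```javascript
// Sketch only: `createClass` and the `db` parameter are illustrative names.
// Assumes `db` was opened with a tuple key encoding, as in the question.
const createClass = (db, id, teacher) =>
  db.doTransaction(async tn => {
    // Writes are buffered in the transaction, so there is nothing to await here.
    tn.set(['class', id], { id, teacher });
    // Secondary index entry: teacher name -> class id.
    tn.set(['class_teacher', teacher], id);
  });
```

The promise returned by doTransaction still needs to be awaited by the caller, since that is where the commit happens.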

Hm, that should work. tn.getRangeAllStartsWith(['class_teacher', 'a']) does the equivalent of something like this: tn.getRangeAll(['class_teacher', 'a'], ['class_teacher', 'a', '\xff']). So, it should get all results which start with ['class_teacher', 'a']. Can you try tn.getRangeAll(['class_teacher', 'a'], ['class_teacher', 'a', '\xff']); and see if you get any results?

ballwood commented 5 years ago

@josephg sorry, I didn't mean speed of performance (I'm quite happy with that), just getting up and playing with it quickly, so I copied and pasted loads of stuff :)

Adding \xff as suggested returned no results :(

I did a bit more digging. Using fdbcli I can get all the results back, like so:

`myapp.\x02class_teacher\x00\x02a\x00' is `"a67a183c-eddf-4939-8723-083dbfcc7d48"'
`myapp.\x02class_teacher\x00\x02aa\x00' is `"ba6f457c-ece4-436e-ac85-b70541dbc8de"'
`myapp.\x02class_teacher\x00\x02b\x00' is `"7c22c6d1-30a5-4412-a861-6f0202992477"'

When I run

getrange myapp.\x02class_teacher\x00\x02a

It returns

`myapp.\x02class_teacher\x00\x02a\x00' is `"a67a183c-eddf-4939-8723-083dbfcc7d48"'
`myapp.\x02class_teacher\x00\x02aa\x00' is `"ba6f457c-ece4-436e-ac85-b70541dbc8de"'

I think maybe when the tuple is encoded for the range search it is adding the \x00 on the end; it seems to be returning the same results:

getrange myapp.\x02class_teacher\x00\x02a\x00
Range limited to 25 keys
`myapp.\x02class_teacher\x00\x02a\x00' is `"a67a183c-eddf-4939-8723-083dbfcc7d48"'

It could be worth doing something like what str.join does, where a spacer character is only added between the tuple items rather than marking the start and end of each one. I think that's what is causing the issue?

josephg commented 5 years ago

I hear what you’re saying but the implementation should match the equivalents in the other bindings. There might be a bug, or I may be misunderstanding the behaviour with tuples here. I’ll check when I have a chance - I would have expected this to work. Thanks for the extra debugging information.

josephg commented 5 years ago

So tn.getRangeAllStartsWith(['class_teacher', 'a']) should return the entries with keys ['class_teacher', 'a'] and ['class_teacher', 'a', 'b'], but not ['class_teacher', 'aa'], since that doesn't share a prefix the way the tuple layer thinks about it.

Internally:

tn.getRangeAllStartsWith(['class_teacher', 'a'])

turns into

tn.getRangeAllStartsWith('\x02class_teacher\x00\x02a\x00')

which turns into

tn.getRange('\x02class_teacher\x00\x02a\x00', '\x02class_teacher\x00\x02a\x01')

As you pointed out above the equivalent in the CLI would be

getrange myapp.\x02class_teacher\x00\x02a\x00 <-- note the \x00 on the end, which terminates the encoded string 'a'.
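The steps above can be reproduced without a database. This is a simplified sketch, not the library's code: packStrings only handles string elements (0x02, the UTF-8 bytes with 0x00 escaped as 0x00 0xff, then a 0x00 terminator), strinc computes the first key after every key sharing a prefix, and the myapp. subspace prefix from the fdbcli output is left out. All function names are made up for illustration.

```javascript
// Simplified tuple packing for string elements only (illustrative helper).
function packStrings(items) {
  const bytes = [];
  for (const item of items) {
    bytes.push(0x02); // type code for a byte/text string element
    for (const b of Buffer.from(item, 'utf8')) {
      bytes.push(b);
      if (b === 0x00) bytes.push(0xff); // escape embedded NUL bytes
    }
    bytes.push(0x00); // terminator for this element
  }
  return Buffer.from(bytes);
}

// First key that sorts after every key starting with `prefix`.
function strinc(prefix) {
  let end = prefix.length;
  while (end > 0 && prefix[end - 1] === 0xff) end--; // drop trailing 0xff bytes
  if (end === 0) throw new Error('no key sorts after this prefix');
  const out = Buffer.from(prefix.subarray(0, end)); // copy, then bump last byte
  out[end - 1] += 1;
  return out;
}

// Byte range scanned by a "starts with this tuple" query.
function startsWithRange(items) {
  const begin = packStrings(items);
  return { begin, end: strinc(begin) };
}

// Range reads are begin-inclusive, end-exclusive.
function inRange(key, { begin, end }) {
  return Buffer.compare(key, begin) >= 0 && Buffer.compare(key, end) < 0;
}

const range = startsWithRange(['class_teacher', 'a']);
const candidates = [
  ['class_teacher', 'a'],
  ['class_teacher', 'a', 'b'],
  ['class_teacher', 'aa'],
  ['class_teacher', 'b'],
];
for (const c of candidates) {
  console.log(JSON.stringify(c), inRange(packStrings(c), range));
}
```

With this encoding, only the first two candidates fall inside the range: the encoded 'aa' has the byte for a second 'a' where the prefix has its \x00 terminator, so it sorts after the end key.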

Does this make sense? Does this match the behaviour you're seeing through the node bindings? If so, that's behaving as intended and we should probably update the documentation. If you want to match any documents with ['class_teacher', 'a***'] then tn.getRangeAll(['class_teacher', 'a'], ['class_teacher', 'b']) is the right query to make. tn.getRangeStartsWith(['class_teacher', 'a']) will search for anything with the pattern ['class_teacher', 'a', *] instead.
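For longer search strings, the same idea can be expressed at the byte level: leave the last string element unterminated and take the range up to the incremented prefix. The sketch below is an illustration under the same simplified string-only encoding and is not part of the library; stringPrefixRange and escapeString are hypothetical names, and the sample keys are the ones from the fdbcli output with the myapp. prefix removed.

```javascript
// UTF-8 bytes of a string element body, with 0x00 escaped as 0x00 0xff.
function escapeString(str) {
  const out = [];
  for (const b of Buffer.from(str, 'utf8')) {
    out.push(b);
    if (b === 0x00) out.push(0xff);
  }
  return out;
}

// Hypothetical helper: byte range covering keys whose leading elements equal
// `leading` and whose next element is a string beginning with `partial`.
function stringPrefixRange(leading, partial) {
  const bytes = [];
  for (const item of leading) bytes.push(0x02, ...escapeString(item), 0x00);
  bytes.push(0x02, ...escapeString(partial)); // deliberately no 0x00 terminator
  const begin = Buffer.from(bytes);

  // End key: strip trailing 0xff bytes, then increment the last byte.
  let n = begin.length;
  while (n > 0 && begin[n - 1] === 0xff) n--;
  const end = Buffer.from(begin.subarray(0, n));
  end[n - 1] += 1;
  return { begin, end };
}

const { begin, end } = stringPrefixRange(['class_teacher'], 'a');
const storedKeys = [
  '\x02class_teacher\x00\x02a\x00',
  '\x02class_teacher\x00\x02aa\x00',
  '\x02class_teacher\x00\x02b\x00',
].map(k => Buffer.from(k, 'latin1'));

for (const key of storedKeys) {
  const hit = Buffer.compare(key, begin) >= 0 && Buffer.compare(key, end) < 0;
  console.log(JSON.stringify(key.toString('latin1')), hit);
}
```

For the single letter 'a' this covers the same stored keys as the ['class_teacher', 'a'] to ['class_teacher', 'b'] query above. Running such a range against a real database would need a transaction that takes raw byte keys rather than tuples, which this sketch does not show.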

ballwood commented 5 years ago

Yep, makes sense. I had a deeper look at the docs today (the time series bit) and it makes a bit more sense now. Thanks so much for the help, I really appreciate it 👍