ssbc / jitdb

A database on top of a log with automatic index generation and maintenance
50 stars 7 forks source link

"or" operator behavior changes at top level #138

Closed Barbarrosa closed 3 years ago

Barbarrosa commented 3 years ago

I was looking over the different operators when I noticed a repeated pattern that seems confusing. Take the example from the readme:

query(
  fromDB(db),
  and(slowEqual('value.content.type', 'contact')),
  and(or(slowEqual('value.author', aliceId), slowEqual('value.author', bobId))), // this line
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

I see the or operator almost universally wrapped in an and operator. When I dug in, I found that the or part applied not only to the or method's parameters but also to prior operations when at the top level. The code below functionally differs from the above, i.e. including non-contact objects with matching author values:

query(
  fromDB(db),
  and(slowEqual('value.content.type', 'contact')),
  or(slowEqual('value.author', aliceId), slowEqual('value.author', bobId)),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

The below example is functionally identical to the first example, and it shows the behavior does not persist below the top level of the query.

query(
  fromDB(db),
  and(
    slowEqual('value.content.type', 'contact'),
    or(
      slowEqual('value.author', aliceId),
      slowEqual('value.author', bobId)
    )
  ),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

Technically it's a BC break, but it seems like the or operator shouldn't apply itself to prior operations at the top level.

staltz commented 3 years ago

Yeah, you're right. I think it would be best to recommend using only one "and-or" tree as an argument, i.e. recommend the 3rd example, not the 1st nor the 2nd. It's easier to understand and predict how it'll behave.

Barbarrosa commented 3 years ago

I was thinking about doing it automatically - i.e. something like #139. That kind of change may be an issue depending on adoption since it's technically a BC break, but such usages are likely incorrect anyways.

staltz commented 3 years ago

@Barbarrosa I'm thinking about this one still, I wouldn't want to merge something that changes semantics, then revert it, so I'd like to be very sure about this one before merging.

Now that I take a second look, I think the 2nd code example is actually functionally correct and we'd like to support it. Technically, we should always recommend the 3rd example (I think the readme examples need to be updated), but there's yet another use case to consider: some queries can be split into reusable parts, e.g.:

const contacts = query(
  fromDB(db),
  and(slowEqual('value.content.type', 'contact'))
)

query(
  or(
    contacts,
    slowEqual('value.author', aliceId),
    slowEqual('value.author', bobId)
  ),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

The top-level and for singular queries always bothered me (e.g. query(fromDB(db), and(slowEqual(.......)), toCallback(cb))), and I think something can be done about it. Either we alter all the equal,slowEqual,gt,lt etc operators to automatically chain from the previous argument, or we can introduce a new operator just that is meant for the singular case, e.g. query(fromDB(db), just(slowEqual(.......)), toCallback(cb)). This is common in stream libraries like RxJS or Ramda, and it would be beneficial for us too. I think I prefer just over the former option. My intuition is that we should keep boolean operators like and and or as close as possible to the clean boolean semantics as we can.

Overall I'd appreciate feedback from more folks on what the semantics for these should be. cc @arj03 @mixmix

mixmix commented 3 years ago

Honestly example 2 is very unclear to me about what i would expect it to return 1 and 3 seem very clear to me.

I think I prefer these because I like clear brackets around logical operations to make order of operation explicit. Code like a && b || c drives me wild

Is that an ok level of detail @staltz ?

mixmix commented 3 years ago

I like 1 the most, it feels like sql : where a and b and c

With 3 it makes it look like the only way you should use and is in the middle of the source and sink and I would then expect to go

And( A, B, C )

IE am allowed to put any number of things in there. But then I wild not know how to compose from fragments unless I'm nesting instead of chaining fragments

staltz commented 3 years ago

Good amount of feedback!

arj03 commented 3 years ago

I'm not a super fan of the just operator. That seems like the simplest case, so I don't think we should introduce syntax just for that. This also means that the following should work:

query(
  fromDB(db),
  slowEqual('value.content.type', 'contact'),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

But also:

query(
  fromDB(db),
  slowEqual('value.content.type', 'contact'),
  or(slowEqual('value.author', aliceId), slowEqual('value.author', bobId)),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

I still find this much easier to read:

query(
  fromDB(db),
  and(
    slowEqual('value.content.type', 'contact'),
    or(
      slowEqual('value.author', aliceId),
      slowEqual('value.author', bobId)
    )
  ),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

I wouldn't object if it was an error not to include and in the middle case.

mixmix commented 3 years ago

I really don't like this case:

​query(
  fromDB(db),
  slowEqual('value.content.type', 'contact'),
  or(slowEqual('value.author', aliceId), slowEqual('value.author', bobId)),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

Because my mind goes: Source where slowEqual Or (a, b) ... Wait is that or on slowEqual or a / b?

I'm a big fan of clear and verbose. Maybe support the other stuff under the hood but make the way arj said he likes default documentation?

arj03 commented 3 years ago

One thing running through my head is that and was sort of a where in that it signals that now you filter the query. So another option would be have have that explicit:

query(
  fromDB(db),
  where(slowEqual('value.content.type', 'contact')),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

and

query(
  fromDB(db),
  where(
    or(
      slowEqual('value.author', aliceId), 
      slowEqual('value.author', bobId)
    )
  ),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
  })
)

That way we could enforce that where can only have 1 parameter.

Just wanted to throw that option out there if we are brainstorming breaking the api :-)

staltz commented 3 years ago

I like it!

Barbarrosa commented 3 years ago

I like the where option since it helps clarify how the query is organized.

mixmix commented 3 years ago

Yes and, can we still compose with where?

Eg

const myFriends = query(
  fromDB(db), 
  where(....find people I follow or something)
)

query(
   myFriends,
   where(...just posts in last day), 
   toCallback(done)
)

I know it doesn't work like this. I'm just throwing something on the table to practice thinking about composing, which feels like a useful thing to prototype to see if this new pattern works nicely

staltz commented 3 years ago

Yes and, can we still compose with where?

I would aim for supporting that, yes. My latest thoughts is to make this change:

Currently:

Proposed:

The semantics of where can be "take the previous argument and pick a subset of it" (notice the boolean "and" is implicit).

mixmix commented 3 years ago

I like the vibe but need more clarity (I felt like a session on hackmd would be way easier)

If it's a chain of where, I reckon "filter" could be worth considering instead

I feel like we're somewhere between metaphors of a stream (sink, filter, source) and a hmmm...

staltz commented 3 years ago

Mixmix and I had a quick video call. Main feedback I got is that in the docs we should differentiate two things:

arj03 commented 3 years ago

Yes, I like that way of looking at things.

Barbarrosa commented 3 years ago

@staltz @arj03 I like where this discussion is going. Do you need to work this change into a bigger picture before moving forward (e.g. a deprecation period, JITDB/SSB roadmap, getting stakeholder buy-in, addressing/investigating related issues and doc concerns)?

staltz commented 3 years ago

I just wanna say I have implemented where on my local branch and tests are passing for both jitdb and ssb-db2! Will submit PR in the coming days

Barbarrosa commented 3 years ago

@staltz @arj03 I appreciate how responsive you've been to the issues & PRs brought up here.