moonglum / brazil

Experimental Ruby DSL for creating AQL queries
http://www.arangodb.org/
MIT License

Ideas for the API #8

Open moonglum opened 10 years ago

moonglum commented 10 years ago

@railsbros-dirk and I discussed this face to face today :wink:

We will try to mimic the API of Array & Enumerable as closely as possible. Basically a collection in ArangoDB is a big array with a lot of hashes in it. Let's give it a try.

Return all documents of the characters collection:

result = Brazil::Collection.new('characters').to_a

A projection via map:

result = Brazil::Collection.new('casting').map do |cast|
  { name: cast.actor, rating: 'awesome' }
end.to_a

Filtering via find_all or select:

result = Brazil::Collection.new('characters')
  .find_all { |character| character.name == 'Sam Lawry' }
  .to_a

Sorting via sort_by:

result = Brazil::Collection.new('characters')
  .sort_by { |character| character.name }
  .to_a

Limit via first, drop and slice:

result = Brazil::Collection.new('characters').first(5).to_a

result = Brazil::Collection.new('characters').slice(5, 10).to_a
# and slice's well-known alias, []:
result = Brazil::Collection.new('characters')[5, 10].to_a
# Also: Ranges
result = Brazil::Collection.new('characters')[5...10].to_a
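One way these calls could map onto AQL's `LIMIT offset, count` clause, as a rough sketch. The `limit_clause` helper is hypothetical and not part of Brazil; it only shows how `first(n)`, a start/length pair, and a range could each reduce to an offset and a count:

```ruby
# Hypothetical helper: turns Array-style slicing arguments into an AQL LIMIT clause.
def limit_clause(start_or_range, length = nil)
  if start_or_range.is_a?(Range)
    offset = start_or_range.begin
    count = start_or_range.size # respects the excluded end of 5...10
  else
    offset = start_or_range
    count = length
  end
  "LIMIT #{offset}, #{count}"
end

puts limit_clause(0, 5)   # what first(5) could produce
puts limit_clause(5, 10)  # what slice(5, 10) could produce
puts limit_clause(5...10) # what slice(5...10) could produce
```

Note that a range and a start/length pair describe different windows: `5...10` covers five documents, while `(5, 10)` covers ten.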

Pagination via each_slice:

result = Brazil::Collection.new('characters').each_slice(5) do |some_characters|
  p some_characters
end

Joins via product (to join more collections, just add more to the product call):

result = Brazil::Collection.new('characters').product('casting').find_all do |character, casting|
  character.name == casting.character
end.map do |character, casting|
  { character: character, casting: casting }
end.to_a

moonglum commented 10 years ago

I think the product should take other Brazil::Collections to be consistent with the way Array behaves. Even though it is more to type... Maybe we can offer a convenience function?

moonglum commented 10 years ago

Oh, and we are not sure yet if it will be character['name'] or character.name. This is up for discussion. I'm tending a little bit more towards character['name'] to keep it consistent with the "Array of Hashes" idea.
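Both access styles could be supported by the same object, so the choice is mostly about which one to document. A minimal sketch, assuming the block receives a proxy that records attribute access as an AQL path; `DocumentProxy` is a hypothetical name, not an existing Brazil class:

```ruby
# Hypothetical proxy handed to blocks; it records attribute access as an AQL path.
class DocumentProxy
  attr_reader :path

  def initialize(path)
    @path = path
  end

  # Hash-style access: character['name']
  def [](attribute)
    DocumentProxy.new("#{path}.#{attribute}")
  end

  # Method-style access: character.name
  def method_missing(name, *args)
    return super unless args.empty?

    self[name]
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

character = DocumentProxy.new('character')
puts character['name'].path # hash style
puts character.name.path    # method style
```

The hash style has one practical advantage: it cannot collide with methods the proxy itself defines (such as `path` here).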

moonglum commented 10 years ago

Also interesting:

I could also imagine that if you call map on it, it will automatically return an array so you don't need to call .to_a on it.

moonglum commented 10 years ago

AQL has a bunch of functions and gives users the possibility to add their own. We should think about how we handle that.

moonglum commented 10 years ago

Enough for today. I will work on something different now.

moonglum commented 10 years ago

Also a thought about talking to the database. I don't want this gem to have its own configuration to talk to the database – and I don't really want it to be dependent on Ashikawa::Core. Still a feature like .to_a to evaluate it – or .map to return the results as an array would be awesome.

Idea – dependency injection :smile:

database = Ashikawa::Core::Database.new do |config|
  config.url = 'http://localhost:8529'
end

result = Brazil::Collection.new('characters', database.query).first(5).to_a

In Guacamole this part will be done by Guacamole. If you're outside of Guacamole, you have to provide some object that has a method execute that expects a String.
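A minimal sketch of that contract, assuming nothing beyond "an object with an `execute` method that takes a String". `SketchCollection` and `EchoExecutor` are hypothetical stand-ins, not Brazil or Ashikawa classes:

```ruby
# Hypothetical stand-in for Brazil::Collection, only to illustrate the injection.
class SketchCollection
  def initialize(name, executor)
    @name = name
    @executor = executor
  end

  def to_aql
    "FOR doc IN #{@name} RETURN doc"
  end

  # Delegates evaluation to whatever object was injected.
  def to_a
    @executor.execute(to_aql)
  end
end

# Any object with an execute(String) method works; this one just echoes the query.
class EchoExecutor
  def execute(query)
    [query]
  end
end

p SketchCollection.new('characters', EchoExecutor.new).to_a
```

Because the collection only calls `execute`, a test double like `EchoExecutor` is enough to exercise it without a running database.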

code-later commented 10 years ago

Ok, here come my thoughts:

First of all, after thinking about that approach some more and discussing it with @tisba I still like the idea ;-)

Joins

I'm not sure about the product method though. As I understand the AQL documentation, it is not really a product, because in most cases there will be a condition that reduces the resulting set (e.g. user.id == post.user_id). Of course this will be generated for the user, but the result will not have Array#product semantics. At the same time, join in Array is something completely different. I would suggest the following:

# Will be a CROSS JOIN equivalent
Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts))

# Will be an INNER JOIN equivalent
Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts)) { |u, p| u.id == p.user_id }

In Guacamole the latter would be the default with an API like this:

UsersCollection.product(:posts)

This brings us to the question of what the product method on Brazil::Collection should accept as an argument. I think it's crucial to draw a clear line between the responsibilities of Brazil and what will be done in Guacamole. For me Brazil is much like Arel. It should provide a clean and consistent way to construct AQL strings. It should know neither about the database nor about the associations in the domain. We have different tools for that. That said, there should not be a to_a method in Brazil but a to_aql method:

Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts)) { |u, p| u.id == p.user_id }.to_aql

This has to generate an AQL string like this:

FOR u IN users
  RETURN MERGE (
    u,
    { "posts" : (
      FOR p IN posts
        FILTER p.user_id == u._key
        RETURN p
      )
    }
  )

At least that is something that could be used right away by Guacamole :wink: But I would also argue it is the least surprising result.

This needs to be passed to Ashikawa and will return an array of hashes. Calling to_a in Guacamole will perform all this and map the hashes to user models accordingly:

users = UsersCollection.product(:posts).to_a
# => [#<User:101 @posts=[#<Post:A>, …]>, #<User:202 @posts=[#<Post:A>, …]>]

Misc

Brazil::Collection.new(:users).count.to_aql
# => FOR u IN users RETURN LENGTH(users)

AQL functions

Ok, I think I said something to every point :wink:

code-later commented 10 years ago

Mmh, I played with that count example. It seems this is not how it is done in AQL. Actually I couldn't find any other solution than using the count provided by the cursor. But only if the count option was set. From the perspective of Brazil this seems quite unfortunate. Is it really the way to go? If it is, should we always set the count option? Or does this imply any unwanted side effects?

moonglum commented 10 years ago

In parallel I'm discussing this idea with Martin via email. He generally likes the idea, but he disagrees with the notion of an 'Array of Hashes' as the collections in ArangoDB are not sorted (they can be sorted, but they are not sorted). He suggests a Hash, because we are mapping IDs to values. Not sure I like it – I'm tending towards "An enumerable of Hashes" where the enumerable is called Collection as in the examples above.

Joins

Interesting food for thought :+1: I'm not sure if we agree on this, but the default behavior of ArangoDB is a cross join (if you do not use a separate construct called filter) – this is also the case for the product method of Ruby's Array. So the following statements are equal for two collections a (containing 'a', 'b') and b (containing 1, 2):
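A sketch of that equivalence, using plain Ruby arrays in place of the two collections. The AQL in the comment is an assumed counterpart (nested FOR loops without a FILTER), not the snippet from the original comment:

```ruby
a = ['a', 'b']
b = [1, 2]

# Assumed AQL counterpart:
#   FOR x IN a
#     FOR y IN b
#       RETURN [x, y]
pairs = a.product(b)
p pairs
```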

Right? And with filter I will only keep those in the results that meet certain requirements I specified. So I would let product work exactly like the product in Array resulting in the AQL described above. The user can then chain a filter call on it.

We could add a second method join that works similarly to what you described (I would not call it product, because it works differently from product). But I think this should not be part of Brazil, but be a part of Guacamole (because Guacamole is aware of which other models are referenced). In Guacamole it would work like this (Articles references Authors):

Articles.join(:authors)

This would then join the author onto every article. This could be done with the following AQL expression:

FOR a IN articles
  FOR b IN authors
    FILTER a.author_id == b.id
    RETURN [a, b]

What do you think?

Talking to the Database

I like the idea of not talking to the database at all from Brazil. I agree that map should not return an array, you're totally right.

So .to_a would be a method that Guacamole adds to Brazil? I don't really like that. So I suggest the following:

Why? Two reasons:

  1. Guacamole doesn't need to monkey patch Brazil to add the .to_a functionality.
  2. If you use Brazil without Guacamole, it is less awkward IMHO.

Misc

AQL functions

code-later commented 10 years ago

Regarding the 'Array of Hashes' analogy: I understand the point that it doesn't match the internal implementation of ArangoDB, but for the user this is what it looks like. And AQL will return this too. I know you guys really love to be 100% correct when it comes to nomenclature and I really love you for this. But in this case I would be a little less correct in favor of better ease of use ;-)

Joins

I think we're on the same track here. I understood very well that a JOIN in AQL is a cross product by default. Since there is no explicit JOIN clause in AQL, and joins are instead realized with nested FOR loops, it is obvious that the JOIN condition from SQL is just another FILTER in AQL. I'm sorry I was not clear enough on this before.

What I don't like about this is that the condition for the join operation is not easily visible. And I'd argue you will have a condition most of the time. From an application developer's perspective it just doesn't make sense to get a cross product of all the data ;-) Passing a block to the product method would make this connection very clear in Ruby land. And I think this is a good thing.

Furthermore, I don't like the idea of returning an array of two elements like you proposed. The transformation back into Guacamole models is quite cumbersome this way, and for me it is not what I would do with a system like ArangoDB. Why not just return the final document? Of course this could be changed with a custom RETURN statement.

What do you think?

Talking to the Database

Monkey Patching?

I never said anything about monkey patching Brazil. That would be just insane ;-) Let me instead introduce you to a good friend of mine. I get your point that using Brazil without Guacamole would be awkward, but in that case let us just add some functionality to Ashikawa::Core. I would not add any functionality to Brazil that is related in any way to talking to the database. Don't get me wrong, I don't think it's bad design to pass a query object to Brazil, but in that case Brazil needs to know how to interact with that object. We have three different libraries and Brazil should just create AQL query strings and nothing more.

What do you think?

Misc

AQL functions

tisba commented 10 years ago

"We have three different libraries and Brazil should just create AQL query strings and nothing more."

I think it might have to do more than "just query strings". Otherwise I don't really see how you can generate parameterized queries (or whatever they are called in Arango). Since bind parameters are not part of the query (the sole reason why they prevent nasty AQL injections), you have to handle that somehow too.

But maybe I'm overlooking something :) Interesting discussion otherwise, I'm looking forward to where this leads :)

code-later commented 10 years ago

Yeah, you're right. With "just AQL query string" I really wanted to say "The AQL query string with those bind parameters". This could look something like this:

Brazil::Collection.new(:users).find { |u| u[:name] == "David Bowman" }

This could generate something like this:

{
  query: "FOR u IN @@users_collection FILTER u.name == @name RETURN u",
  bind_parameters: { users_collection: 'users', name: 'David Bowman' }
}

This can be passed to ArangoDB via Ashikawa. Thanks for the clarification.
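How a block like `u[:name] == "David Bowman"` could end up as a FILTER plus a bind parameter, as a rough sketch. `DocRef`, `AttributeRef` and `filter_query` are hypothetical names invented for this illustration, and the key layout of the result simply mirrors the hash above:

```ruby
# Hypothetical sketch: records `doc[:attr] == value` as a FILTER with a bind parameter.
class AttributeRef
  def initialize(variable, attribute, binds)
    @variable = variable
    @attribute = attribute
    @binds = binds
  end

  # Stores the value as a bind parameter and returns the AQL condition.
  def ==(value)
    @binds[@attribute] = value
    "#{@variable}.#{@attribute} == @#{@attribute}"
  end
end

# Hypothetical document reference yielded to the block.
class DocRef
  def initialize(variable, binds)
    @variable = variable
    @binds = binds
  end

  def [](attribute)
    AttributeRef.new(@variable, attribute, @binds)
  end
end

def filter_query(collection)
  binds = { users_collection: collection }
  condition = yield DocRef.new('u', binds)
  { query: "FOR u IN @@users_collection FILTER #{condition} RETURN u",
    bind_parameters: binds }
end

result = filter_query('users') { |u| u[:name] == 'David Bowman' }
p result
```

The value never appears in the query string itself, which is the property that guards against injection.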

moonglum commented 10 years ago

Array of Hashes

Thank you for your kind words :wink: But there's also another reason why we should not call it an Array: There will be parts of the API of the Array that we won't implement, and there will be parts that we will add to it that are not part of the Array. I suggest we just call it a Collection and say in the description that it behaves similarly to an Array. What do you think?

Joins

I think that product should behave as I described, as this is close to the way it works in AQL and also in the product method of Array. What exactly are you proposing? Given I have blog posts (having a title and an author_id) and I have authors (having a name), should it result in the following hash:

{
    title: 'The Big Lebowski',
    author: {
        name: 'The Coen Brothers'
    }
}

Phew. I don't know. I think that is the task of the map call. What if I only want to use certain information of the author? Or put them directly into the hash and not as a 'sub hash'? Then I have to fiddle around a lot – compared to just having two hashes. That makes me think that we should probably really split the join from the product. The join is what you describe and merges those two together using a provided key – it is really, really useful in a lot of cases. But if you want to get fiddly and build new things, then you can use product, which will result in two hashes.

Talking to the database

You and your Proxies :laughing: Ok, that's good. I think that would be fine for Guacamole, but I don't get how it solves the problem of using it standalone. You say that we should add some functionality into Ashikawa::Core for that – but that would mean that Ashikawa::Core gets Brazil as a dependency. Or what exactly do you mean?

I don't want Brazil to know about the database at all. It gets an object and all it expects is that it has a method execute that takes an AQL String and a hash of bind parameters and .to_a returns whatever this method returns. I think that's a quite small and really reasonable interface that it expects. Guacamole could for example pass in something that already does the conversion to the right model.

Tisba's Remark

Good point! Yep, totally agree. to_aql should return a tuple (either as a hash or as an array, I don't care) of the query string with the @s and @@s and the bind parameters. to_a takes this into consideration as described above.

More

code-later commented 10 years ago

Array of Hashes

Got your point, but in this case let's just call it a Collection of Documents, which provides an API similar to Array and Hash :wink: In the end neither is really an Array or a Hash, just a way to build a nice API.

Joins

Ok, what about this: Within Guacamole you get a joins method which will give you this:

UsersCollection.joins(:posts)

And this will construct the following Brazil statement:

Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts)) { |u, p| p[:user_key] == u[:key] }.map { |user, post| [user, post] }

With this Guacamole must still hold a reference to the underlying Cursor to iterate over the posts for each user. This reference must be passed as the association proxy.

In some cases you will want eager loading (aka: includes):

UsersCollection.includes(:posts)

The Brazil statement will eventually look something like this:

Brazil::Collection.new(:users).map do |user|
  user.merge({
    posts: Brazil::Collection.new(:posts).find do |p|
      p[:user_key] == user[:key]
    end
  })
end

I still like the idea of providing filters with impact on the product as a block to the product method. But I will not make a drama if we don't do this ;-)

After all, we should really focus on one thing here. For me this would be ease of use from Guacamole. I don't care how messy nifty queries could become without the help of Guacamole, as long as we hit the 80/20 target with Guacamole itself.

Talking to the database

I don't understand why you want to put that feature in Brazil. If someone wants to use this without Guacamole that's easily possible:

aql_tuple = Brazil::Collection.new(:users).find { |user| user[:name] == "David Bowman" }.to_aql
cursor = ashikawa_connection.execute aql_tuple.aql_string, { bind_vars: aql_tuple.bind_parameters }
cursor.each

That's not a lot of boilerplate to add. Adding this to Brazil isn't that much effort either, but will still add functionality and tests which need to be maintained. I just don't want to do this, because as I said before: Focus should be on Guacamole.

moonglum commented 10 years ago

Ok, I think this is ready for a new spike :wink: Looking forward to it.

code-later commented 10 years ago

giphy

moonglum commented 10 years ago

:laughing: