Open moonglum opened 10 years ago
I think the product should take other Brazil::Collections to be consistent with the way Array behaves. Even though it is more to type... Maybe we can offer a convenience function?
Oh, and we are not sure yet if it will be character['name'] or character.name. This is up for discussion. I'm tending a little bit more towards character['name'] to keep it consistent with the "Array of Hashes" idea.
Also interesting:
desc vs. asc)
I could also imagine that if you call map on it, it will automatically return an array so you don't need to call .to_a on it.
AQL has a bunch of functions and gives users the possibility to add their own. We should think about how we handle that.
IS_NULL or IS_BOOLEAN. I would offer them as methods on the attributes (user['name'].nil? and user['name'].boolean?)
LOWER, SUBSTRING etc.)... wow. I would probably do the same here. This includes LIKE. So we can do that as follows: user['name'].like? 'Dirk%'
NEAR and WITHIN functions, the same functionality is offered via simple queries.
FULLTEXT function to perform fulltext queries.
Enough for today. I will work on something different now.
Also a thought about talking to the database. I don't want this gem to have its own configuration to talk to the database – and I don't really want it to be dependent on Ashikawa::Core. Still a feature like .to_a to evaluate it – or .map to return the results as an array would be awesome.
Idea – dependency injection :smile:
database = Ashikawa::Core::Database.new do |config|
  config.url = 'http://localhost:8529'
end
result = Brazil::Collection.new('characters', database.query).first(5).to_a
In Guacamole this part will be done by Guacamole. If you're outside of Guacamole, you have to provide some object that has a method execute that expects a String.
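A minimal sketch of that dependency injection idea, assuming nothing about the real Brazil or Ashikawa code: SketchCollection and FakeQuery are hypothetical names, and the generated AQL is only a guess at what such a builder might emit. The only contract is that the injected object responds to execute and takes a String.

```ruby
# Hypothetical stand-in for the injected object; it only needs #execute(String).
class FakeQuery
  attr_reader :received

  def execute(aql)
    @received = aql
    # Canned result instead of a real database round trip.
    [{ 'name' => 'Sam Lowry' }, { 'name' => 'Jill Layton' }]
  end
end

# Hypothetical sketch of a collection that builds AQL and delegates execution.
class SketchCollection
  def initialize(name, query = nil, limit = nil)
    @name = name
    @query = query
    @limit = limit
  end

  # Returns a new collection so calls can be chained.
  def first(count)
    self.class.new(@name, @query, count)
  end

  def to_aql
    parts = ["FOR x IN #{@name}"]
    parts << "LIMIT #{@limit}" if @limit
    parts << 'RETURN x'
    parts.join(' ')
  end

  # Only works when a query object was injected.
  def to_a
    raise ArgumentError, 'no query object injected' unless @query

    @query.execute(to_aql)
  end
end

query = FakeQuery.new
result = SketchCollection.new('characters', query).first(5).to_a
```

The collection never learns how the database is reached; swapping FakeQuery for a real client (or for something Guacamole provides) would not change the builder.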
Ok, here come my thoughts:
First of all, after thinking about that approach some more and discussing it with @tisba I still like the idea ;-)
Joins
I'm not sure about the product method though. As I understand the AQL documentation, it is not really a product, because in most cases there will be a condition to reduce the resulting set (i.e. user.id == post.user_id). Of course this will be generated for the user, but the result will not have the Array#product semantics. At the same time join in Array is something completely different. I would suggest the following:
# Will be a CROSS JOIN equivalent
Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts))
# Will be an INNER JOIN equivalent
Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts)) { |u, p| u.id == p.user_id }
In Guacamole the latter would be the default with an API like this:
UsersCollection.product(:posts)
This brings us to the question of what the product method on Brazil::Collection should accept as an argument. I think it's crucial to make a clear cut between what will be the responsibilities of Brazil and what will be done in Guacamole. For me Brazil is much like Arel. It should provide a clean and consistent way to construct AQL strings. It should not know about the database nor about the associations in the domain. We have different tools for that. That said, there should not be a to_a method in Brazil but a to_aql method:
Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts)) { |u, p| u.id == p.user_id }.to_aql
This has to generate an AQL string like this:
FOR u IN users
  RETURN MERGE (
    u,
    { "posts" : (
        FOR p IN posts
          FILTER p.user_id == u._key
          RETURN p
      )
    }
  )
At least that is something that could be used right away by Guacamole :wink: But I would also argue it is the least surprising result.
This needs to be passed to Ashikawa and will return an array of hashes. Calling to_a in Guacamole will perform all this and map the hashes to user models accordingly:
users = UsersCollection.product(:posts).to_a
# => [#<User:101 @posts=[#<Post:A>, …]>, #<User:202 @posts=[#<Post:A>, …]>]
Misc
character[:name] over character.name, but would use Symbols.
map cannot return an Array in Brazil, because of the aforementioned reasons. It could in Guacamole, but that would prevent us from chaining those statements.
reverse sounds interesting…
count should just generate the appropriate AQL String:
Brazil::Collection.new(:users).count.to_aql
# => FOR u IN users RETURN LENGTH(users)
AQL functions
IS_NULL. Using nil? and present? seem to be legit.
LIKE for now. Most of the other stuff should be done in Ruby land before giving it to the database or you should write the query all by yourself.
Ok, I think I said something to every point :wink:
Mmh, I played with that count example. It seems this is not how it is done in AQL. Actually I couldn't find any other solution than using the count provided by the cursor. But only if the count option was set. From the perspective of Brazil this seems quite unfortunate. Is it really the way to go? If it is, should we always set the count option? Or does this imply any unwanted side effects?
In parallel I'm discussing this idea with Martin via email. He generally likes the idea, but he disagrees with the notion of an 'Array of Hashes' as the collections in ArangoDB are not sorted (they can be sorted, but they are not sorted). He suggests a Hash, because we are mapping IDs to values. Not sure I like it – I'm tending towards "An enumerable of Hashes" where the enumerable is called Collection as in the examples above.
Joins
Interesting food for thought :+1: I'm not sure if we agree on this, but the default behavior of ArangoDB is a cross join (if you do not use a separate construct called filter) – this is also the case for the product method of Ruby's Array. So the following statements are equal for two collections a (containing 'a', 'b') and b (containing 1, 2):
Collection(:a).join(:b).to_a
FOR x IN a FOR y IN b RETURN [x, y]
['a', 'b'].product([1, 2])
[['a', 1], ['a', 2], ['b', 1], ['b', 2]]
Right? And with filter I will only keep those in the results that meet certain requirements I specified. So I would let product work exactly like the product in Array resulting in the AQL described above. The user can then chain a filter call on it.
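The Array side of that analogy can be checked in plain Ruby. The snippet below is only an illustration on ordinary arrays with made-up data, not Brazil code: product yields every pair, and a chained select plays the role that FILTER would play in AQL.

```ruby
a = ['a', 'b']
b = [1, 2]

# Cross product: every element of a paired with every element of b.
pairs = a.product(b)
# => [["a", 1], ["a", 2], ["b", 1], ["b", 2]]

# Made-up sample data for the filtered case.
users = [{ id: 1, name: 'Sam' }, { id: 2, name: 'Jill' }]
posts = [{ user_id: 1, title: 'Ducts' }, { user_id: 1, title: 'Forms' }]

# Chaining a filter onto the product keeps only the matching pairs.
joined = users.product(posts).select { |user, post| user[:id] == post[:user_id] }
```

Here joined contains two pairs, both for the user with id 1; the user without posts drops out, as in an inner join.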
We could add a second method join that works similarly to what you described (I would not call it product, because it works differently from product). But I think this should not be part of Brazil, but be a part of Guacamole (because Guacamole is aware of which other models are referenced). In Guacamole it would work like this (Articles references Authors):
Articles.join(:authors)
This would then join the author onto every article. This could be done with the following AQL expression:
FOR a IN articles
  FOR b IN authors
    FILTER a.author_id == b.id
    RETURN [a, b]
What do you think?
Talking to the Database
I like the idea of not talking to the database at all from Brazil. I agree that map should not return an array, you're totally right.
So .to_a would be a method that Guacamole adds to Brazil? I don't really like that. So I suggest the following:
Brazil::Collection with Brazil::Collection.new(:collection_name): you won't have a .to_a method, but you will have a .to_aql method to get the AQL query from it.
Brazil::Collection with Brazil::Collection.new(:collection_name, query_object): you will have both a .to_aql method that works as above and additionally a .to_a method that will generate the query and then execute it using the query_object, returning an Array.
Why? Two reasons:
.to_a functionality.
Misc
reverse: Interesting bad or interesting good?
count: Agreed.
AQL functions
IS_NULL and LIKE.
Regarding the 'Array of Hashes' analogy: I understand the point that it doesn't match the internal implementation of ArangoDB, but for the user this is what it looks like. And AQL will return this too. I know you guys really love to be 100% correct when it comes to nomenclature and I really love you for this. But in this case I would be a bit less correct in favor of better ease-of-use ;-)
Joins
I think we're on the same track here. I understood very well that a JOIN in AQL is a cross product by default. Since there is no explicit JOIN clause in AQL, and a join is instead realized with nested FOR loops, it is obvious that the JOIN condition from SQL is just another FILTER in AQL. I'm sorry I was not clear enough on this before.
What I don't like about this is that the condition for the join operation is not easily visible. And I would argue you will have a condition most of the time. From an application developer's perspective it just doesn't make sense to get a cross product of all the data ;-) Passing a block to the product method would make this connection very clear in Ruby land. And I think this is a good thing.
Furthermore I don't like the idea of returning an array of two elements like you proposed. The transformation back into Guacamole models is quite cumbersome this way and for me it is not what I would do with a system like ArangoDB. Why not just return the final document? Of course this could be changed with a custom RETURN statement.
What do you think?
Talking to the Database
Monkey Patching?
I never said anything about monkey patching Brazil. That would be just insane ;-) Let me instead introduce you to a good friend of mine. I get your point that using Brazil without Guacamole would be awkward, but in that case just let us add some functionality to Ashikawa::Core. I would not add any functionality to Brazil that is related to talking in any way to the database. Don't get me wrong, I don't think it's bad design passing a query object to Brazil, but in that case Brazil needs to know how to interact with that object. We have three different libraries and Brazil should just create AQL query strings and nothing more.
What do you think?
Misc
reverse: Not sure yet ;-)
AQL functions
"We have three different libraries and Brazil should just create AQL query strings and nothing more."
I think it might have to do more than "just query strings". Otherwise I don't really see how you can generate parameterized queries (or however they are called in Arango). Since bind parameters are not part of the query (the sole reason why they prevent nasty AQL injections), you have to handle that somehow too.
But maybe I'm overlooking something :) Interesting discussion otherwise, I'm looking forward to where this leads :)
Yeah, you're right. With "just AQL query string" I really wanted to say "The AQL query string with those bind parameters". This could look something like this:
Brazil::Collection.new(:user).find { |u| u[:name] == "David Bowman" }
This could generate something like this:
{
  query: "FOR u IN @@users_collection FILTER u.name == @name RETURN u",
  bind_parameters: { users_collection: 'users', name: 'David Bowman' }
}
This can be passed to ArangoDB via Ashikawa. Thanks for the clarification.
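To make the point about bind parameters concrete, here is a small sketch of how such a tuple could be built. find_by_equality is a hypothetical helper, not part of Brazil, and the hash layout simply mirrors the shape shown above; how the keys are mapped onto the server's bind variable format is left to whatever sends the query.

```ruby
# Hypothetical helper: builds a query string plus bind parameters for a
# simple equality filter. The value never ends up in the query string itself.
def find_by_equality(collection, attribute, value)
  attribute = attribute.to_s
  # Attribute names are interpolated into the query, so restrict them.
  unless attribute.match?(/\A[A-Za-z_]\w*\z/)
    raise ArgumentError, "unsafe attribute name: #{attribute}"
  end

  {
    query: "FOR u IN @@collection FILTER u.#{attribute} == @value RETURN u",
    bind_parameters: { collection: collection.to_s, value: value }
  }
end

tuple = find_by_equality(:users, :name, 'David Bowman')
```

Because the user-supplied value travels only in bind_parameters, a value such as "x\" || true" cannot change the structure of the query.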
Array of Hashes
Thank you for your kind words :wink: But there's also another reason why we should not call it an Array: There will be parts of the API of the Array that we won't implement, and there will be parts that we will add to it that are not part of the Array. I suggest to just call it a Collection and say in the description that it behaves similar to an Array. What do you think?
Joins
I think that product should behave as I described as this is close to the way it works in AQL and also in the product method of Array. What exactly are you proposing? Given I have blog posts (having a title and an author_id) and I have authors (having a name). Should it result in the following hash:
{
  title: 'The Big Lebowski',
  author: {
    name: 'The Coen Brothers'
  }
}
Puh. I don't know. I think that is the task of the map call. What if I only want to use certain information of the author? Or put them directly into the hash and not as a 'sub hash'? Then I have to fiddle around a lot – compared to just having two hashes. That makes me think that we should probably really split the join from the product. The join is what you describe and merges those two together using a provided key – it is really, really useful in a lot of cases. But if you want to get fiddly and build new things then you can use product which will result in two hashes.
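The difference between the two result shapes can be sketched with plain Ruby arrays of hashes. This is an illustration only, with made-up ids; it says nothing about how Brazil or Guacamole would implement either method.

```ruby
posts = [{ title: 'The Big Lebowski', author_id: 1 }]
authors = [{ id: 1, name: 'The Coen Brothers' }, { id: 2, name: 'Terry Gilliam' }]

# product-style result: pairs of two separate hashes, free to be reshaped later.
pairs = posts.product(authors).select { |post, author| post[:author_id] == author[:id] }

# join-style result: the author is merged into the post under a provided key.
merged = pairs.map do |post, author|
  post.reject { |key, _value| key == :author_id }.merge(author: { name: author[:name] })
end
# => [{ title: 'The Big Lebowski', author: { name: 'The Coen Brothers' } }]
```

The map step is exactly the "fiddling" mentioned above: with pairs the caller decides the final shape, with the merged form that decision is made for them.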
Talking to the database
You and your Proxies :laughing: Ok, that's good. I think that would be fine for Guacamole, but I don't get how it solves the problem of using it standalone. You say that we should add some functionality into Ashikawa::Core for that – but that would mean that Ashikawa::Core gets Brazil as a dependency. Or what exactly do you mean?
I don't want Brazil to know about the database at all. It gets an object and all it expects is that it has a method execute that takes an AQL String and a hash of bind parameters, and .to_a returns whatever this method returns. I think that's a quite small and really reasonable interface that it expects. Guacamole could for example pass in something that already does the conversion to the right model.
Tisba's Remark
Good point! Yep, totally agree. to_aql should return a tuple (either as a hash or as an array, I don't care) of the query string with the @s and @@s and the bind parameters. to_a takes this into consideration as described above.
More
Array of Hashes
Get your point, but in this case let's call it just a Collection of Documents, which provide an API similar to Array and Hash :wink: In the end they are both not really an Array or Hash but just a way to build a nice API.
Joins
Ok, what about this: Within Guacamole you get a join method which will give you this:
UsersCollection.joins(:posts)
And this will construct the following Brazil statement:
Brazil::Collection.new(:users).product(Brazil::Collection.new(:posts)) { |u, p| p[:user_key] == u[:key] }.map { |user, post| [user, post] }
With this Guacamole must still hold a reference to the underlying Cursor to iterate over the posts for each user. This reference must be passed as the association proxy.
In some cases you will want eager loading (aka: includes):
UsersCollection.includes(:posts)
The Brazil statement will eventually look something like this:
Brazil::Collection.new(:users).map do |user, post|
  user.merge({
    posts: Brazil::Collection.new(:posts).find do |p|
      p[:user_key] == user[:key]
    end
  })
end
I still like the idea of providing filters with impact on the product as a block to the product method. But I will not make a drama if we don't do this ;-)
After all we should really focus on one thing here. For me this would be ease of use from Guacamole. I don't care about how messy nifty queries could become without the help of Guacamole as long as we hit the 80/20 target with Guacamole itself.
Talking to the database
I don't understand why you want to put that feature in Brazil. If someone wants to use this without Guacamole that's easily possible:
aql_tuple = Brazil::Collection.new(:users).find { |user| user[:name] == "David Bowman" }.to_aql
cursor = ashikawa_connection.execute aql_tuple.aql_string, { bind_vars: aql_tuple.bind_parameters }
cursor.each
That's not a lot of boilerplate to add. Adding this to Brazil isn't that much effort either, but will still add functionality and tests which need to be maintained. I just don't want to do this, because as I said before: Focus should be on Guacamole.
Ok, I think this is ready for a new spike :wink: Looking forward to it.
:laughing:
@railsbros-dirk and me discussed this today face to face :wink:
We will try to mimic the API of Array & Enumerable as close as possible. Basically a collection in ArangoDB is a big array with a lot of hashes in it. Let's give it a try.
Return all documents of the characters collection:
A projection via map:
Filtering via find_all or select:
Sorting via sort_by:
Limit via first, drop and slice:
Pagination via each_slice:
Joins via product (to join more collections, just add more to the product call):
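For reference, this is how the methods named above behave on a plain Ruby Array of Hashes, which is the behaviour the collection API is meant to mimic. The data is made up for illustration and nothing here is Brazil code.

```ruby
# Made-up sample data standing in for the characters collection.
characters = [
  { 'name' => 'Sam Lowry', 'age' => 35 },
  { 'name' => 'Jill Layton', 'age' => 30 },
  { 'name' => 'Harry Tuttle', 'age' => 50 },
  { 'name' => 'Jack Lint', 'age' => 40 }
]
departments = [{ 'name' => 'Records' }, { 'name' => 'Information Retrieval' }]

# Projection via map
names = characters.map { |character| character['name'] }

# Filtering via find_all or select (the two behave the same here)
older = characters.select { |character| character['age'] > 35 }

# Sorting via sort_by
by_age = characters.sort_by { |character| character['age'] }

# Limit via first, drop and slice
first_two = characters.first(2)
rest = characters.drop(2)
middle = characters.slice(1, 2)

# Pagination via each_slice
pages = characters.each_slice(2).to_a

# Joins via product: every character paired with every department
pairs = characters.product(departments)
```

Each call returns a new Array and leaves characters untouched, which is what makes chaining natural on the Array side.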