mfenniak / rethinkdb-net

A C# / .NET client driver for RethinkDB.
Other
247 stars 37 forks source link

Filter queries containing sub-queries #248

Open funlambda opened 8 years ago

funlambda commented 8 years ago

I'm trying to figure out how to represent a filter query which contains a sub-query on another table. Here's my query in JS, which works as expected:

r.table('TableA').filter(function (a) {
  return r.table('TableB').filter(function (b) {
    return b('ignoredID').eq(a('id'));
  }).count().eq(0);
});

Is it currently possible to write this query using the .NET API? If not, is there another way I can accomplish this?

mfenniak commented 8 years ago

Hm... I'm not 100% sure how this query actually runs. It would appear to me that, for every row in 'TableA', it's going to perform a separate query in 'TableB'. That seems like a pretty performance-intensive approach?

I think the approach I would take for this specific query would be to use a join. Since you're looking for values that aren't matched in the two tables, I think an outer join and then a filter on the joined table's fields being null would accomplish the same thing. I can't provide a working specific example code right now, but it would start with something like (https://github.com/mfenniak/rethinkdb-net/blob/5853690123ff54f70beb014d7c72fbea515b0d1e/rethinkdb-net-test/Integration/MultiTableTests.cs#L104), and then add a filter condition for where the join resulted in nulls.

funlambda commented 8 years ago

Thanks for the quick response! I agree that using an outer join is more appropriate and efficient in this scenario. I rewrote the query as you suggested, and it works well now. I'm running into another issue now (see #250).

However, there may be other situations where doing a subquery makes more sense than joins. For example, what if you need to return records in TableA that are referenced by at least 5 records in TableB created within some date range? Yes, you can probably still do that with join + filter + group + aggregate + filter again, but at a significant cost to query readability IMO. A sub-query would much more clearly express the intent in that scenario. I've dealt with even more complex query requirements in SQL where the equivalent join query is prohibitively difficult to write, and some situations where sub-queries are the only possible way.

As far as performance goes, I've seen similar and sometimes better performance for sub-query queries vs. join queries for particularly complex queries in MS SQL Server (assuming DB is has correct indexes). I have no idea how performance would compare in RethinkDB though.

I suspect that these scenarios are much rarer when using Document DBs (vs SQL DBs) since parent-child relationships can be expressed within a document as opposed to only through relations between tables. However, they still may happen occasionally, so it would be nice to be able to do so through the RethinkDB .NET API.

Is there a way to fall back to Javascript ReQL queries when something can't be represented via the .NET API? Or some other raw query representation that can be used?