query per nested collection

olivergeorge commented 8 years ago

I'll do more on this but wanted to start a thread in case you have feedback

Here's vaguely what I had in mind for supporting recursion.

Few notes:

Basic premise is "one query per nested collection". That may involve more queries than joining but should be more efficent in terms of data transferred. Not sure that's what you want which is fine... I don't mind being told this is a bad idea.
Each query needs to be given a where clause which filters the possible records to those needed. Options are: use original query as subquery WHERE id in (query) or pass in ids WHERE id in [ids]
The method signature might be a bit complex. Aimed to work with the om parser but it's incomplete.

(defn pull-all
  [{:keys [db-state query] :as env} resource {:keys [where]}]
  (let [db-config (:config db-state)
        expanded-relations (expand-relations query resource db-state)
        {:keys [has-many belongs-to]} (group-by :relation-type expanded-relations)
        query (build-query resource where belongs-to db-state)
        collection (db-query (:config db-state) query)
        collection (distinct collection)]
    (loop [has-many (seq has-many)
           ret (collection-with-associations collection resource expanded-relations db-state)]
      (if-not (nil? has-many)
        (let [rel (first has-many)
              res (-> rel :being-added :resource)
              ;subq (assoc query :select [:id])
              subq (mapv :id collection)
              subc (pull-all env res {:where [[:id :in subq]]})
              ret (stitch-up ret res subc)]
          (recur (next has-many) ret))
        ret))))

That's really pseudocode

needs to handle query containing keys and maps (for joins)
need to implement stitch-up

olivergeorge commented 8 years ago

Quick update. This provides a zipper based solution for translating a nested relational query defined as an Om Next query expression into flat sql queries.

https://gist.github.com/olivergeorge/963701655779675288cd

netizer commented 8 years ago

I wouldn't say that "one query per nested collection" is a bad idea, but I think user of the library should be able to make this choice. I think that if the library will generate adequate SQL queries, initial setup of the back-end will be super easy, but tuning performance will become much more complex.

Now I'm not saying that this library should put performance as the top priority. I really like back-ends expressed with few lines of code, but I'd like this library to give the developer tools to improve performance. There's no such feature here yet, but that's where I'd like this library to evolve.

I was thinking about something like this: There are 3 functions:

query - it just fetches the data from the database (with or without joins)
select-ids - it fetches ids from the formatted result for another query call ,
format-response - formats rows from the relational database call into the desired format

;; that means :users have :posts and :attachments, :posts have :author, 
;; and :attachments have :files and :author. 
;; now we still use db-state with `associations`, so we can figure out what is an array and what is a hash.
(def result-format [:users [:posts :author] [:attachments :files :author]])

(let [users (query db-state :users #{:posts :attachments} [[:= :users.id 1] [:posts.published true]])
      users-formatted (format-response db-state result-format users)
      missing-ids (select-ids users-formatted [:users :posts :attachments])
      missing-records (query db-state :files #{} [[:= files.attachments_id missing-ids]
      result (format-response db-state result-format missing-records users-formatted)]
  (println result))

format-response would need additional attribute to deal with ambiguity but I think it might be optional.

Now there might be a function find-all:

(find-all db-state result-format [[:= :users.id 1] [:posts.published true]])

that would follow "one query per nested collection" rule, but if in some cases it wouldn't be the best fit, query, select-ids and format-response make it easy to fetch data differently.

netizer commented 8 years ago

I tried the zipper based solution to understand better how it works, but it seems it didn't work for me as it suppose to, becasue that was the output:

SELECT [:name {:meeting [:jurisdiction {:race [:id]}]}] FROM :venue
Merge records into data nil
SELECT nil FROM :jurisdiction
Merge records into data nil
{:venue {}, :jurisdiction {}}

olivergeorge commented 8 years ago

Hello. Sorry about that, I'll try and fix up the gist this evening (+3h)

olivergeorge commented 8 years ago

I think I agree with your points above.

It would be enough to support/facilitate "one query per nested collection" but is better not to require it.

It's a good idea to allow control over how queries are grouped. (Not sure how that might work, my closest reference points are perhaps django's select_related / prefetch_related features or perhaps the LINQ include feature)

I think perhaps the current find-all parameters would need to evolve to cater for nesting and query performance hints. Possibly conditions becomes a map with :where for conditions and optionally :prefetch etc for hints. The relations param could expand to allow maps for nested recursion (joins) as per the Datomic pull spec.

olivergeorge commented 8 years ago

I was able to repeat your results with a schema of {}. ~~(Still a bug, will investigate)~~ Result is two queries, one for venue including data from meeting, jurisdiction and race tables and a second top level query for jurisdiction.

(parser {:schema {}}
        [{:venue [:name
                  {:meeting [:jurisdiction
                             {:race [:id]}]}]}
         :jurisdiction])

Try this

(parser {:schema {:race    {:meeting      :belongs-to
                            :logged-event :has-many}
                  :meeting {:race  :has-many
                            :venue :belongs-to}
                  :venue   {:meeting :has-many}}}
        [{:venue [:name
                  {:meeting [:jurisdiction
                             {:race [:id]}]}]}
         :jurisdiction])

Should produce:

SELECT [:name] FROM :venue
Merge records into data nil
SELECT [:jurisdiction] FROM :meeting
Merge records into data (:venue :meeting)
SELECT [:id] FROM :race
Merge records into data (:venue :meeting :race)
SELECT nil FROM :jurisdiction
Merge records into data nil

olivergeorge commented 8 years ago

I've updated that script now. gist has source and sample output from examples. It includes two ideas for consideration:

use of :include param to control where queries get split
ability for pull spec to include other query modifiers (additional filters via :where param)

This stuff is quite superficial, the hard bit is building queries and stitching things back up.

olivergeorge commented 8 years ago

Quick update, I've done a walker based implementation which is more advanced.

https://gist.github.com/olivergeorge/b9f5fb2935e10d75389fdc081c8027a5

netizer commented 8 years ago

Awesome :-)

one question, how do you execute this kind of gists? I mean I would:

create a lein project
put the whole code in test file
add call to example function at the end of the test file
figure out dependencies and fill them in in project.clj
run test but maybe there is a tool that I don't know about that makes the process easier?

olivergeorge commented 8 years ago

Hi. I'm not sure how to do that. I was resisting creating an example github project but I see it would be more convenient. (I believe there is a lein plugin which can execute a single file with requirements defined at the top... might be worth exploring).

Anyhow, I've put my experiment on github now:

https://github.com/condense/shred

netizer commented 8 years ago

Perfect, I think it's this one: https://github.com/mtyaka/lein-oneoff Thanks for creating the repo though. It makes it even easier :-)

netizer / relational_mapper

query per nested collection #3