netizer / relational_mapper

A relational mapper in Clojure
Eclipse Public License 1.0
14 stars 2 forks source link

query per nested collection #3

Open olivergeorge opened 8 years ago

olivergeorge commented 8 years ago

I'll do more on this but wanted to start a thread in case you have feedback

Here's vaguely what I had in mind for supporting recursion.

Few notes:

(defn pull-all
  [{:keys [db-state query] :as env} resource {:keys [where]}]
  (let [db-config (:config db-state)
        expanded-relations (expand-relations query resource db-state)
        {:keys [has-many belongs-to]} (group-by :relation-type expanded-relations)
        query (build-query resource where belongs-to db-state)
        collection (db-query (:config db-state) query)
        collection (distinct collection)]
    (loop [has-many (seq has-many)
           ret (collection-with-associations collection resource expanded-relations db-state)]
      (if-not (nil? has-many)
        (let [rel (first has-many)
              res (-> rel :being-added :resource)
              ;subq (assoc query :select [:id])
              subq (mapv :id collection)
              subc (pull-all env res {:where [[:id :in subq]]})
              ret (stitch-up ret res subc)]
          (recur (next has-many) ret))
        ret))))

That's really pseudocode

olivergeorge commented 8 years ago

Quick update. This provides a zipper based solution for translating a nested relational query defined as an Om Next query expression into flat sql queries.

https://gist.github.com/olivergeorge/963701655779675288cd

netizer commented 8 years ago

I wouldn't say that "one query per nested collection" is a bad idea, but I think user of the library should be able to make this choice. I think that if the library will generate adequate SQL queries, initial setup of the back-end will be super easy, but tuning performance will become much more complex.

Now I'm not saying that this library should put performance as the top priority. I really like back-ends expressed with few lines of code, but I'd like this library to give the developer tools to improve performance. There's no such feature here yet, but that's where I'd like this library to evolve.

I was thinking about something like this: There are 3 functions:

;; that means :users have :posts and :attachments, :posts have :author, 
;; and :attachments have :files and :author. 
;; now we still use db-state with `associations`, so we can figure out what is an array and what is a hash.
(def result-format [:users [:posts :author] [:attachments :files :author]])

(let [users (query db-state :users #{:posts :attachments} [[:= :users.id 1] [:posts.published true]])
      users-formatted (format-response db-state result-format users)
      missing-ids (select-ids users-formatted [:users :posts :attachments])
      missing-records (query db-state :files #{} [[:= files.attachments_id missing-ids]
      result (format-response db-state result-format missing-records users-formatted)]
  (println result))

format-response would need additional attribute to deal with ambiguity but I think it might be optional.

Now there might be a function find-all:

(find-all db-state result-format [[:= :users.id 1] [:posts.published true]])

that would follow "one query per nested collection" rule, but if in some cases it wouldn't be the best fit, query, select-ids and format-response make it easy to fetch data differently.

netizer commented 8 years ago

I tried the zipper based solution to understand better how it works, but it seems it didn't work for me as it suppose to, becasue that was the output:

SELECT [:name {:meeting [:jurisdiction {:race [:id]}]}] FROM :venue
Merge records into data nil
SELECT nil FROM :jurisdiction
Merge records into data nil
{:venue {}, :jurisdiction {}}
olivergeorge commented 8 years ago

Hello. Sorry about that, I'll try and fix up the gist this evening (+3h)

olivergeorge commented 8 years ago

I think I agree with your points above.

It would be enough to support/facilitate "one query per nested collection" but is better not to require it.

It's a good idea to allow control over how queries are grouped. (Not sure how that might work, my closest reference points are perhaps django's select_related / prefetch_related features or perhaps the LINQ include feature)

I think perhaps the current find-all parameters would need to evolve to cater for nesting and query performance hints. Possibly conditions becomes a map with :where for conditions and optionally :prefetch etc for hints. The relations param could expand to allow maps for nested recursion (joins) as per the Datomic pull spec.

olivergeorge commented 8 years ago

I was able to repeat your results with a schema of {}. (Still a bug, will investigate) Result is two queries, one for venue including data from meeting, jurisdiction and race tables and a second top level query for jurisdiction.

(parser {:schema {}}
        [{:venue [:name
                  {:meeting [:jurisdiction
                             {:race [:id]}]}]}
         :jurisdiction])

Try this

(parser {:schema {:race    {:meeting      :belongs-to
                            :logged-event :has-many}
                  :meeting {:race  :has-many
                            :venue :belongs-to}
                  :venue   {:meeting :has-many}}}
        [{:venue [:name
                  {:meeting [:jurisdiction
                             {:race [:id]}]}]}
         :jurisdiction])

Should produce:

SELECT [:name] FROM :venue
Merge records into data nil
SELECT [:jurisdiction] FROM :meeting
Merge records into data (:venue :meeting)
SELECT [:id] FROM :race
Merge records into data (:venue :meeting :race)
SELECT nil FROM :jurisdiction
Merge records into data nil
olivergeorge commented 8 years ago

I've updated that script now. gist has source and sample output from examples. It includes two ideas for consideration:

This stuff is quite superficial, the hard bit is building queries and stitching things back up.

olivergeorge commented 8 years ago

Quick update, I've done a walker based implementation which is more advanced.

https://gist.github.com/olivergeorge/b9f5fb2935e10d75389fdc081c8027a5

netizer commented 8 years ago

Awesome :-)

one question, how do you execute this kind of gists? I mean I would:

  1. create a lein project
  2. put the whole code in test file
  3. add call to example function at the end of the test file
  4. figure out dependencies and fill them in in project.clj
  5. run test but maybe there is a tool that I don't know about that makes the process easier?
olivergeorge commented 8 years ago

Hi. I'm not sure how to do that. I was resisting creating an example github project but I see it would be more convenient. (I believe there is a lein plugin which can execute a single file with requirements defined at the top... might be worth exploring).

Anyhow, I've put my experiment on github now:

https://github.com/condense/shred

netizer commented 8 years ago

Perfect, I think it's this one: https://github.com/mtyaka/lein-oneoff Thanks for creating the repo though. It makes it even easier :-)