Open andy-innersource opened 6 years ago
CF
returned where you'd expect multiple ones to return?lazy?
case we return a seq
, I found it weird to have a map
as result otherwise1. in shell:
hbase(main):001:0> create "galaxy:planet","stat","satellite"
Created table galaxy:planet
in repl:
user=> (store conn "galaxy:planet" "earth" "stat" {:inhabited? true :population 7125000000 :age "4.543 billion years"})
nil
user=> (store conn "galaxy:planet" "earth" "satellite" {:artificial ["spy1" "comm1"] :natural ["moon"]})
nil
user=> (scan conn "galaxy:planet")
{"earth" {:artificial ["spy1" "comm1"], :natural ["moon"]}}
whole stat
CF is missing in the result
while my version returns
user=> (scan* conn "galaxy:planet")
[["earth" {:satellite {:artificial ["spy1" "comm1"], :natural ["moon"]}, :stat {:age "4.543 billion years", :inhabited? true, :population 7125000000}}]]
(defn hdata->map* [^Result data]
(when-let [r (.getRow data)]
(into {} (for [[k vs] (.getNoVersionMap data)]
[(keyword (String. ^bytes k))
(into {} (for [[kv vv] vs]
[(keyword (String. ^bytes kv))
(@unpack vv)]
))]
))
))
(defn results->maps* [results row-key-fn]
(for [^Result r results]
[(row-key-fn (.getRow r))
(hdata->map* r)]))
(defn scan* [conn table & {:keys [row-key-fn limit with-ts? lazy?] :as criteria}]
(with-open [^HTableInterface h-table (get-table conn table)]
(let [results (-> (.iterator (.getScanner h-table (scan-filter criteria)))
iterator-seq)
row-key-fn (or row-key-fn #(String. %))
rmap (results->maps* (if-not limit
results
(take limit results))
row-key-fn)]
(cond->> rmap
(not with-ts?) (without-ts)
(not lazy?) (into [])))))
ah, good point. besides it should be a vector of maps, not vector of vectors as per 2.
has to do with the fact that there is no lazy map, but there is a lazy seq: i.e. we can't return a lazy map
of results.
I agree that lazy?
is inconsistent by being a vector rather than a map, but it is best to work with maps since they are explicitly addressable / navigatable.
lazy was added later and I remember thinking about returning chunk maps from lazy?
(i.e. stream it). there might be different ideas, but maps are far more superior to work with than vectors of vectors.
Of course, I see the value of returning map
but IMHO I would split scan
into two functions offering different semantics and get rid of the lazy?
thing. lazy?
implies longs sequences. I do have indeed two use cases in my code: quick lookups into a map as well as scanning thru millions of rows.
"1." is bigger deal for me. I can live with my version of scan*
for now though (it is quick and dirty, waiting for an ultimate solution)
yea, feel free to submit a pull request. as long as it does not break the existing API (besides returning results for multiple CFs) I see no reason it would not be merged
Hi,
I have been using this library for a while and it is awesome. However, there are a few things which either could be improved or complicate life a lot.
The original code for some reason in scan only returns a single CF. I use following code in my version to return all CFs
I wonder how we could extend API to have above capability?
Also, in
scan
itself we prefer vectors are resulting structureA.