Closed dkincaid closed 9 years ago
@dkincaid Thanks for opening. I'm seeing this same issue come up running my tests.
Do you have the query or, better yet, a unit test?
The query that I think is throwing the exception is in the Gist at https://gist.github.com/dkincaid/4235b4a4aaa5f95ba6cf
Once you've had a chance to look it over I'll need to delete it. Thanks.
That's one long query. Could you extract the problem into a unit test, please?
It's difficult for me to eyeball the problem since I can't run the code.
Ok. I'll see if I can make it fail with a simpler query.
I think I know the problem. This happens when a mapfn fails, so the query builder can't tell how many fields it outputs. In particular, get-in doesn't seem to work because of #217.
@dkincaid could you confirm if your stripped down query with just get-in throws the same exception please?
I'm hitting the same error running this simple gist at the REPL, using Cascalog 2.0.0: https://gist.github.com/mping/7931708#comment-968513
@mping @dkincaid this should be fixed by pull request #223, could you give that patch a try please?
I'm still hitting the same error :\ I'm a total n00b so I could be doing something wrong. I cloned the cascalog project, built it, and installed it locally using Maven. I double-checked the classpath of my project and it shows cascalog-2.0.1-SNAPSHOT, so I'm guessing it's set up correctly.
@mping could you write a test case so I can reproduce the error and help you debug it please?
Sorry for the late reply. The test case is here: https://gist.github.com/mping/7931708#comment-968513. I hit the error with that snippet or with similar code. Let me know if you need more info.
Okay, looks like this is fixed:
(def line
  [[(->> ["contentHost" "my.site.org"
          "contentKeywords" "something"
          "contentPath" "/loader.php"
          "contentReferer" "http://www.mysite.org/hr"
          "contentTitle" "myunited"
          "geoCountry" "Croatia"
          "geoCountryId" "HR"
          "userAgentOsName" "Windows"
          "userAgentOsVersion" "Windows 7"
          "userAgentScreenResolution" "1280x800"
          "userAgentUaString" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"]
         (clojure.string/join "\t"))]])

(defn zipmap-fields [line]
  (apply hash-map (clojure.string/split line #"\t")))

(defn field-values [line ks]
  (let [m (zipmap-fields line)]
    (for [k ks]
      (get m k))))

(??<- [?contentHost ?keywords]
      (line ?line)
      (field-values ?line ["contentHost" "contentKeywords"] :> ?contentHost ?keywords))
;;=> (["my.site.org" "something"])
On upgrading to Cascalog 2.0, one query that uses cascalog-checkpoint is throwing the following exception: