nathanmarz / cascalog

Data processing on Hadoop without the hassle.

IllegalArgumentException: fieldDeclaration must be the same size as the given values #216

Closed dkincaid closed 9 years ago

dkincaid commented 10 years ago

After upgrading to Cascalog 2.0, one query that uses cascalog-checkpoint throws the following exception:

13/11/26 08:56:45 ERROR checkpointed-workflow: Component failed
java.lang.IllegalArgumentException: fieldDeclaration must be the same size as the given values
    at cascalog.ops.KryoInsert.<init>(KryoInsert.java:21)
    at cascalog.cascading.operations$insert_STAR_$fn__5580.invoke(operations.clj:105)
    at cascalog.cascading.operations$each$fn__5554.invoke(operations.clj:64)
    at clojure.lang.AFn.applyToHelper(AFn.java:161)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:619)
    at clojure.core$update_in.doInvoke(core.clj:5587)
    at clojure.lang.RestFn.invoke(RestFn.java:445)
    at cascalog.cascading.operations$add_op.invoke(operations.clj:42)
    at cascalog.cascading.operations$each.invoke(operations.clj:60)
    at cascalog.cascading.operations$insert_STAR_.doInvoke(operations.clj:105)
    at clojure.lang.RestFn.applyTo(RestFn.java:139)
    at clojure.core$apply.invoke(core.clj:619)
    at cascalog.cascading.operations$insert_subs.invoke(operations.clj:680)
    at cascalog.cascading.operations$with_constants.invoke(operations.clj:687)
    at cascalog.cascading.operations$logically.invoke(operations.clj:738)
    at cascalog.cascading.platform$assem_STAR_$fn__7224.invoke(platform.clj:76)
    at cascalog.cascading.platform$eval7293$fn__7294.invoke(platform.clj:129)
    at clojure.lang.MultiFn.invoke(MultiFn.java:241)
    at cascalog.cascading.platform$eval7454$fn__7456.invoke(platform.clj:219)
    at cascalog.cascading.platform$eval7370$fn__7371$G__7361__7376.invoke(platform.clj:202)
    at cascalog.cascading.platform$compile_query$fn__7477.invoke(platform.clj:305)
    at cascalog.logic.zip$postwalk_edit.doInvoke(zip.clj:56)
    at clojure.lang.RestFn.invoke(RestFn.java:494)
    at cascalog.cascading.platform$compile_query.invoke(platform.clj:303)
    at cascalog.cascading.platform$eval7487$fn__7488.invoke(platform.clj:312)
    at cascalog.cascading.types$eval5409$fn__5410$G__5400__5415.invoke(types.clj:35)
    at cascalog.cascading.operations$add_op.invoke(operations.clj:42)
    at cascalog.cascading.operations$rename_pipe.invoke(operations.clj:76)
    at cascalog.cascading.operations$in_branch.invoke(operations.clj:595)
    at cascalog.cascading.operations$write_STAR_.invoke(operations.clj:602)
    at clojure.core$comp$fn__409.invoke(core.clj:2332)
    at clojure.core$map$fn__470.invoke(core.clj:2492)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:26)
    at clojure.core.protocols$eval2802$fn__2803.invoke(protocols.clj:53)
    at clojure.core.protocols$eval2735$fn__2736$G__2726__2749.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6175)
    at cascalog.logic.algebra$sum.invoke(algebra.clj:26)
    at cascalog.cascading.flow$compile_flow.doInvoke(flow.clj:91)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:619)
    at cascalog.api$_QMARK__.doInvoke(api.clj:153)
    at clojure.lang.RestFn.invoke(RestFn.java:436)
    at com.idexx.lambda.hadoop.jobs.patientvisits.summary$launch_workflow$fn__10967.invoke(summary.clj:52)
    at cascalog.checkpoint$mk_runner$fn__8338.invoke(checkpoint.clj:60)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:724)

chetmancini commented 10 years ago

@dkincaid Thanks for opening. I'm seeing this same issue come up running my tests.

Quantisan commented 10 years ago

do you have the query or, better yet, a unit test?

dkincaid commented 10 years ago

The query that I think is throwing the exception is in the Gist at https://gist.github.com/dkincaid/4235b4a4aaa5f95ba6cf

Once you've had a chance to look it over I'll need to delete it. Thanks.

Quantisan commented 10 years ago

that's one long query, could you extract out the problem into a unit test please?

Quantisan commented 10 years ago

it's just difficult for me to eyeball the problem since I can't run the code

dkincaid commented 10 years ago

Ok. I'll see if I can make it fail with a simpler query.

Quantisan commented 10 years ago

I think I know the problem. This happens when a mapfn fails, so the query builder can't tell how many fields it outputs. In particular, get-in doesn't seem to work because of #217.

@dkincaid could you confirm if your stripped down query with just get-in throws the same exception please?
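
To make the failure mode concrete, here is a rough sketch in plain Clojure of the kind of size check behind that message. It is only an illustration: `check-arity` is a hypothetical helper, not part of Cascalog or Cascading, and it assumes nothing more than "the number of declared fields must equal the number of values".

```clojure
;; Hypothetical illustration only; `check-arity` is not a Cascalog function.
;; It mimics a check that rejects a field declaration whose size differs
;; from the number of values supplied for it.
(defn check-arity
  "Pairs each declared field with a value, or throws if the counts differ."
  [declared-fields values]
  (when (not= (count declared-fields) (count values))
    (throw (IllegalArgumentException.
            "fieldDeclaration must be the same size as the given values")))
  (zipmap declared-fields values))

;; Two declared fields and two values: the check passes.
(check-arity ["?a" "?b"] [1 2])

;; Two declared fields but no values (for example, the op failed and its
;; output size is unknown): the counts differ, so the check throws.
(try
  (check-arity ["?a" "?b"] [])
  (catch IllegalArgumentException e
    (.getMessage e)))
```

So if an op's output count can't be determined, the declared output vars and the values no longer line up, and a check like this one fails.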

mping commented 10 years ago

I'm hitting the same error running this simple gist at the REPL, using Cascalog 2.0.0: https://gist.github.com/mping/7931708#comment-968513

Quantisan commented 10 years ago

@mping @dkincaid this should be fixed by pull request #223, could you give that patch a try please?

mping commented 10 years ago

I'm still hitting the same error :\ I'm a total n00b so I could be doing something wrong. I cloned the cascalog project, built it, and installed it locally using Maven. I double-checked the classpath of my project and it shows cascalog-2.0.1-SNAPSHOT, so I'm guessing it's set up correctly.

Quantisan commented 10 years ago

@mping could you write a test case so I can reproduce the error and help you debug it please?

mping commented 10 years ago

Sorry for the late reply. The test case is here: https://gist.github.com/mping/7931708#comment-968513. I hit the error with that snippet and with similar code. Let me know if you need more info.

sritchie commented 9 years ago

Okay, looks like this is fixed:

;; Needed to run this at a REPL: clojure.string for join/split,
;; cascalog.api for the ??<- query macro.
(require 'clojure.string)
(use 'cascalog.api)

;; One single-field tuple: key/value pairs joined into a tab-separated line.
(def line
  [[(->> ["contentHost" "my.site.org"
          "contentKeywords" "something"
          "contentPath" "/loader.php"
          "contentReferer" "http://www.mysite.org/hr"
          "contentTitle" "myunited"
          "geoCountry" "Croatia"
          "geoCountryId" "HR"
          "userAgentOsName" "Windows"
          "userAgentOsVersion" "Windows 7"
          "userAgentScreenResolution" "1280x800"
          "userAgentUaString" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"]
         (clojure.string/join "\t"))]])

;; Splits a tab-separated line back into a key -> value map.
(defn zipmap-fields [line]
  (apply hash-map (clojure.string/split line #"\t")))

;; Returns the values for ks in order, one per output field.
(defn field-values [line ks]
  (let [m (zipmap-fields line)]
    (for [k ks]
      (get m k))))

(??<- [?contentHost ?keywords]
      (line ?line)
      (field-values ?line ["contentHost" "contentKeywords"] :> ?contentHost ?keywords))
;;=> (["my.site.org" "something"])