Open · ghost opened this issue 8 years ago
Which Cascading and Cascalog versions are you using? This reminds me of an iterator bug we fixed a while ago.
— Sent from Mailbox
On Sat, Oct 31, 2015 at 6:15 PM, Timothy Galebach notifications@github.com wrote:
The following input on cascalog.playground:

```clojure
(??- (<- [?p ?age] (age ?p ?age)))
```

returns

```clojure
[["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36] ["luanne" 36]]
```

However, running

```clojure
(?- (stdout) (<- [?p ?age] (p/age ?p ?age)))
```

gives the correct result (10 unique names and ages).
Reply to this email directly or view it on GitHub: https://github.com/nathanmarz/cascalog/issues/294
I'm using Cascalog 2.1.1.
I haven't explicitly declared a Cascading dependency; I've just been following the project's README to get started. Here is the relevant portion of my project.clj:

```clojure
:dependencies [[org.clojure/clojure "1.7.0"]
               [cascalog "2.1.1"]]
:profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-core "1.2.1"]]}}
:jvm-opts ["-Xms768m" "-Xmx768m"])
```
Yeah, this is fixed in 3.0.0-SNAPSHOT, which I think is the latest version on master. Want to give that a shot? We're due for a new release for sure.
The same issue occurs with these dependencies:

```clojure
:dependencies [[org.clojure/clojure "1.7.0"]
               [cascalog/cascalog-core "3.0.0-SNAPSHOT"]]
```
Is there a working project.clj I could take a look at? Once this gets resolved, I'm guessing it will come down to a documentation issue, and I'm happy to submit a pull request for that. I was also frustrated at first because the documentation doesn't mention that you need to run `(bootstrap-emacs)` in CIDER, so that should probably be fixed as well.
For some reason my internet connection is preventing me from launching a REPL (it blocks dependency downloads in Leiningen), but based on a different bug, I think I have a guess about what's causing this. Can you give this branch a try?
https://github.com/nathanmarz/cascalog/pull/295
For more background on the issue, check out the discussion here: https://github.com/nathanmarz/cascalog/issues/251
along with this fix: https://github.com/nathanmarz/cascalog/pull/280
Also, any documentation updates you want to send over would be huge.
I'm trying that branch now. I'm building it to install it in my local repo, but the submodules (cascalog-checkpoint, midje, etc.) depend on cascalog-core, so I can't compile them at first. I don't usually structure projects like this; how do you build it?
Ah, sorry: first run `lein sub install` in the base directory. Thanks for trying this out!
OK, that works for compiling and installing to the local repo. Unfortunately, the bug still persists. If it helps, the log output in the REPL says that Cascading 2.5.3 is currently in use.
Thanks for the help so far! I have a project that I'm moving to Hadoop because it has grown a lot, and I'd really like to use Cascalog for it, so hopefully we can sort this out.
This looks very related to #292. The folks over at that ticket figured out that this issue only shows up with Clojure 1.7.0.
OK, I'll try going back to 1.6, thanks!
That fixed it. I'll submit a pull request with more up-to-date docs shortly.
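For reference, here is a minimal sketch of the workaround confirmed in this thread. It assumes the project.clj posted above and changes only the Clojure version to 1.6.0. The project name and version are placeholders, not taken from the thread.

```clojure
;; Placeholder project name/version. Everything except the Clojure version
;; matches the project.clj posted earlier in this thread.
(defproject my-cascalog-app "0.1.0-SNAPSHOT"
  ;; Clojure 1.6 predates the chunked iterator-seq change (CLJ-1669) in 1.7.
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [cascalog "2.1.1"]]
  :profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-core "1.2.1"]]}}
  :jvm-opts ["-Xms768m" "-Xmx768m"])
```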
This just bit me as well. I can confirm that switching to 1.6 fixes the issue, but it would be nice to have a fix that works with 1.7.
@metasoarous totally hear you. I'm happy to review any pull requests from folks who want to take this on! I'm not using Cascalog for my work these days, so I don't have time to fix bugs like this myself, but I am available on a consulting basis to fix bugs or add features.
Hi @sritchie: I appreciate the offer. Right now, 1.7 isn't critical for us, but if it becomes necessary we'll keep that in mind. I mostly just wanted to add a second data point for posterity's sake :-)
http://dev.clojure.org/jira/browse/CLJ-1738
This might help: the 1.7 compatibility notes on the iterator-seq change.
Direction of this ticket changed at Rich's request.
Prior description captured here:
Clojure code that uses iterator-seq to wrap Java iterators that return the same mutable object on every call is broken by the chunked iterator-seq changes from CLJ-1669.
Some examples where this occurs:

- Hadoop ReduceContextImpl$ValueIterator
- Mahout DenseVector$AllIterator/NonDefaultIterator
- LensKit FastIterators

Cause: In 1.6, the iterator-seq wrapper could be used with these iterators to consume a sequence over them element by element. In 1.7 RC1, iterator-seq produces a chunked sequence. Because next() is called 32 times on the iterator before the first value can be retrieved from the seq, and the same mutable object is returned every time, code doing this now receives different (incorrect) results.
Approach: Switch iterator-seq back to non-chunked and change eduction to use the chunking iterator-seq strategy as that was the original target. Retain the use of the chunked iterator seq in sequence over the TransformerIterator.
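To make the cause concrete, here is a small, self-contained simulation in plain Clojure (no Hadoop or Cascalog involved). `reusing-iterator`, `buffered-then-read` and `read-each` are hypothetical helpers. The first mimics an iterator such as Hadoop's ValueIterator that hands back the same mutable holder on every `next()`. `buffered-then-read` mimics what a chunked seq does by calling `next()` up to 32 times before reading any value, and `read-each` mimics the 1.6 element-by-element behavior.

```clojure
(ns iterator-reuse-demo
  (:import (java.util Iterator)
           (java.util.concurrent.atomic AtomicReference)))

(defn reusing-iterator
  "Hypothetical stand-in for iterators like Hadoop's ValueIterator: every
  next() call overwrites and returns the SAME mutable holder object."
  ^Iterator [values]
  (let [holder    (AtomicReference.)
        remaining (atom (seq values))]
    (reify Iterator
      (hasNext [_] (boolean (seq @remaining)))
      (next [_]
        (let [v (first @remaining)]
          (swap! remaining rest)
          (.set holder v)
          holder)))))

(defn buffered-then-read
  "Mimics a chunked seq: calls next() up to n times (32 for one chunk) and
  only then reads the values out of the returned objects."
  [^Iterator it n]
  (let [objs (loop [acc []]
               (if (and (< (count acc) n) (.hasNext it))
                 (recur (conj acc (.next it)))
                 acc))]
    (mapv #(.get ^AtomicReference %) objs)))

(defn read-each
  "Element by element: reads each value immediately after next(), before
  the holder is overwritten by the following call."
  [^Iterator it]
  (loop [acc []]
    (if (.hasNext it)
      (recur (conj acc (.get ^AtomicReference (.next it))))
      acc)))

(buffered-then-read (reusing-iterator ["a" "b" "c"]) 32) ; => ["c" "c" "c"]
(read-each (reusing-iterator ["a" "b" "c"]))            ; => ["a" "b" "c"]
```

Buffering first leaves every slot pointing at the one holder, which contains only the last value. This appears to be the same pattern as the ten identical `["luanne" 36]` rows in the original report.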
Only ??- and ??<- use iterator-seq, which would explain why the ?- query writing to (stdout) gives the correct result.
@nightlord this is really interesting, and probably the reason for the bug. Looks like a change like this may work:
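```clojure
;; Non-chunked: f is applied to each element right after next() returns it.
(defn iter-seq [^java.util.Iterator iter f]
  (lazy-seq
    (when (.hasNext iter)
      (cons (f (.next iter))
            (iter-seq iter f)))))
```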
@sritchie This fixes ??-. It may not be a complete fix, but this is definitely the problem: https://github.com/nathanmarz/cascalog/pull/296
@sritchie I fixed ??- and the CI build problem, and added 1.6 and 1.7 profiles.
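As an illustration only, one common Leiningen pattern for testing against both Clojure versions is to define one profile per version and run the suite under each with `lein with-profile`. This is an assumption about the general approach; the actual contents of PR #296 may differ.

```clojure
;; Sketch only; PR #296 may configure this differently.
;; Each profile swaps in one Clojure version for a test run.
:profiles {:1.6 {:dependencies [[org.clojure/clojure "1.6.0"]]}
           :1.7 {:dependencies [[org.clojure/clojure "1.7.0"]]}}
;; Run the test suite once per profile:
;;   lein with-profile +1.6:+1.7 test
```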
The build succeeds.