nathanmarz / cascalog

Data processing on Hadoop without the hassle.
Other
1.38k stars 178 forks source link

fix for ops that return nil #289

Open bfabry opened 9 years ago

bfabry commented 9 years ago

In memory platform was blowing up for an op that returned a single nil value, due to (jackknife.seq/collectify nil) => []

sritchie commented 9 years ago

Works for me. Can we also test that no tuples are returned if you use ? logic vars?

bfabry commented 9 years ago

Yup, I've added that test and it failed, and I've come up with my version of a fix. It's a bit gross though so let me know if there's a "more correct" way to do it. Now if I can just figure out how to get gh to add the amended commit...

bfabry commented 9 years ago

Oh, it did it automatically

sritchie commented 9 years ago

Yeah, I don't really like that - the nil filtering should just happen as a matter of course, since the presence of ? variables should insert a nil filter. Wonder why that's not happening.

bfabry commented 9 years ago

Oh cool, I assumed it would have. wonder why it's not firing...

bfabry commented 9 years ago

@sritchie could you give me a pointer as to where/how the nil filtering should happen? I've been walking through this code today and am struggling.

sritchie commented 9 years ago

@bfabry hey, sorry, was out of town helping out at a running race.

This is the piece of code that does the filtering: https://github.com/nathanmarz/cascalog/blob/develop/cascalog-core/src/clj/cascalog/cascading/operations.clj#L732

you can see in this search where that gets called:

https://github.com/nathanmarz/cascalog/blob/ba49f30947bebd4ec92e29e92985f91fd071c75a/cascalog-core/src/clj/cascalog/cascading/platform.clj#L83

If you search platform.clj for (assem you'll get all the hits.

Let me know if that helps!

bfabry commented 8 years ago

@sritchie those are both part of the cascading platform, but the bug is with the in-memory platform?

sritchie commented 8 years ago

@bfabry yeah, you're right - that code snippet doesn't really mesh with my comment. I had forgotten that that code was at the Cascading level, then lost context when pasting the link. I think if we can modify this pull request to act like that Cascading code, then we can apply it to map, mapcat and filter, which should fix the tests.