twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars 706 forks source link

implicit conversion from TraversableOnce to Fields is dangerous #1657

Open johnynek opened 7 years ago

johnynek commented 7 years ago

Note, inside of Job there is an implicit from TraverseableOnce to Fields:

https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/FieldConversions.scala#L185

Note, .isDefined is a method on Option in scala, but not on TraversableOnce/List/Seq. It is an easy mistake to make to call .isDefined on such a type, which would normally be an error but in Job you convert to Fields which does have this method: http://docs.cascading.org/cascading/1.2/javadoc/cascading/tuple/Fields.html#isDefined()

and is DEFINITELY NOT what you intend so you likely get a silent error.

My proposal is to remove these implicits and require explicit calls in those cases. I think they are rare. We NEVER use fields API, so we don't mind. But I assume at Twitter a small number of jobs may be using that implicit so you would need to make some changes to accept that PR. In the worst case you could put the implicit back in those jobs, but probably they are bad style to begin with and we should use a tuple or new Fields("foo", "bar", "baz") or something explicit.

Thoughts?

@piyushnarang @benpence @sriramkrishnan ?

piyushnarang commented 7 years ago

I think dropping these implicits sounds good to me. I could do a dry run internally to see how many affected customers there are.

isnotinvain commented 7 years ago

That sounds good to me, I think we'd catch any issues at compile time and can clean them up as needed.

piyushnarang commented 7 years ago

Tried running a sandbox with these changes and it looks like we have a decent number of failures (~25 or so targets). A bunch of our ads jobs are on the fields API and seem to be relying on the fields(...) / intFields(...) conversions in a bunch of places in their jobs. If we are going ahead with this change, it might be cleaner on our end to add a FieldConversionImplicits object internally which these jobs can import which defines the old implicit methods. @johnynek if you don't mind us pushing this to the release after 0.17 it might make our life simpler as we already have a lot of fixes to make as part of that (it includes the algebird 0.13.0 bump with all the changes to the algebra types and hence we need to bump all our libraries together).

johnynek commented 7 years ago

👍