salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars 393 forks source link

Added missing test for java conversions #334

Closed wsuchy closed 5 years ago

codecov[bot] commented 5 years ago

Codecov Report

Merging #334 into master will increase coverage by 0.07%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #334      +/-   ##
==========================================
+ Coverage   86.55%   86.63%   +0.07%     
==========================================
  Files         335      335              
  Lines       10751    10755       +4     
  Branches      354      565     +211     
==========================================
+ Hits         9306     9318      +12     
+ Misses       1445     1437       -8
Impacted Files Coverage Δ
...ala/com/salesforce/op/features/types/package.scala 57.93% <100%> (+6.86%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9fee4d4...40b46bb. Read the comment docs.

wsuchy commented 5 years ago

My assumption was that this test should cover only the basic functionality (what it does by now). By looking at the coverage you can clearly see that all lines are now covered. I don't think that exploding the scope of it makes much sense as there are also different things to focus on.

tovbinm commented 5 years ago

well, the code that was introduced in #333 now clearly has a problem which I wanted you discover through the test:

import com.salesforce.op.features.types._
import collection.JavaConverters._

val jm = new java.util.HashMap[String, java.lang.Boolean]()
jm.put("nullValue", null)
jm.toBinaryMap // NullPointerException !
tovbinm commented 5 years ago

I propose to add a test for null values and modify the code from #333 with following:

def toBinaryMap: BinaryMap = new BinaryMap(
    Option(v).map(_.asScala.collect { case (k, v) if v != null => v.booleanValue() }.toMap).getOrElse(Map.empty)
)
wsuchy commented 5 years ago

I am aware of that and assumed that you are fine with this by merging my PR.

tovbinm commented 5 years ago

Clearly we need to fix it as we don't want to have any NPE flying around.

wsuchy commented 5 years ago

Hmmm I wouldn't say that nulls must be implicitly omitted. Why not convert them to false in Boolean and 0 in Integer? Or maybe leave exceptions as a suggestion for the user that there is a problem with the data?

tovbinm commented 5 years ago

Replacing with default values would be even worse. Let's have then explicit handling for null values as follows, e.g. for RealMap:

def toRealMap: RealMap = new RealMap(
    Option(v).map(_.asScala.map {
      case (k, null) => (k, null.asInstanceOf[Double])
      case (k, x) => (k, x.doubleValue())
    }.toMap).getOrElse(Map.empty)
)
salesforce-cla[bot] commented 5 years ago

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Matthew m***@s***.com. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

salesforce-cla[bot] commented 4 years ago

Thanks for the contribution! Before we can merge this, we need @wsuchy to sign the Salesforce.com Contributor License Agreement.