linkedin / photon-ml

A scalable machine learning library on Apache Spark

Value modelsRDD in class RandomEffectModel cannot be accessed #448

Open mindis opened 5 years ago

mindis commented 5 years ago

Hi

I'm trying to replicate the code in lme4-photon-sleep.zip from https://github.com/linkedin/photon-ml/issues/374 in an Apache Toree notebook (accessing the Photon ML API directly), using Spark 2.3.4, Scala 2.11.8, and the latest Photon ML jar.

I get an error when trying to save the random effects from the model in the following step:

val dx_re = mixedModel.toMap.get("rand").get match {
  case m: RandomEffectModel =>
    // This is the line that fails to compile: modelsRDD is protected
    val reModelRDD = m.modelsRDD
    // Pair each entity ID with its array of (coefficient, index) tuples
    val re_tup = reModelRDD.map(x => (x._1, x._2.coefficients.means.toArray.zipWithIndex))
    // One record per (store, (coefficient, index))
    val re_flat = re_tup.flatMap { case (store, arr) => arr.map(cell => (store, cell)) }
    val coeffNames = rand_effects.columns.slice(1, rf_cols + 2)
    // Replace each coefficient index with its column name
    val re_flat_col = re_flat.map(store_tup => (store_tup._1, coeffNames.apply(store_tup._2._2), store_tup._2._1))

    spark.createDataFrame(re_flat_col).selectExpr("_1 as store", "_2 as column", "_3 as coeff")
}
dx_re.show()
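The reshaping step in the snippet above can be tried without Spark or Photon ML. What follows is only a minimal sketch on plain Scala collections: the store IDs, coefficient values, and column names are made-up stand-ins, not data from the issue, and `models` is a hypothetical replacement for the contents of modelsRDD.

```scala
// Hypothetical stand-in for modelsRDD: entity ID -> coefficient means
val models = Seq(
  "store_a" -> Seq(0.5, -1.0),
  "store_b" -> Seq(2.0, 0.0)
)
// Hypothetical coefficient names, one per position in each coefficient vector
val coeffNames = Array("intercept", "days")

// Flatten to one (store, column, coeff) row per coefficient,
// using zipWithIndex to look up the name for each position
val rows = for {
  (store, means) <- models
  (coeff, idx)   <- means.zipWithIndex
} yield (store, coeffNames(idx), coeff)

rows.foreach(println)
```

The same three-column shape is what the Spark version passes to createDataFrame before renaming the columns to store, column, and coeff.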

Message:
:82: error: value modelsRDD in class RandomEffectModel cannot be accessed in com.linkedin.photon.ml.model.RandomEffectModel
 Access to protected value modelsRDD not permitted because
 enclosing class $iw is not a subclass of
 class RandomEffectModel in package model where target is defined
        val reModelRDD = m.modelsRDD

Is this related to my environment, or to changes in the Photon ML API?

If it is related to Photon ML, how can I view or save the random effects without using ModelProcessingUtils.saveGameModelToHDFS?

The problem with ModelProcessingUtils.saveGameModelToHDFS is that it requires inputIndexMaps, which is generated by AvroReader. I'd like to avoid that because my data is in CSV.

thanks

ashelkovnykov commented 5 years ago

@mindis You should be able to access the raw RDD of models; we should loosen the permissions on RandomEffectModel. I'll make a PR and reference this issue.
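For readers unfamiliar with the error, here is a minimal sketch of how a scope-qualified protected member behaves in Scala. All names (`lib`, `model`, `Model`, `weights`, `util`) are hypothetical and are not Photon ML's classes. Nested objects stand in for packages so the sketch runs as a single notebook cell. The error message in the issue only says that modelsRDD is protected; that it carries a qualifier like the one used here is an assumption.

```scala
object lib {
  object model {
    // weights is readable anywhere inside `lib` and in subclasses of Model,
    // but not from unrelated code outside `lib`
    class Model(protected[lib] val weights: Vector[Double])
  }
  object util {
    // Compiles: this method is enclosed by `lib`
    def weightsOf(m: model.Model): Vector[Double] = m.weights
  }
}

val m = new lib.model.Model(Vector(0.25, -1.5))
// m.weights   // would not compile here: outside `lib` and not a subclass of Model
val w = lib.util.weightsOf(m)
println(w)
```

Notebook and REPL code is compiled inside synthetic wrapper classes such as the `$iw` named in the error, which sit outside the library's packages, so a direct access like the commented-out line is rejected until the modifier is relaxed.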

mindis commented 5 years ago

Thanks, that solved the issue!

Possibly related: I was getting the same "Access to protected" error when trying to instantiate GameDatum for the scoring part, in the code below.

import org.apache.spark.sql.Row
import org.apache.spark.ml.linalg.{SparseVector => MLSparseVector}
import breeze.linalg.{SparseVector => BreezeSparseVector}
import com.linkedin.photon.ml.data.GameDatum

val data_prep = df.rdd.map {
  case Row(id: Int, response: Double, grouping: Int, fixed: MLSparseVector, rand: MLSparseVector) =>
    // Convert the Spark ML sparse vectors to Breeze sparse vectors
    val breeze_fixed = new BreezeSparseVector[Double](fixed.indices, fixed.values, fixed.size)
    val breeze_rand = new BreezeSparseVector[Double](rand.indices, rand.values, rand.size)
    (id.toLong, new GameDatum(
      response = 1.0,
      offsetOpt = None,
      weightOpt = None,
      featureShardContainer = Map("fixed" -> breeze_fixed, "rand" -> breeze_rand),
      // Random-effect ID for this row
      idTagToValueMap = Map("grouping" -> grouping.toString)
    ))
}

To make it work, I had to modify GameDatum.scala to remove protected[ml] from line 38 (protected[ml] class GameDatum).

thanks