Closed christaina closed 6 years ago
Hello Christina,
In the new config you've posted above, your coordinate-configurations use a shard named myShard, while your feature-shard-configurations define the shard with the name rmShard.
However, I don't think that should result in the error you're getting. Could you please post more of the stacktrace? Thank you.
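For reference, the shard name that a coordinate configuration points at via feature.shard must match a name defined in a feature-shard configuration. A minimal consistent pair might look like the following (the shard name, feature bag, and optimizer values here are hypothetical, modeled on the configs in this thread):

```shell
# Hypothetical fragment of a GameTrainingDriver invocation: the shard named
# in the coordinate config (feature.shard=myShard) matches the shard defined
# in the feature-shard config (name=myShard).
--feature-shard-configurations "name=myShard,feature.bags=userFeatures,intercept=true" \
--coordinate-configurations "name=perUser,random.effect.type=userId,feature.shard=myShard,optimizer=TRON,max.iter=10,tolerance=1e-3"
```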
Yeah, sorry about that; it was just a typo. Okay, here are some examples.
When using --feature-shard-configurations='name=myShard,feature.bags=,intercept=true':
2017-11-21T18:54:29.248+0000 [ERROR] Failure while running the driver
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:134)
at org.apache.hadoop.fs.Path.<init>(Path.java:93)
at com.linkedin.photon.ml.data.avro.NameAndTermFeatureSetContainer$$anonfun$4.apply(NameAndTermFeatureSetContainer.scala:103)
at com.linkedin.photon.ml.data.avro.NameAndTermFeatureSetContainer$$anonfun$4.apply(NameAndTermFeatureSetContainer.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
at scala.collection.SetLike$class.map(SetLike.scala:92)
at scala.collection.AbstractSet.map(Set.scala:47)
at com.linkedin.photon.ml.data.avro.NameAndTermFeatureSetContainer$.readNameAndTermFeatureSetContainerFromTextFiles(NameAndTermFeatureSetContainer.scala:102)
at com.linkedin.photon.ml.cli.game.GameDriver$class.prepareFeatureMapsDefault(GameDriver.scala:190)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.prepareFeatureMapsDefault(GameTrainingDriver.scala:47)
at com.linkedin.photon.ml.cli.game.GameDriver$class.prepareFeatureMaps(GameDriver.scala:228)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.prepareFeatureMaps(GameTrainingDriver.scala:47)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$3.apply(GameTrainingDriver.scala:278)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$3.apply(GameTrainingDriver.scala:278)
at com.linkedin.photon.ml.util.Timed$.measureDuration(Timed.scala:71)
at com.linkedin.photon.ml.util.Timed$.apply(Timed.scala:57)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.run(GameTrainingDriver.scala:277)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$main$1.apply(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$main$1.apply(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.util.Timed$.measureDuration(Timed.scala:71)
at com.linkedin.photon.ml.util.Timed$.apply(Timed.scala:57)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.main(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver.main(GameTrainingDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
[stacktrace identical to the one above]
When using --feature-shard-configurations="name=myShard,feature.bags='',intercept=true":
2017-11-21T18:56:04.091+0000 [ERROR] Failure while running the driver
java.io.FileNotFoundException: No such file or directory 's3://bucket/features/'''
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:816)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1194)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:773)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:166)
at com.linkedin.photon.ml.util.IOUtils$.readStringsFromHDFS(IOUtils.scala:171)
at com.linkedin.photon.ml.data.avro.NameAndTermFeatureSetContainer$.com$linkedin$photon$ml$data$avro$NameAndTermFeatureSetContainer$$readNameAndTermSetFromTextFiles(NameAndTermFeatureSetContainer.scala:121)
at com.linkedin.photon.ml.data.avro.NameAndTermFeatureSetContainer$$anonfun$4.apply(NameAndTermFeatureSetContainer.scala:104)
at com.linkedin.photon.ml.data.avro.NameAndTermFeatureSetContainer$$anonfun$4.apply(NameAndTermFeatureSetContainer.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
at scala.collection.SetLike$class.map(SetLike.scala:92)
at scala.collection.AbstractSet.map(Set.scala:47)
at com.linkedin.photon.ml.data.avro.NameAndTermFeatureSetContainer$.readNameAndTermFeatureSetContainerFromTextFiles(NameAndTermFeatureSetContainer.scala:102)
at com.linkedin.photon.ml.cli.game.GameDriver$class.prepareFeatureMapsDefault(GameDriver.scala:190)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.prepareFeatureMapsDefault(GameTrainingDriver.scala:47)
at com.linkedin.photon.ml.cli.game.GameDriver$class.prepareFeatureMaps(GameDriver.scala:228)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.prepareFeatureMaps(GameTrainingDriver.scala:47)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$3.apply(GameTrainingDriver.scala:278)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$3.apply(GameTrainingDriver.scala:278)
at com.linkedin.photon.ml.util.Timed$.measureDuration(Timed.scala:71)
at com.linkedin.photon.ml.util.Timed$.apply(Timed.scala:57)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.run(GameTrainingDriver.scala:277)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$main$1.apply(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$main$1.apply(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.util.Timed$.measureDuration(Timed.scala:71)
at com.linkedin.photon.ml.util.Timed$.apply(Timed.scala:57)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.main(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver.main(GameTrainingDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.io.FileNotFoundException: No such file or directory 's3://bucket/features/'''
[stacktrace identical to the one above]
It should work if you completely remove feature-bags-directory, which is now optional. Unless you're using the feature bags directory files as feature whitelists, you should stop including it, as the AvroDataReader will automatically create a feature index. Alternatively, you can create an off-heap index map for re-use.
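As a sketch of what that looks like, the submit command simply drops the feature-bags-directory argument and lets AvroDataReader build the feature index from the input data (the paths and shard values below are placeholders, not taken from the actual job):

```shell
# Sketch: no --feature-bags-directory argument at all; AvroDataReader
# derives the feature index from the training data itself.
spark-submit \
  --class com.linkedin.photon.ml.cli.game.training.GameTrainingDriver \
  photon-all_2.11-1.0.0.jar \
  --input-data-directories hdfs:///path/to/train/ \
  --root-output-directory hdfs:///path/to/output/ \
  --feature-shard-configurations "name=myShard,feature.bags=userFeatures,intercept=true" \
  --training-task LINEAR_REGRESSION
```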
I'll talk with the team about what changes we should make so that others don't run into this problem. Thanks for bringing this to our attention!
Yes, we are using feature-bags-directory as feature whitelists (sorry, this example was not the clearest illustration of our actual use). We will explore those options. Thanks for your help!
No problem - please update us with your results. I'll leave this issue open, as a reminder that we need to take a look at how feature whitelists are handled when used with intercept-only models (with or without other models).
Hey Alex,
So I tried omitting feature-bags-directory and using --feature-shard-configurations='name=myShard,feature.bags=,intercept=true', then feature.bags='', then feature.bags=NONE, and got this error every time:
2017-11-21T19:48:18.953+0000 [ERROR] Failure while running the driver
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 13, ip-172-16-194-127.ec2.internal, executor 3): java.lang.IllegalArgumentException: Expected feature list to be a Java List, found instead: scala.None$.
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$readFeaturesFromRecord$2.apply(AvroDataReader.scala:295)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$readFeaturesFromRecord$2.apply(AvroDataReader.scala:293)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at com.linkedin.photon.ml.data.avro.AvroDataReader$.readFeaturesFromRecord(AvroDataReader.scala:293)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7$$anonfun$apply$2.apply(AvroDataReader.scala:242)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7$$anonfun$apply$2.apply(AvroDataReader.scala:241)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7.apply(AvroDataReader.scala:241)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7.apply(AvroDataReader.scala:240)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1569)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1557)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1556)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1556)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:815)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:815)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:815)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1784)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1739)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1728)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:631)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at com.linkedin.photon.ml.data.avro.AvroDataReader.generateIndexMapLoaders(AvroDataReader.scala:248)
at com.linkedin.photon.ml.data.avro.AvroDataReader.readMerged(AvroDataReader.scala:124)
at com.linkedin.photon.ml.data.avro.AvroDataReader.readMerged(AvroDataReader.scala:93)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.com$linkedin$photon$ml$cli$game$training$GameTrainingDriver$$readTrainingData(GameTrainingDriver.scala:384)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$4.apply(GameTrainingDriver.scala:281)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$4.apply(GameTrainingDriver.scala:281)
at com.linkedin.photon.ml.util.Timed$.measureDuration(Timed.scala:71)
at com.linkedin.photon.ml.util.Timed$.apply(Timed.scala:57)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.run(GameTrainingDriver.scala:280)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$main$1.apply(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$$anonfun$main$1.apply(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.util.Timed$.measureDuration(Timed.scala:71)
at com.linkedin.photon.ml.util.Timed$.apply(Timed.scala:57)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver$.main(GameTrainingDriver.scala:697)
at com.linkedin.photon.ml.cli.game.training.GameTrainingDriver.main(GameTrainingDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: Expected feature list to be a Java List, found instead: scala.None$.
[same frames as the task failure above]
Hello Christina,
After looking at your latest stacktrace, it looks like this is actually a CLI parsing bug - I've created PR #322 to resolve it.
Great! Thank you!
Hello Christina,
Could you please give it another try with the latest Photon ML code? PR #322 has been merged into the master branch.
Hope it's okay to pitch in, since I have been following this thread and encountered a similar issue last week. With the latest code, I think the issue still persists.
When I do not specify any value (feature.bags=):
spark2-submit \
--class com.linkedin.photon.ml.cli.game.training.GameTrainingDriver \
--master yarn \
--deploy-mode client \
--num-executors 4 \
--executor-cores 4 \
--driver-memory 10g \
--executor-memory 10g \
photon-all_2.11-1.0.0.jar \
--application-name "GAME Mixed Effect Model" \
--input-data-directories hdfs:///user/nisha/Data/photon-ml/train/ \
--input-column-names response=rating,uid=userId,offset=offset,weight=weight,metadataMap=metadataMap \
--validation-evaluators RMSE \
--root-output-directory hdfs:///user/nisha/Data/photon-ml/output/mixed2 \
--override-output-directory true \
--feature-shard-configurations "name=userShard,feature.bags=,intercept=true" \
--coordinate-configurations "name=perUser,random.effect.type=userId,feature.shard=userShard,min.partitions=5,optimizer=TRON,max.iter=10,tolerance=1e-3,regularization=L2,reg.weights=1,active.data.bound=10000000,passive.data.bound=1,features.to.sample.ratio=1.0" \
--data-validation VALIDATE_DISABLED \
--training-task LINEAR_REGRESSION \
--validation-data-directories hdfs:///user/nisha/Data/photon-ml/test \
--output-mode BEST \
--coordinate-update-sequence perUser \
--coordinate-descent-iterations 10 \
--normalization NONE \
--data-summary-directory "hdfs:///user/nisha/Data/photon-ml/output/mixed-training-smry2" \
--compute-variance true
Stacktrace:
17/11/27 14:53:16 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, bottou01.sjc.cloudera.com, executor 3): java.lang.IllegalArgumentException: Expected feature list to be a Java List, found instead: scala.None$.
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$readFeaturesFromRecord$2.apply(AvroDataReader.scala:295)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$readFeaturesFromRecord$2.apply(AvroDataReader.scala:293)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at com.linkedin.photon.ml.data.avro.AvroDataReader$.readFeaturesFromRecord(AvroDataReader.scala:293)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7$$anonfun$apply$2.apply(AvroDataReader.scala:242)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7$$anonfun$apply$2.apply(AvroDataReader.scala:241)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7.apply(AvroDataReader.scala:241)
at com.linkedin.photon.ml.data.avro.AvroDataReader$$anonfun$7.apply(AvroDataReader.scala:240)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
With feature.bags=NONE:
17/11/27 14:59:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, bottou04.sjc.cloudera.com, executor 3): java.lang.IllegalArgumentException: Expected feature list NONE to be a Java List, found instead: scala.None$.
[same frames as the stacktrace above]
Hello Nisha,
Sorry, that was my mistake; I forgot to mention one thing: please remove the reference to feature.bags entirely. Your feature shard configuration would then look like:
--feature-shard-configurations "name=userShard,intercept=true"
Thanks Alex, that works!
On a related note, with the updated API, is it still possible to specify multiple feature shards when creating a mixed model? Maybe something like the configuration below (although this won't work):
--feature-shard-configurations "name=userShard,feature.bags=genreFeatures|movieLatentFeatures,intercept=true", "name=globalShard,feature.bags=genreFeatures|movieLatentFeatures,intercept=true"
And later, of course, specify the corresponding fixed and random effect configurations? I could do this programmatically, but I'm having some trouble passing the arguments through spark-submit.
Hey, I was able to make it work with the updated PR as well. Thanks!
Nisha, from what I've seen, you can specify multiple feature shards by using the --feature-shard-configurations flag multiple times, once per shard.
Ah, thank you @christaina!
Thanks for bringing this issue to our attention and helping us resolve it, @christaina and @nishamuktewar.
That's correct: for feature shard and coordinate configurations, multiple configs are now specified by repeating the feature-shard-configurations and coordinate-configurations flags respectively, each time defining a new shard/coordinate.
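For example, a mixed-effect setup with one global and one per-user shard might be configured as below (shard names, feature bags, and optimizer settings here are hypothetical, modeled on the configurations earlier in this thread):

```shell
# Sketch: each occurrence of a flag defines one shard or one coordinate.
--feature-shard-configurations "name=globalShard,feature.bags=genreFeatures|movieLatentFeatures,intercept=true" \
--feature-shard-configurations "name=userShard,feature.bags=genreFeatures,intercept=true" \
--coordinate-configurations "name=global,feature.shard=globalShard,optimizer=TRON,max.iter=10,tolerance=1e-3" \
--coordinate-configurations "name=perUser,random.effect.type=userId,feature.shard=userShard,min.partitions=5,optimizer=TRON,max.iter=10,tolerance=1e-3"
```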
Hi,
I am not sure how to create an intercept-only model after the recent changes to the package. Previously, this could be done with the arguments:
With the new changes, it seems that something similar should be accomplished with
However, this fails with
Further, any value that I pass for feature.bags will fail if it is not a file under feature-bags-directory. Is there a different way to create an intercept-only model? It should be possible, based on the unit tests. Thanks!