linkedin / Avro2TF

Avro2TF is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.
BSD 2-Clause "Simplified" License

demo data throws NoSuchElementException: No value found for 'sparseVector' when parsing json config file #66

Closed treper closed 4 years ago

treper commented 4 years ago

I am using the default JSON config file:

{
  "features": [
    {
      "inputFeatureInfo": {
        "columnExpr": "userId"
      },
      "outputTensorInfo": {
        "name": "userId",
        "dtype": "long",
        "shape": [
          -1
        ]
      }
    },
    {
      "inputFeatureInfo": {
        "columnExpr": "movieId",
        "transformConfig": {
          "hashInfo": {
            "hashBucketSize": 1000,
            "numHashFunctions": 4
          }
        }
      },
      "outputTensorInfo": {
        "name": "movieId_hashed",
        "dtype": "long",
        "shape": [
          4
        ]
      }
    },
    {
      "inputFeatureInfo": {
        "columnExpr": "genreFeatures.term"
      },
      "outputTensorInfo": {
        "name": "genreFeatures_term",
        "dtype": "long",
        "shape": [
          -1
        ]
      }
    },
    {
      "inputFeatureInfo": {
        "columnConfig": {
          "genreFeatures": {
            "whitelist": [
              "Genre"
            ]
          },
          "movieLatentFactorFeatures": {
            "blacklist": [
              "0"
            ]
          }
        },
        "transformConfig": {
          "hashInfo": {
            "hashBucketSize": 100,
            "combiner": "AVG"
          }
        }
      },
      "outputTensorInfo": {
        "name": "genreFeatures_movieLatentFactorFeatures",
        "dtype": "SparseVector",
        "shape": []
      }
    }
  ],
  "labels": [
    {
      "inputFeatureInfo": {
        "columnExpr": "response"
      },
      "outputTensorInfo": {
        "name": "response",
        "dtype": "double",
        "shape": []
      }
    }
  ]
}

It throws the following exception:

Error: Option --avro2tf-config-path failed when given 'tensorizeIn_config_movielens.json'. java.util.NoSuchElementException: No value found for 'sparseVector'
    at scala.Enumeration.withName(Enumeration.scala:124)
    at io.circe.Decoder$$anonfun$enumDecoder$1$$anonfun$apply$22$$anonfun$apply$23.apply(Decoder.scala:1097)
    at io.circe.Decoder$$anonfun$enumDecoder$1$$anonfun$apply$22$$anonfun$apply$23.apply(Decoder.scala:1097)
    at scala.util.Try$.apply(Try.scala:192)
    at io.circe.Decoder$$anonfun$enumDecoder$1$$anonfun$apply$22.apply(Decoder.scala:1097)
    at io.circe.Decoder$$anonfun$enumDecoder$1$$anonfun$apply$22.apply(Decoder.scala:1096)
    at io.circe.Decoder$$anon$37.apply(Decoder.scala:438)
    at io.circe.Decoder$class.tryDecode(Decoder.scala:46)
    at io.circe.Decoder$$anon$37.tryDecode(Decoder.scala:437)
    at io.circe.Decoder$$anon$22.tryDecode(Decoder.scala:94)
    at com.linkedin.avro2tf.parsers.Avro2TFConfigParser$$anonfun$1$anon$importedDecoder$macro$190$1$$anon$25.configuredDecode(Avro2TFConfigParser.scala:31)
    at io.circe.generic.extras.decoding.ConfiguredDecoder$CaseClassConfiguredDecoder.apply(ConfiguredDecoder.scala:58)
    at io.circe.Decoder$class.tryDecode(Decoder.scala:46)
    at io.circe.generic.decoding.DerivedDecoder.tryDecode(DerivedDecoder.scala:6)
    at com.linkedin.avro2tf.parsers.Avro2TFConfigParser$$anonfun$1$anon$importedDecoder$macro$190$1$$anon$30.configuredDecode(Avro2TFConfigParser.scala:31)
    at io.circe.generic.extras.decoding.ConfiguredDecoder$CaseClassConfiguredDecoder.apply(ConfiguredDecoder.scala:58)
    at io.circe.Decoder$class.decodeJson(Decoder.scala:64)
    at io.circe.generic.decoding.DerivedDecoder.decodeJson(DerivedDecoder.scala:6)
    at io.circe.Parser$class.finishDecode(Parser.scala:13)
    at io.circe.config.parser$.finishDecode(parser.scala:64)
    at io.circe.config.parser$.decode(parser.scala:164)
    at io.circe.config.syntax$CirceConfigOps$.as$extension0(syntax.scala:176)
    at com.linkedin.avro2tf.parsers.Avro2TFConfigParser$$anonfun$1.apply(Avro2TFConfigParser.scala:31)
    at com.linkedin.avro2tf.parsers.Avro2TFConfigParser$$anonfun$1.apply(Avro2TFConfigParser.scala:31)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at com.linkedin.avro2tf.parsers.Avro2TFConfigParser$.getAvro2TFConfiguration(Avro2TFConfigParser.scala:31)
    at com.linkedin.avro2tf.parsers.Avro2TFJobParamsParser$$anon$1$$anonfun$11.apply(Avro2TFJobParamsParser.scala:208)
    at com.linkedin.avro2tf.parsers.Avro2TFJobParamsParser$$anon$1$$anonfun$11.apply(Avro2TFJobParamsParser.scala:179)
    at scopt.OptionDef$$anonfun$34.apply(options.scala:600)
    at scopt.OptionDef.applyArgument(options.scala:679)
    at scopt.OptionParser.scopt$OptionParser$$handleArgument$1(options.scala:444)
    at scopt.OptionParser.parse(options.scala:490)
    at com.linkedin.avro2tf.parsers.Avro2TFJobParamsParser$.parse(Avro2TFJobParamsParser.scala:359)
    at com.tencent.weishi.recall.DataFrame2TFRecord$.main(DataFrame2TFRecord.scala:169)
    at com.tencent.weishi.recall.DataFrame2TFRecord.main(DataFrame2TFRecord.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:727)

zhangxuhong commented 4 years ago

@treper Thanks for trying out Avro2TF. If you are using the tutorial, you might see some errors, since it is out of date. For the config, please follow the example in the README. sparseVector is no longer supported.
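
For context on why the error surfaces as a NoSuchElementException: the top frame of the trace is scala.Enumeration.withName, which throws when the requested name is not a member of the enumeration. The snippet below is only a minimal sketch of that behavior. The `DataType` object, its members, and the `parseDtype` helper are hypothetical stand-ins for illustration, not the actual Avro2TF definitions.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a dtype enumeration. The real Avro2TF
// enumeration and its members may differ; see the README for the
// supported values.
object DataType extends Enumeration {
  val int, long, float, double, string = Value
}

object DtypeLookupDemo {
  // Look a dtype up by name. Enumeration.withName throws
  // NoSuchElementException for an unknown name, so the call is wrapped
  // in Try and the failure is returned as a Left with the message.
  def parseDtype(name: String): Either[String, DataType.Value] =
    Try(DataType.withName(name)) match {
      case Success(value) => Right(value)
      case Failure(error) => Left(error.getMessage)
    }

  def main(args: Array[String]): Unit = {
    println(parseDtype("long"))         // a known member: Right
    println(parseDtype("sparseVector")) // not a member: Left with the error message
  }
}
```

In this sketch the second lookup fails the same way the config parser does in the trace above, because the name is simply absent from the enumeration. Changing the `dtype` of the last feature to a value the README lists as supported should avoid the lookup failure.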

mayiming commented 4 years ago

@treper We have upgraded the Docker image in the tutorial to reflect our recent feature changes. Please give it a try.