linkedin / photon-ml

A scalable machine learning library on Apache Spark

Remove legacy Photon code #456

Open. ashelkovnykov opened this pull request 4 years ago

ashelkovnykov commented 4 years ago

@cmjiang @yunboouyang @lguo Updated for the latest master; requesting review.

joshvfleming commented 4 years ago

Regarding:

Modify AvroDataReader and AvroUtils to load features as NameTermValueAvro objects, instead of GenericRecord objects and then expecting to find certain fields by name

IIRC, we did it this way because many teams were still encoding feature values with schemas that resembled NameTermValueAvro but were not exactly NameTermValueAvro.

ashelkovnykov commented 4 years ago

@joshvfleming My proposed changes (in AvroUtils) specifically handle this case. I ran into an issue where, if a GenericRecord matched the input schema but not the exact output schema, the cast would fail. So now we look up all of the schema fields by name, but generically, so the names are no longer hard-coded as they were before.
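The approach in the comment above can be illustrated with a minimal Python sketch. This is not Photon's actual Scala AvroUtils code. The sketch makes three assumptions:

* A generic record is modeled as a plain mapping from field name to value.
* `NameTermValue` is a hypothetical dataclass standing in for NameTermValueAvro, with assumed fields `name`, `term` and `value`.
* `from_generic_record` is a hypothetical helper.

The sketch shows two properties. First, the field names are taken from the target type rather than hard-coded. Second, extra fields in a "NameTermValueAvro-like" record are ignored, so they do not cause a failed cast.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class NameTermValue:
    """Hypothetical stand-in for NameTermValueAvro (assumed fields: name, term, value)."""
    name: str
    term: str
    value: float


def from_generic_record(record: Mapping[str, Any], target: Type[T]) -> T:
    """Build `target` by reading each of its fields from `record` by name.

    Field names come from the target dataclass, not from string literals.
    Extra fields in `record` are ignored, so a record whose schema merely
    contains the target's fields still converts instead of failing a cast.
    """
    values = {}
    for target_field in fields(target):
        if target_field.name not in record:
            raise KeyError(f"record has no field named {target_field.name!r}")
        values[target_field.name] = record[target_field.name]
    return target(**values)


# A "NameTermValueAvro-like" record: the same core fields plus an extra one.
compatible = {"name": "age", "term": "", "value": 31.0, "source": "profile"}
print(from_generic_record(compatible, NameTermValue))
# prints: NameTermValue(name='age', term='', value=31.0)
```

A record that lacks one of the target's fields (for example, no `value`) raises `KeyError` immediately. In the original setup, the mismatch would instead surface later as a failed cast.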