Since we have some categorical variables in our dataset, we've had to one-hot encode them, which leaves us with sparse feature vectors. E.g., if we have a column for vehicle make, it might take on the values "Honda" / "Toyota" / "Chevrolet" / etc., and one-hot encoding replaces that single column with one 0/1 indicator column per make.
After one-hot encoding, we now have a dataset with one 0/1 column per distinct make (e.g., make_Honda, make_Toyota, make_Chevrolet), where exactly one of those columns is 1 for each row.
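To make that concrete, here is a toy sketch of the encoding step using pandas (the column name and make values are illustrative, not our actual schema):

```python
import pandas as pd

# Hypothetical toy column of vehicle makes, for illustration only.
df = pd.DataFrame({"make": ["Honda", "Toyota", "Chevrolet", "Honda"]})

# One-hot encode: pandas replaces "make" with one 0/1 column per distinct value.
encoded = pd.get_dummies(df, columns=["make"])
print(encoded)
```

Each row of `encoded` has exactly one nonzero entry among the make_* columns; with many categorical columns and large vocabularies, the vast majority of values in the encoded dataset are 0.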
Basically, for any single observation, the values across most of these columns will overwhelmingly be 0. High-dimensional sparse inputs like this tend to hurt model accuracy, so we will use embedding columns as a dimensionality reduction technique to turn these sparse vectors into dense ones. Google offers a good overview of these at https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html
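The mechanics of what an embedding column does can be sketched in plain NumPy, independent of TensorFlow's feature-column API. Note the embedding values below are random placeholders purely for illustration; in a real model they are weights learned during training:

```python
import numpy as np

# Vocabulary from the example above; each make's index is its "hot" position.
makes = ["Honda", "Toyota", "Chevrolet"]
make_to_index = {make: i for i, make in enumerate(makes)}

# One-hot encoding: a vocab-sized sparse row per observation (mostly zeros).
def one_hot(make):
    vec = np.zeros(len(makes))
    vec[make_to_index[make]] = 1.0
    return vec

# Embedding table: vocab_size x embedding_dim. Random values stand in for
# the learned weights just to show the mechanics.
rng = np.random.default_rng(0)
embedding_dim = 2
embeddings = rng.normal(size=(len(makes), embedding_dim))

# An embedding lookup is equivalent to multiplying the one-hot row by the
# embedding matrix -- but a direct index lookup skips all the zeros, and
# the result is a small dense vector instead of a wide sparse one.
dense_via_matmul = one_hot("Toyota") @ embeddings
dense_via_lookup = embeddings[make_to_index["Toyota"]]
assert np.allclose(dense_via_matmul, dense_via_lookup)
```

This is why embeddings work as dimensionality reduction: a vocabulary of thousands of makes still maps to a fixed, small embedding_dim, and similar categories can end up with similar dense vectors after training.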
Estimated time: 3 hours