megagonlabs / sato

Code and data for Sato https://arxiv.org/abs/1911.06311.
Apache License 2.0

How to increase pre-trained model accuracy for custom data #9

Open ankit-crossml opened 4 years ago

ankit-crossml commented 4 years ago

I ran your demo and pre-trained model on one of my sample tables (attached), but the results are not promising. It detected every column as "address".

I see the same problem with the Sherlock model (which you refer to).

[Image: Sato_Demo]

So what is the best way to do transfer learning and train a better model on custom data?

As a deep learning engineer, I would be glad to work with you to improve this model and repository.

A quick reply would be much appreciated.

horseno commented 3 years ago

Sorry about the late reply. This is a bug we inherited from Sherlock, caused by the use of dictionary-based word embeddings. Columns like "id" have values that do not exist in the GloVe dictionary we used to extract the features. This produces undefined values ("NaN") in the word-embedding features, which are then propagated through the network. As a result, the model falls back to predicting type 0, which is "address". We've pushed a simple fix that converts the undefined values, and the demo has been updated.
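
For anyone hitting the same symptom, here is a minimal sketch of the failure mode described above. It is not the actual Sato or Sherlock code: the toy vocabulary, the single digit-based feature, the weight matrix, and the type names other than "address" at index 0 are all hypothetical, and replacing NaN with 0 is an assumption about what "convert undefined values" means.

```python
import numpy as np

# Hypothetical toy vocabulary standing in for the GloVe dictionary.
EMBEDDINGS = {
    "street": np.array([0.2, 0.7, 0.1]),
    "london": np.array([0.9, 0.1, 0.4]),
}
EMB_DIM = 3

# Hypothetical type list; only "address" at index 0 comes from the thread.
TYPES = ["address", "id", "name"]

# Hypothetical linear classifier: 3 embedding features + 1 digit feature.
WEIGHTS = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
    [0.0, 2.0, 0.0],  # digit-heavy columns push towards "id"
])

def extract_features(values):
    """Mean word embedding plus the fraction of values containing a digit."""
    vectors = [EMBEDDINGS[v.lower()] for v in values if v.lower() in EMBEDDINGS]
    if vectors:
        emb = np.mean(vectors, axis=0)
    else:
        # No value is in the vocabulary: the embedding features are undefined.
        emb = np.full(EMB_DIM, np.nan)
    digit_frac = np.mean([any(c.isdigit() for c in v) for v in values])
    return np.append(emb, digit_frac)

def predict(features):
    logits = features @ WEIGHTS
    # np.argmax returns the index of the first NaN, so all-NaN logits give 0.
    return int(np.argmax(logits)), logits

id_column = ["a91f", "b22c", "c03d"]  # none of these are in the vocabulary
raw_features = extract_features(id_column)

broken_index, broken_logits = predict(raw_features)

# Assumed fix: replace undefined values with 0 before the forward pass.
clean_features = np.nan_to_num(raw_features, nan=0.0)
fixed_index, fixed_logits = predict(clean_features)

print(TYPES[broken_index], broken_logits)
print(TYPES[fixed_index], fixed_logits)
```

A single NaN input is enough to turn every logit into NaN, because NaN times any weight (even 0) is NaN. That is why a column with perfectly usable non-embedding features still ends up as type 0. Once the undefined values are replaced, the remaining features decide the prediction again.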