Explore multilingual text classification using embeddings, BERT and deep learning architectures
Store the tf.data output in TFRecord files after the transformation needed for BERT (so preprocessing is done only once) #35
Closed
tarrade closed 4 years ago
First implementation is done.
A separate notebook for preprocessing is here: https://github.com/tarrade/proj_multilingual_text_classification/blob/master/notebook/02-Preprocessing/01_SST2_Huggingface_preprocesing.ipynb
A separate notebook for model training, which reads the TFRecord files, is here: https://github.com/tarrade/proj_multilingual_text_classification/blob/master/notebook/03-Models/01_SST2_Huggingface_model.ipynb
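A minimal sketch of the preprocess-once / read-many idea, not the code in the notebooks above: already-tokenized BERT inputs are serialized as tf.train.Example records, written with tf.io.TFRecordWriter, then parsed back with tf.io.parse_single_example for training. The feature names (input_ids, attention_mask, label), the padded length of 8 and the token ids are made-up assumptions; in practice they would come from a Hugging Face tokenizer.

```python
import os
import tempfile

import tensorflow as tf

MAX_SEQ_LENGTH = 8  # assumption: fixed padded length used at tokenization time


def _int64_feature(values):
    # Wrap a list of Python ints in a tf.train.Feature.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def serialize_example(input_ids, attention_mask, label):
    # Hypothetical feature names; they must match the parsing spec below.
    feature = {
        "input_ids": _int64_feature(input_ids),
        "attention_mask": _int64_feature(attention_mask),
        "label": _int64_feature([label]),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


# Made-up, already tokenized and padded SST-2 style examples.
samples = [
    ([101, 2023, 3185, 2001, 2307, 102, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0], 1),
    ([101, 6659, 102, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0], 0),
]

path = os.path.join(tempfile.mkdtemp(), "train.tfrecord")

# Preprocessing step: run once and persist the serialized features.
with tf.io.TFRecordWriter(path) as writer:
    for ids, mask, label in samples:
        writer.write(serialize_example(ids, mask, label))

# Training step: read the TFRecord file back instead of re-tokenizing.
feature_spec = {
    "input_ids": tf.io.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
    "attention_mask": tf.io.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    label = parsed.pop("label")
    return parsed, label  # (features dict, label), the shape Keras fit() expects


dataset = tf.data.TFRecordDataset(path).map(parse).batch(2)
features, labels = next(iter(dataset))
print(features["input_ids"].shape, labels.numpy())
```

The parsing spec has to mirror the writer exactly (names, lengths, dtypes); that coupling is one reason the metadata question below comes up.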
tf.data.experimental.assert_cardinality doesn't exist in TF 2.1.0; waiting for 2.2.0 to clean up the code and make it a bit easier.
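A small sketch of why assert_cardinality matters here: a dataset read from TFRecord files (stood in for below by a filter() on a range, an assumption to keep it self-contained) has unknown cardinality. On TF 2.2+ the known size can be declared with tf.data.experimental.assert_cardinality; on TF 2.1 one fallback is to count the records with a full pass. The record count of 10 is assumed to be known from preprocessing.

```python
import tensorflow as tf

# Stand-in for tf.data.TFRecordDataset: after filter() (or when reading
# TFRecord files) tf.data cannot know the number of elements up front.
dataset = tf.data.Dataset.range(10).filter(lambda x: x >= 0)
unknown = tf.data.experimental.cardinality(dataset).numpy()
print(unknown == tf.data.experimental.UNKNOWN_CARDINALITY)

# Option 1 (TF >= 2.2): declare the size, e.g. a record count saved at
# preprocessing time. Iterating raises an error if the real count differs.
n_records = 10  # assumption: known from the preprocessing step
dataset_known = dataset.apply(tf.data.experimental.assert_cardinality(n_records))
print(tf.data.experimental.cardinality(dataset_known).numpy())

# Option 2 (also works on TF 2.1): count the records with one full pass.
n_counted = int(dataset.reduce(0, lambda count, _: count + 1))
print(n_counted)
```

Option 2 costs a full read of the data, which is exactly what storing the count alongside the TFRecord files avoids.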
Big open question: how do we keep track of metadata (number of records, sequence length, tokenizer, feature names) with tf.data and TFRecord files?
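One possible approach, sketched here as an assumption and not something decided in this issue: write a JSON "sidecar" file next to each TFRecord file at preprocessing time, and check it when training starts. The .meta.json suffix, the write_metadata/read_metadata helpers and the field names are all hypothetical.

```python
import json
import os
import tempfile


def write_metadata(tfrecord_path, metadata):
    """Store metadata in a JSON file next to the TFRecord file (hypothetical convention)."""
    meta_path = tfrecord_path + ".meta.json"
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, sort_keys=True)
    return meta_path


def read_metadata(tfrecord_path, expected):
    """Load the sidecar file and fail early if it disagrees with the training setup."""
    meta_path = tfrecord_path + ".meta.json"
    with open(meta_path, encoding="utf-8") as f:
        metadata = json.load(f)
    for key, value in expected.items():
        if metadata.get(key) != value:
            raise ValueError(
                f"metadata mismatch for {key!r}: {metadata.get(key)!r} != {value!r}"
            )
    return metadata


tfrecord_path = os.path.join(tempfile.mkdtemp(), "sst2_train.tfrecord")

# Preprocessing side: record what is needed later (example values only).
write_metadata(tfrecord_path, {
    "n_records": 2,
    "max_seq_length": 8,
    "tokenizer": "bert-base-multilingual-uncased",
    "features": ["input_ids", "attention_mask", "label"],
})

# Training side: check compatibility, then reuse n_records for assert_cardinality.
meta = read_metadata(tfrecord_path, expected={"max_seq_length": 8})
print(meta["n_records"])
```

A sidecar keeps the TFRecord files untouched and is easy to inspect by hand; the trade-off is that the two files must be moved and versioned together.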