tarrade / proj_multilingual_text_classification

Explore multilingual text classification using embeddings, BERT and deep learning architectures
Apache License 2.0

Store tf.data after the transformation needed for BERT in TFRecord files (preprocessing done only once) #35

Closed tarrade closed 4 years ago

tarrade commented 4 years ago

first implementation is done

separate notebook for preprocessing is here: https://github.com/tarrade/proj_multilingual_text_classification/blob/master/notebook/02-Preprocessing/01_SST2_Huggingface_preprocesing.ipynb

separate notebook for model training reading the TFRecord files is here: https://github.com/tarrade/proj_multilingual_text_classification/blob/master/notebook/03-Models/01_SST2_Huggingface_model.ipynb
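For readers who do not want to open the notebooks, here is a minimal sketch of the write-once / read-many idea. It is not the code from the notebooks: the feature names (`input_ids`, `attention_mask`, `token_type_ids`, `label`), the helper names, `MAX_LEN` and the toy token ids are all assumptions for illustration, and the inputs are assumed to be already tokenized and padded.

```python
import os
import tempfile

import tensorflow as tf

MAX_LEN = 8  # hypothetical fixed sequence length after padding


def serialize_example(input_ids, attention_mask, token_type_ids, label):
    """Pack one already-tokenized example into a serialized tf.train.Example."""
    def int64_list(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    features = tf.train.Features(feature={
        "input_ids": int64_list(input_ids),
        "attention_mask": int64_list(attention_mask),
        "token_type_ids": int64_list(token_type_ids),
        "label": int64_list([label]),
    })
    return tf.train.Example(features=features).SerializeToString()


def write_tfrecord(path, examples):
    """Preprocessing side: write all examples once to a TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for example in examples:
            writer.write(serialize_example(**example))


# Schema needed to decode the records again (must match what was written).
FEATURE_SPEC = {
    "input_ids": tf.io.FixedLenFeature([MAX_LEN], tf.int64),
    "attention_mask": tf.io.FixedLenFeature([MAX_LEN], tf.int64),
    "token_type_ids": tf.io.FixedLenFeature([MAX_LEN], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse_example(serialized):
    """Training side: turn one serialized record into (features, label)."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    label = parsed.pop("label")
    return parsed, label


def read_tfrecord(path, batch_size=2):
    return tf.data.TFRecordDataset(path).map(parse_example).batch(batch_size)


# Toy, hand-made inputs standing in for real tokenizer output.
toy_examples = [
    {"input_ids": [101, 7, 8, 102, 0, 0, 0, 0],
     "attention_mask": [1, 1, 1, 1, 0, 0, 0, 0],
     "token_type_ids": [0] * MAX_LEN,
     "label": 1},
    {"input_ids": [101, 9, 102, 0, 0, 0, 0, 0],
     "attention_mask": [1, 1, 1, 0, 0, 0, 0, 0],
     "token_type_ids": [0] * MAX_LEN,
     "label": 0},
]

tfrecord_path = os.path.join(tempfile.mkdtemp(), "toy.tfrecord")
write_tfrecord(tfrecord_path, toy_examples)
dataset = read_tfrecord(tfrecord_path)
```

The reader has to know the feature names, dtypes and sequence length used by the writer, because none of this is stored in the TFRecord file itself.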

`assert_cardinality` doesn't exist in TF 2.1.0; waiting for 2.2.0 to clean up the code and make it a bit easier.
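A rough sketch of what the cleanup could look like once TF >= 2.2 is available, assuming the number of examples is known from the preprocessing step. The file content and `NUM_EXAMPLES` are made up for illustration; a dataset read from a TFRecord file reports an unknown cardinality until the count is asserted.

```python
import os
import tempfile

import tensorflow as tf

NUM_EXAMPLES = 5  # hypothetical count, known from the preprocessing step

# Write a few dummy records so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "toy.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for i in range(NUM_EXAMPLES):
        writer.write(str(i).encode("utf-8"))

dataset = tf.data.TFRecordDataset(path)

# TFRecord files carry no record count, so tf.data cannot infer the size.
cardinality_before = tf.data.experimental.cardinality(dataset)

# Requires TF >= 2.2: attach the known size to the dataset.
dataset = dataset.apply(tf.data.experimental.assert_cardinality(NUM_EXAMPLES))
cardinality_after = tf.data.experimental.cardinality(dataset)
```

If the asserted value does not match the real number of records, an error is raised when the dataset is iterated, so the count has to be stored reliably somewhere.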

Big open discussion: how to keep track of metadata with tf.data and TFRecord files?
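One possible option, not something decided in this issue, is a small JSON sidecar file written next to each TFRecord file. The sketch below uses only the standard library; the file naming convention, the helper names and the metadata fields are all assumptions for illustration.

```python
import json
import os
import tempfile


def metadata_path(tfrecord_path):
    """Hypothetical convention: <file>.tfrecord -> <file>.tfrecord.metadata.json"""
    return tfrecord_path + ".metadata.json"


def save_metadata(tfrecord_path, metadata):
    """Write the metadata dictionary next to the TFRecord file."""
    with open(metadata_path(tfrecord_path), "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, sort_keys=True)


def load_metadata(tfrecord_path):
    """Read back the metadata written by save_metadata."""
    with open(metadata_path(tfrecord_path), encoding="utf-8") as f:
        return json.load(f)


# Illustrative values only; the TFRecord file itself is not created here.
tfrecord_path = os.path.join(tempfile.mkdtemp(), "train.tfrecord")
save_metadata(tfrecord_path, {
    "num_examples": 5,
    "max_length": 128,
    "label_names": ["negative", "positive"],
})
metadata = load_metadata(tfrecord_path)
```

The training side could then read `num_examples` and `max_length` from the sidecar instead of hard-coding them. The drawback is that the two files can get out of sync if one is regenerated without the other.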

tarrade commented 4 years ago

this is done:

since this is all working fine -> closing