n-waves / multifit

The code to reproduce results from paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" https://arxiv.org/abs/1909.04761
MIT License

Labels are always tokenized instead of the text column; kindly fix this issue #75

Open suryapa1 opened 4 years ago

suryapa1 commented 4 years ago

    from pathlib import Path
    import multifit

    exp = multifit.from_pretrained("de_multifit_paper_version")
    cls_dataset = exp.arch.dataset(Path('data/de_sentiment'), exp.pretrain_lm.tokenizer)
    cls_dataset.load_clas_databunch(bs=exp.finetune_lm.bs).show_batch()

The path data/de_sentiment contains train.csv and test.csv, with labels and text as columns. Even after shuffling, show_batch tokenizes the labels rather than the text column. I am not sure why it is populated this way; any help is greatly appreciated.
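One possible cause, not confirmed anywhere in this thread, is a mismatch between the CSV layout and the layout the loader expects. The following is a minimal sketch using only the Python standard library, under the assumption that the loader treats the first column as the label, the second as the text, and does not expect a header row. That assumption should be checked against the MultiFiT dataset code before relying on it. The function name `write_label_first` and the column names `labels` and `text` are illustrative, taken from the description above rather than from the library.

```python
import csv


def write_label_first(src_path, dst_path, label_col="labels", text_col="text",
                      keep_header=False):
    """Copy a CSV so the label column comes first and the text column second.

    Assumption (not confirmed by the thread): the downstream loader reads
    column 0 as the label and column 1 as the text, with no header row.
    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)   # uses the first row as column names
        writer = csv.writer(dst)
        if keep_header:
            writer.writerow([label_col, text_col])
        rows = 0
        for row in reader:
            # A missing column raises KeyError, which surfaces naming mistakes early.
            writer.writerow([row[label_col], row[text_col]])
            rows += 1
    return rows
```

For example, `write_label_first("data/de_sentiment/train.csv", "train_fixed.csv")` would write a header-free copy with the label first (the output file name here is hypothetical). Inspecting the first few rows of the result by hand is a quick way to confirm which column actually holds the text.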

What I am trying to do is the following:
1) Load the German pretrained model using multifit.from_pretrained("de_multifit_paper_version")
2) Create a custom classifier dataset and fine-tune on top of the German pretrained model
3) Classify the custom dataset

Any example would be greatly appreciated as well.

suryapa1 commented 4 years ago

Updated screenshot:

Screenshot 2020-06-15 at 11 02 29 PM