Some questions about reproducing

Hi! Sorry for the delayed response.

Ad 1, I think you do not need the header; otherwise it looks good. The labels may need to be "0" and "1", I do not recall. Either way, it is the default format in which the fastai library downloads the IMDB dataset.

Ad 2, the model code sits in classifiers.py and sequence_aggregations.py. It is just the classifier head, as the default fastai encoder is used.

Ad 3, I'd just start with https://github.com/tpietruszka/ulmfit_experiments/blob/master/example_configs/imdb_full_agg_1.json and change the dataset path. aggregation_class can stay the same for sure. You might need to modify the number of epochs and the learning rates in each phase, but I think this is a reasonable starting point.
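For points 1 and 3, here is a minimal sketch of the two preparation steps, not code from the repo. It assumes the data is a plain CSV of (label, text) rows with no header, and that the setting you want to change is a top-level key in the JSON config. Both function names are hypothetical helpers, and the real key names have to be read from imdb_full_agg_1.json itself.

```python
import csv
import json
from pathlib import Path


def write_headerless_csv(rows, out_path):
    """Write (label, text) pairs as a CSV file without a header row.

    Hypothetical helper: the column order (label first, then text) is an
    assumption; check it against the files fastai downloads for IMDB.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for label, text in rows:
            writer.writerow([label, text])


def derive_config(base_config_path, out_path, overrides):
    """Copy a JSON config, replacing only the top-level keys in `overrides`.

    Hypothetical helper: it assumes the keys to change sit at the top level
    of the config. Unknown keys raise an error instead of being added, so a
    guessed key name cannot silently end up in the new config.
    """
    config = json.loads(Path(base_config_path).read_text(encoding="utf-8"))
    unknown = [key for key in overrides if key not in config]
    if unknown:
        raise KeyError(f"keys not present in base config: {unknown}")
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

You would call `write_headerless_csv` with integer labels 0 and 1 (written out as "0" and "1"), then call `derive_config` with the example config as the base and a dict holding the dataset path under whatever key the example config actually uses. Everything not listed in `overrides`, including aggregation_class, is carried over unchanged.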

Overall though, this repo is more up to date and cleaned up: https://github.com/tpietruszka/ulmfit_attention, so I would start there.

