richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)
324 stars, 41 forks

Custom Dataset #37

Closed by asparius 1 year ago

asparius commented 1 year ago

I am trying to train on a custom dataset, but I cannot process it. Mapping raises this error: "Column to remove ['validation'] not in the dataset. Current columns in the dataset: ['text']". I am using the code below, modeled on how the other datasets are handled. Could you give a working example for a custom dataset like the one I am using?

babylm = datasets.load_dataset("asparius/babylm-10m","all.txt")
e_babylm = ELECTRAProcessor(babylm).map(num_proc=1)
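
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: `datasets.load_dataset` called without a `split` argument returns a `DatasetDict` keyed by split name, and its `column_names` is then a dict keyed by split rather than a list of columns. If the processor expects a single split, split names such as 'validation' could end up being treated as column names. The sketch below uses a hypothetical helper, `select_split`, which is not part of electra_pytorch or `datasets`, to pick one split before processing. It relies only on the fact that `DatasetDict` is a `dict` subclass.

```python
from collections.abc import Mapping


def select_split(dset, split="train"):
    """Return a single split from a split-keyed container.

    Hypothetical helper for illustration. If `dset` is a mapping of
    split name -> dataset (as a DatasetDict is), return dset[split];
    otherwise assume it is already a single split and return it as-is.
    """
    if isinstance(dset, Mapping):
        if split not in dset:
            raise KeyError(
                f"split {split!r} not found; available: {list(dset)}"
            )
        return dset[split]
    return dset


# Intended use with the code from the question (not executed here,
# and untested against this repository):
#
#   import datasets
#   raw = datasets.load_dataset("asparius/babylm-10m", "all.txt")
#   babylm = select_split(raw, "train")
#   e_babylm = ELECTRAProcessor(babylm).map(num_proc=1)
```

Passing `split="train"` directly to `load_dataset` should have the same effect. Note also that the second positional argument of `load_dataset` is the configuration name, so if "all.txt" was meant as a file name, `data_files="all.txt"` may be what is needed instead; that is a guess about intent.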