hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI
Apache License 2.0
334 stars, 102 forks

Personal Dataset Preprocessing #24

Closed Lobskodax closed 2 years ago

Lobskodax commented 2 years ago

I want to train the GPT-2 model on my own dataset. It is a TXT file with one sentence per line. How can I modify the data preprocessing code so that it accepts this format and runs correctly?

FrankLeeeee commented 2 years ago

Hi, did you figure out how?

FrankLeeeee commented 2 years ago

Hi, I will close this issue for now. If you have difficulty building your own dataset, feel free to re-open this issue. Thanks~
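
For anyone landing here with the same question, one possible approach is to convert the TXT file into the format the training script already reads, rather than changing the loader. The sketch below assumes the GPT-2 example expects a JSON Lines file in which each record has a `text` field; that is an assumption, not something confirmed in this thread, so please check the dataset class in the example you are running. The function name `txt_to_jsonl`, the `text_key` parameter, and the file names are hypothetical.

```python
import json


def txt_to_jsonl(txt_path, jsonl_path, text_key="text"):
    """Convert a one-sentence-per-line TXT file into JSON Lines.

    Each non-empty line becomes one JSON object: {text_key: sentence}.
    `text_key` is an assumed field name; adjust it to whatever the
    dataset class in your training script actually reads.
    Returns the number of records written.
    """
    count = 0
    with open(txt_path, encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for line in src:
            sentence = line.strip()
            if not sentence:
                continue  # skip blank lines
            # ensure_ascii=False keeps non-ASCII text readable in the output
            dst.write(json.dumps({text_key: sentence}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Usage would look like `txt_to_jsonl("my_corpus.txt", "my_corpus.json")`, after which the data path used by the training script can point at the generated file. If the loader turns out to expect a different layout, only the dictionary written on each line should need to change.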