Changed dataset tokenization to batched & multiprocessing tokenization. Depending on your hardware, the dataset tokenization speedup can be substantial. For example, at Colab tokenizing 7.5K rows of data takes now about 20 seconds instead of 17 minutes previously 🚀🚀
Changed dataset tokenization to batched & multiprocessing tokenization. Depending on your hardware, the dataset tokenization speedup can be substantial. For example, at Colab tokenizing 7.5K rows of data takes now about 20 seconds instead of 17 minutes previously 🚀🚀