mozilla-extensions / regrets-reporter

The RegretsReporter browser extension, built by the nonprofit Mozilla, lets you flag regrettable recommendations on YouTube.
https://foundation.mozilla.org/en/youtube/regretsreporter/

Add batched tokenization #131

Closed aapot closed 2 years ago

aapot commented 2 years ago

Changed dataset tokenization to run in batches and across multiple processes. Depending on your hardware, the tokenization speedup can be substantial. For example, on Colab, tokenizing 7.5K rows of data now takes about 20 seconds instead of the previous 17 minutes 🚀🚀
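
For readers unfamiliar with the idea, here is a minimal standard-library sketch of batched, multiprocess tokenization. It is not the code from this PR: the description does not name the library or tokenizer involved, so `tokenize_batch` (a toy whitespace tokenizer), `make_batches`, `tokenize_dataset`, and the `batch_size` / `num_proc` parameters are all hypothetical names chosen for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def tokenize_batch(texts):
    """Toy stand-in tokenizer (assumption): lowercase and split on whitespace.

    It receives a whole list of texts, so any per-call overhead is paid
    once per batch rather than once per row.
    """
    return [text.lower().split() for text in texts]


def make_batches(rows, batch_size):
    """Split rows into consecutive chunks of at most batch_size items."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def tokenize_dataset(rows, batch_size=1000, num_proc=1):
    """Tokenize rows batch by batch, optionally across several processes."""
    batches = make_batches(rows, batch_size)
    if num_proc <= 1:
        # Single process: still batched, just no worker pool.
        results = [tokenize_batch(batch) for batch in batches]
    else:
        # Each worker process handles whole batches; pool.map keeps the
        # results in the same order as the input batches.
        with ProcessPoolExecutor(max_workers=num_proc) as pool:
            results = list(pool.map(tokenize_batch, batches))
    # Flatten the per-batch results back into one list, one entry per row.
    return [tokens for batch in results for tokens in batch]
```

A call such as `tokenize_dataset(rows, batch_size=1000, num_proc=4)` would split the rows into chunks of 1000 and spread them over four worker processes. On platforms that start worker processes with spawn, a call with `num_proc > 1` should sit under an `if __name__ == "__main__":` guard. How much either setting helps depends on the tokenizer and the hardware, as noted above.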