sambanova / generative_data_prep

Apache License 2.0

Speedup Multiprocessing - decrease wait time for shared variable locks #115

Open snova-zoltanc opened 1 week ago

snova-zoltanc commented 1 week ago

It looks like as we scale the number of workers, the tokenization time does not decrease linearly with the number of workers like it should. Let's look into how much time is being wasted waiting on the shared variable locks and try to minimize that.
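As a starting point for that investigation, here is a minimal sketch of two ideas: timing how long a worker waits to acquire a shared lock, and cutting the number of acquisitions by accumulating locally and flushing in batches. It is not taken from this repository. The names `tokenize`, `add_with_wait_timing`, `worker_per_item`, `worker_batched`, and `flush_every` are hypothetical, and the whitespace tokenizer is only a stand-in for the real one.

```python
import multiprocessing as mp
import time


def tokenize(line):
    # Stand-in tokenizer (assumption): the real pipeline uses its own tokenizer.
    return line.split()


def add_with_wait_timing(total_tokens, n):
    """Add n to the shared counter and return the seconds spent waiting for its lock."""
    lock = total_tokens.get_lock()
    start = time.perf_counter()
    lock.acquire()
    waited = time.perf_counter() - start
    try:
        total_tokens.value += n
    finally:
        lock.release()
    return waited


def worker_per_item(lines, total_tokens):
    """Baseline: one lock acquisition per line. Returns (acquisitions, seconds waited)."""
    acquisitions = 0
    waited = 0.0
    for line in lines:
        waited += add_with_wait_timing(total_tokens, len(tokenize(line)))
        acquisitions += 1
    return acquisitions, waited


def worker_batched(lines, total_tokens, flush_every=1000):
    """Accumulate locally and touch the shared counter once per flush_every lines."""
    acquisitions = 0
    waited = 0.0
    local = 0
    for i, line in enumerate(lines, start=1):
        local += len(tokenize(line))
        if i % flush_every == 0:
            waited += add_with_wait_timing(total_tokens, local)
            acquisitions += 1
            local = 0
    if local:
        # Flush whatever is left after the last full batch.
        waited += add_with_wait_timing(total_tokens, local)
        acquisitions += 1
    return acquisitions, waited
```

Summing the returned wait times across workers gives a rough estimate of how much wall-clock time goes to lock contention. If that share is large, batching the updates (or having each worker return its totals to be merged once at the end) is one way to shrink it.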
