Closed jordane95 closed 8 months ago
Again, I think this bug relates to the corner case where one worker is idle and did nothing in the preceding for loop to change the exhausted_ranges status...
Fixed by PR https://github.com/huggingface/datatrove/pull/73
I followed the instructions in the code and used the script in this repo to build the suffix array and generate the byte ranges, but I get the following error when running step 3.