[Open] tokenatlas opened this issue 11 months ago
It might be the case that the provided dataset contains only a small number of samples. Setting the number of splits to 1 solved the issue.
https://github.com/imoneoi/openchat/blob/master/ochat/data/generate_dataset.py#L128
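For context, here is a minimal sketch of the likely failure mode, assuming the script partitions the dataset across several splits/workers: with only two samples and many splits, most chunks come out empty, which can crash downstream steps. The `split_dataset` helper and the split count of 8 are illustrative, not the actual code in `generate_dataset.py`.

```python
# Hypothetical sketch: splitting a tiny dataset into many chunks
# leaves most chunks empty; a single split avoids that.

def split_dataset(samples, num_splits):
    """Partition samples into num_splits round-robin chunks."""
    return [samples[i::num_splits] for i in range(num_splits)]

samples = ["dummy line 1", "dummy line 2"]  # only two samples

# With many splits, most chunks are empty (6 of 8 here):
chunks = split_dataset(samples, num_splits=8)
empty = sum(1 for chunk in chunks if not chunk)
print(empty)  # 6 empty chunks can break downstream processing

# With a single split, every chunk contains data:
chunks = split_dataset(samples, num_splits=1)
assert all(chunks)
```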
@tokenatlas did you resolve this issue or find a workaround? I am facing the same issue.
Thanks for open-sourcing this!
I am trying to follow the instructions for tokenizing the data, but it fails with the stack trace below. I'm using just two lines of dummy data. Any idea where this issue is coming from? Thanks!