Closed mxjmtxrm closed 2 weeks ago
Hello! Have you followed this example to use with iterable datasets? https://huggingface.co/docs/datasets/v2.14.5/en/stream#stream-in-a-training-loop
cc @lhoestq
As the error suggests, you should specify max_steps. The Trainer needs to know how many steps training will run in order to set up the learning rate schedule, but since we often can't know the size of an iterable dataset in advance, you have to specify max_steps manually.
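For anyone landing here, a minimal sketch of how one might pick a value for max_steps when the dataset size is only roughly known. The helper `estimate_max_steps` and its parameters are hypothetical names for illustration, not part of transformers, and the Trainer's own step accounting may differ slightly (for example around incomplete last batches).

```python
import math


def estimate_max_steps(
    num_examples: int,
    per_device_batch_size: int,
    num_devices: int = 1,
    gradient_accumulation_steps: int = 1,
    num_epochs: int = 1,
) -> int:
    """Estimate the number of optimizer steps for a dataset of known or guessed size.

    Hypothetical helper: num_examples is your own estimate of the stream length.
    """
    # Number of examples consumed per optimizer step across all devices.
    effective_batch_size = (
        per_device_batch_size * num_devices * gradient_accumulation_steps
    )
    # Round up so that a final partial batch still counts as one step.
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    return steps_per_epoch * num_epochs


if __name__ == "__main__":
    # Assumed numbers, purely for illustration.
    max_steps = estimate_max_steps(
        num_examples=1_000_000,
        per_device_batch_size=8,
        num_devices=2,
        gradient_accumulation_steps=4,
    )
    print(max_steps)
    # The result would then be passed along the lines of:
    #   TrainingArguments(output_dir="out", max_steps=max_steps, ...)
```

If the number of examples is unknown, any positive max_steps works as a training budget; the estimate above only helps it line up with roughly one pass over the data.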
Since I stripped hf code of all the valueerrors my skin has cleared up and my sleep has improved.
If you remove that ValueError
@sine2pi you'll get a much more arcane error message :)
What's the issue with this error? It seems pretty indicative of the problem to me, and it contains the code to solve it.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.44.0
Who can help?
@muellerzr @SunMarc
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I want to train with a streaming dataset, because my dataset is very large. The code looks like the following:
I got the following error:
How can I solve this problem? Or is there another way to train with large datasets?
Expected behavior
-