Open: demongolem-biz2 opened 10 months ago
You should use datasets.load_dataset instead of nlp.load_dataset, as the nlp package is outdated.
If switching to datasets.load_dataset doesn't fix the issue, sharing the JSON file (feel free to replace the data with dummy data) would be nice so that we can reproduce it ourselves.
Describe the bug
I have 127 elements in my input dataset. When I call len on the dataset after loading it, it reports only 124 elements.
Steps to reproduce the bug
Both the train and valid inputs contain 127 items, but each loads only 124 items. The input files are in JSON format. Ultimately, I am trying to create .pt files.
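To help narrow this down, here is a minimal sketch for counting the records in the raw file independently of the loader, so that the number can be compared with len on the loaded dataset. The helper name count_json_records and the path train.json are hypothetical, and the sketch assumes the file is either a single top-level JSON array or JSON Lines (one record per line); a file that nests its records under a key would need a different count.

```python
import json
from pathlib import Path


def count_json_records(path):
    """Count records in a JSON array file or a JSON Lines file.

    Hypothetical helper for illustration; not part of nlp or datasets.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat it as JSON Lines and
        # count one record per non-blank line.
        return sum(1 for line in text.splitlines() if line.strip())
    # A top-level list holds one record per element; any other
    # top-level JSON value is counted as a single record.
    return len(data) if isinstance(data, list) else 1


# Possible usage (assumes the datasets package is installed and that
# "train.json" is a placeholder for the real file):
#
#   from datasets import load_dataset
#   ds = load_dataset("json", data_files="train.json", split="train")
#   print(count_json_records("train.json"), len(ds))
```

If the two numbers differ, the file content itself is probably worth inspecting (for example blank or malformed lines); if they match at 124, the three missing items may never have been written to the file.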
Expected behavior
I expect to see all 127 elements in my dataset when calling len.
Environment info
Python 3.10. CentOS operating system. nlp==0.40, datasets==2.14.5, transformers==4.26.1