load_dataset() doesn't work without specifying dataset_revision?

nuprl / MultiPL-E

A multi-programming language benchmark for LLMs

https://nuprl.github.io/MultiPL-E/


Status: Closed (ShushanArakelyan closed this issue 3 months ago)

ShushanArakelyan commented 1 year ago

Hello!

Loading any section of the dataset like this:

from datasets import load_dataset
d = load_dataset('nuprl/MultiPL-E', 'humaneval-lua', download_mode='force_redownload')

results in an ExpectedMoreDownloadedFiles error, but passing the revision hash from your completions.py works:

d = load_dataset('nuprl/MultiPL-E', 'humaneval-lua', download_mode='force_redownload', revision="bf4f3c31a1e0a164b7886c9eb04f82534edf4ce9")

Is this intended?

Thanks a lot in advance!
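
For anyone who hits the same error, the workaround above can be wrapped in a small helper. This is only a sketch, not part of MultiPL-E: `load_with_fallback` and `PINNED_REVISION` are hypothetical names. The helper takes the loader as an argument (in practice `datasets.load_dataset`) and catches a broad `Exception`, on the assumption that the import location of the specific error class may differ between `datasets` versions.

```python
from typing import Any, Callable

# Revision hash quoted in this issue (taken from completions.py).
PINNED_REVISION = "bf4f3c31a1e0a164b7886c9eb04f82534edf4ce9"


def load_with_fallback(
    loader: Callable[..., Any],
    path: str,
    name: str,
    pinned_revision: str = PINNED_REVISION,
) -> Any:
    """Try the default revision first; on failure, retry with a pinned one.

    `loader` is expected to accept (path, name, revision=...), which matches
    the way `datasets.load_dataset` is called in this issue.
    """
    try:
        return loader(path, name)
    except Exception as err:  # deliberately broad, see the note above
        print(f"Default revision failed ({type(err).__name__}); "
              f"retrying with revision {pinned_revision}")
        return loader(path, name, revision=pinned_revision)


# Intended usage (requires the `datasets` package and network access):
#   from datasets import load_dataset
#   d = load_with_fallback(load_dataset, "nuprl/MultiPL-E", "humaneval-lua")
```

Passing the loader in explicitly keeps the helper easy to exercise with a stub, without downloading anything.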

arjunguha commented 1 year ago

Odd. This is not intended. We'll take a look soon.

arjunguha commented 3 months ago

This should now be fixed.