nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

Translated Prompts In Hugging Face not complete #148

Closed 49Simon closed 3 months ago

49Simon commented 3 months ago

Hi,

The prompts folder was recently deleted, with a note stating that the prompts are no longer needed and can be obtained from the Hugging Face Hub. Most prompts on GitHub came in several doctest variations (reworded, transform, keep, remove); on the Hub, however, there seems to be only the transform variation. Is there a specific reason this one was selected? Do you have plans to add the other doctest variations to the dataset?

Thanks.

arjunguha commented 3 months ago

Sorry for causing a problem. You should be able to get them from Hugging Face like this:

datasets.load_dataset("nuprl/MultiPL-E", revision="d23b094346c5dbda1080a74bb2a24c18adbf7409", trust_remote_code=True)

Please let me know if that doesn't work.

49Simon commented 3 months ago

With the code you suggested, it downloaded a JSON file with this info: {"url": "https://raw.githubusercontent.com/nuprl/MultiPL-E/11b407bd2dd98c8204afea4d20043faf2145c20c/prompts/humaneval-cpp-reworded.json", "etag": null}

These are the HumanEval reworded C++ prompts. How can one download a specific language, a specific benchmark, and a specific variation?

arjunguha commented 3 months ago

datasets.load_dataset(
    "nuprl/MultiPL-E",
    SUBSET_NAME,
    revision="d23b094346c5dbda1080a74bb2a24c18adbf7409",
    trust_remote_code=True)
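
For readers wondering what to pass as SUBSET_NAME, here is a minimal sketch. It assumes the subset names follow the same benchmark-language-variation pattern as the file name seen earlier in the thread (humaneval-cpp-reworded); that pattern is an inference, not something confirmed here, so listing the available names first is the safer route. The helpers `subset_name`, `list_subsets`, and `load_subset` are hypothetical names made up for illustration. The sketch also assumes a version of the `datasets` library that still supports loading scripts via `trust_remote_code`.

```python
# Revision hash quoted in the thread above.
REVISION = "d23b094346c5dbda1080a74bb2a24c18adbf7409"


def subset_name(benchmark: str, language: str, variation: str) -> str:
    """Hypothetical helper. ASSUMPTION: subset names mirror the prompt file
    name from the thread, e.g. "humaneval-cpp-reworded". Not confirmed."""
    return f"{benchmark}-{language}-{variation}"


def list_subsets() -> list:
    """Ask the Hub which subset (config) names exist at the pinned revision."""
    import datasets  # imported lazily so the name helper works without it

    return datasets.get_dataset_config_names(
        "nuprl/MultiPL-E",
        revision=REVISION,
        trust_remote_code=True,
    )


def load_subset(name: str):
    """Load one subset using the same call as in the comment above."""
    import datasets

    return datasets.load_dataset(
        "nuprl/MultiPL-E",
        name,
        revision=REVISION,
        trust_remote_code=True,
    )
```

A possible usage: call `list_subsets()` to see the real names, check whether `subset_name("humaneval", "cpp", "reworded")` appears in that list, and only then pass it to `load_subset`. If the assumed pattern turns out to be wrong, the listed names are the ones to trust.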