mesolitica / malaysian-dataset

We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/
Apache License 2.0
297 stars 106 forks

Add dataset for benchmarking purpose #402

Closed azrilhafizi closed 3 months ago

azrilhafizi commented 4 months ago

This dataset contains over 1,000 questions and answers on Malay language grammar, tailored for primary school students aged 7 to 12. The questions were scraped from various sources on the Internet. The dataset is intended to improve the benchmarking of LLMs on the Malay language.

huseinzol05 commented 3 months ago

hey! this is super niceeeeee!

huseinzol05 commented 3 months ago

this is much better than our benchmark dataset!