Closed. zhangir-azerbayev closed this 1 year ago.
Hi~ zhangir-azerbayev, thanks for your attention! The full MetaMathQA dataset will be released very soon; it is not public yet because we are still using it to fully fine-tune the 70B model. Currently, a 40K subset of the full MetaMathQA dataset is available on Hugging Face under MetaMathQA.
I also see that there are now two datasets in data/train called MetaMath-40K_split1.json and MetaMath-40K_split2.json. What is the difference between these two files?
What is the timeline for releasing the full dataset?
Hi~ zhangir-azerbayev, thanks again for your attention! MetaMath-40K_split1.json and MetaMath-40K_split2.json are just the two splits of the 40K subset (each contains 20K).
I anticipate that the full dataset will be released in October, assuming there are no unforeseen obstacles. I also hope that we can release the full MetaMathQA dataset as soon as possible.
In that case, I would strongly suggest editing the preprint to state that release of the full dataset is forthcoming. Even if it is just a preprint, I think it is wrong to claim standards of reproducibility that aren't yet actually met.
Hi~ zhangir-azerbayev,
I apologize for the inconvenience. We encountered some minor issues while preparing to release the data a few days ago. Nevertheless, we are committed to releasing all of our data either today or tomorrow without fail! Furthermore, we will be enhancing the GitHub repository and providing code usage instructions. I will notify you as soon as these updates are in place. Thank you once again for your attention!
Thank you very much for your efforts @yulonghui! BTW, can you also publish the dataset generation code along with the complete dataset? #1
Hi~ zhangir-azerbayev, thanks again for your attention! The full MetaMathQA dataset is now released on Hugging Face under MetaMathQA!
Hi~ imoneoi, thanks again for your attention! The full MetaMathQA dataset is now released on Hugging Face under MetaMathQA! We will also clean up our generation code and update the arXiv preprint soon; the code has not been cleaned up yet.
@yulonghui Thanks! Looking forward to it.
The preprint states that you "release the MetaMathQA dataset". However, the Hugging Face dataset is empty, and the data is not in this repository either.