likaixin2000 / MMCode

[EMNLP 2024] Multi-modal code generation problems.

Original images (png) #1

Open hj611 opened 2 months ago

hj611 commented 2 months ago

Thank you for providing a great dataset. Will the original images (png) used in the dataset be released in the future?

likaixin2000 commented 2 months ago

Thank you for your interest in our work. Currently, the images are stored as base64-encoded strings in the dataset files, which can be found here. I will upload the original files soon after the paper is accepted.
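For anyone who wants to decode the base64 strings directly, here is a minimal sketch. It assumes each problem record carries a `problem_id` and a list of base64 strings under a field named `images`. That field name and the helper `save_base64_images` are hypothetical and not taken from the repository, so please check the actual schema first. The sample payload is a stand-in, not a real image.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_images(problem_id, encoded_images, out_root):
    """Decode base64 strings and write each one to <out_root>/<problem_id>/<i>.png."""
    out_dir = Path(out_root) / str(problem_id)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, encoded in enumerate(encoded_images):
        path = out_dir / f"{i}.png"
        # Write the decoded bytes unchanged; no image re-encoding happens here.
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths


# Stand-in payload: the PNG signature plus filler bytes, not a real image.
fake_png = b"\x89PNG\r\n\x1a\n" + b"placeholder"
# Hypothetical record layout; the "images" field name is an assumption.
sample = {"problem_id": "demo_0", "images": [base64.b64encode(fake_png).decode("ascii")]}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_base64_images(sample["problem_id"], sample["images"], tmp)
    saved_names = [p.name for p in saved]
    round_trip_ok = saved[0].read_bytes() == fake_png
```

The bytes are written as they are, so the `.png` extension is only correct if the stored images really are PNG files.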

Alternatively, you can extract the images through the loading function (https://github.com/Happylkx/MMCode/blob/main/utils.py#L100), for example:

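import os
os.makedirs(str(problem["problem_id"]), exist_ok=True)  # one output folder per problem
for i, image in enumerate(pil_images):
    image.save(f"{problem['problem_id']}/{i}.png")  # name each file by its index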

Please feel free to reach out if you have any further questions!

likaixin2000 commented 1 month ago

Hi @hj611, our paper has been accepted to EMNLP 2024🎉. Please check out the original files of the dataset via Hugging Face Datasets.