Guidance for deployment to new tasks?

guosyjlu / DS-Agent

Official implementation of "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning" in ICML'24

Guidance for deployment to new tasks? #17

Open merlynmarc opened 6 days ago

merlynmarc commented 6 days ago

Hello,

I'd like to apply DS-Agent to tasks beyond the 18 that were included in the paper.

Are there any scripts or guidance for deploying to other Kaggle competitions? For example: https://www.kaggle.com/c/seizure-prediction/overview

Thank you, -marc

guosyjlu commented 6 days ago

Hi marc,

Please refer to the FAQ (Q2) in the README. I just updated the guidance there.

merlynmarc commented 6 days ago

Perfect, thanks Siyuan!

merlynmarc commented 4 days ago

Hi Siyuan,

How do I actually submit DS-Agent's submissions to the Kaggle leaderboard?

For example, I see the airline-reviews test.csv has the answers in it, which its submission.py uses to return the score.

Are these not actually being submitted to Kaggle? (Is this the case for the results in the paper?)

Thanks, -marc

guosyjlu commented 2 days ago

Yes, the implementation of this paper uses an offline benchmark, which means the predictions are evaluated offline against ground-truth labels with a predefined evaluation metric. If you want real feedback from Kaggle leaderboard scores, you can customize submission.py to use the Kaggle API.
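As a rough illustration of the Kaggle route mentioned above, here is a minimal sketch that uploads a prediction file through the official `kaggle` command-line tool. It assumes the tool is installed (`pip install kaggle`) and an API token is configured. The helper names `build_submit_command` and `submit_to_kaggle` are hypothetical and not part of DS-Agent; how they would be wired into a task's submission.py is left open, since that file's interface is not shown in this thread.

```python
import shutil
import subprocess


def build_submit_command(competition, csv_path, message):
    """Return the Kaggle CLI invocation that uploads csv_path to a competition."""
    return [
        "kaggle", "competitions", "submit",
        "-c", competition,  # competition slug, e.g. the last part of its URL
        "-f", csv_path,     # prediction file to upload
        "-m", message,      # free-text description of the submission
    ]


def submit_to_kaggle(competition, csv_path, message="DS-Agent submission"):
    """Upload a prediction file (hypothetical helper, not part of DS-Agent).

    Requires the kaggle CLI on PATH and a configured API token.
    """
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found; install it with `pip install kaggle`")
    cmd = build_submit_command(competition, csv_path, message)
    # check=True raises CalledProcessError if the upload is rejected.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `submit_to_kaggle("seizure-prediction", "submission.csv")` would attempt an upload to the competition linked earlier in this thread. The score is not returned by the upload itself; it can be looked up afterwards with `kaggle competitions submissions -c <competition>`.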

merlynmarc commented 16 hours ago

OK, thanks Siyuan.

merlynmarc commented 14 hours ago

Apologies. One more question: Where do you get the ground-truth labels? If I understand correctly, these are hidden by the competition organizers. Or are you reporting results on the validation set?

Thank you, -marc

guosyjlu commented 4 hours ago

We perform offline evaluation throughout the paper, which means we split each dataset into training, validation, and test sets ourselves. For Kaggle competitions that only release the training set, we further split the training set into new training, validation, and test sets.
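For a new competition, a split of this kind could be sketched as follows. This is only an illustration: the 80/10/10 ratio, the fixed seed, and the helper name `split_rows` are assumptions, not the settings used in the paper.

```python
import random


def split_rows(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle labeled rows and cut them into train/validation/test lists.

    The fractions and seed are illustrative defaults, not the paper's values.
    """
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test
```

The held-out test rows keep their labels, so a local evaluation script can score predictions against them without contacting Kaggle.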