rabeehk / compacter


How to change the size of the dataset? #8

Closed CaffreyR closed 2 years ago

CaffreyR commented 2 years ago

Hi @rabeehk, thanks for sharing the code. May I ask how to change the size of the dataset? It seems that COPA has 400 training examples, 50 validation examples, and 50 test examples. I want to change the sizes a little, but changing `max_train_samples`, `max_val_samples`, and `max_test_samples` does not seem to work.

CaffreyR commented 2 years ago

Actually, my question came from reading this code.

rabeehk commented 2 years ago

Dear @CaffreyR, I am not aware of the code you mentioned in the shared link. As for my code, I do not have access to run it at the moment, but I checked, and I think the parameters you mentioned should work. For debugging, you can put breakpoints on lines 387, 409, and 429 in this file and make sure `n_obs` is passed to the data-loading part. Best, Rabeeh
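The `n_obs` capping that the sample-size parameters feed into can be sketched in plain Python. This is a minimal illustration, not the repo's actual implementation: the helper name `cap_split` and the toy COPA-sized data are assumptions; the real code subsamples a `datasets.Dataset` (e.g. via `select`).

```python
def cap_split(examples, n_obs=None):
    """Return at most n_obs examples from a split; None means keep everything."""
    if n_obs is None or n_obs >= len(examples):
        return examples
    # The real code performs the analogous subset selection on a Dataset object.
    return examples[:n_obs]

# Toy stand-in for the COPA training split (400 examples).
copa_train = [{"premise": f"p{i}", "label": i % 2} for i in range(400)]
subset = cap_split(copa_train, n_obs=100)
print(len(subset))  # 100
```

If `max_train_samples` has no effect, a breakpoint here would show whether `n_obs` arrives as `None` (i.e. the parameter is not being propagated) or whether a cached, already-built dataset is being reused.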

CaffreyR commented 2 years ago

Thanks, @rabeehk. Maybe it is a cache problem! So are all the test and validation datasets downloaded directly from the official website? And how do I run prediction on the test set? I mean, to generate labels!

rabeehk commented 2 years ago

Hi @CaffreyR, oh yes, I do remember hitting this issue in another project with Hugging Face datasets. I did not investigate it there, but I remember adding `download_mode="force_redownload"` to the `load_dataset` call; a terrible solution, but for now it could possibly resolve the issue. For the compacter code I did not get this issue, and I am not sure why it is occurring, because I am explicitly sampling in this line: https://github.com/rabeehk/compacter/blob/b210eef13f64ff6441186ee5a1cbf031b5918b94/seq2seq/data/tasks.py#L78

For generating labels, the current code does not support it, and you need to modify the test prediction part at this line: https://github.com/rabeehk/compacter/blob/b210eef13f64ff6441186ee5a1cbf031b5918b94/seq2seq/run_seq2seq.py#L548
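The cache workaround can be sketched as below. To keep the sketch runnable without a network connection, it only assembles the keyword arguments for `datasets.load_dataset`; the helper name `build_load_kwargs` is an assumption for illustration, while `download_mode="force_redownload"` is the real `load_dataset` parameter Rabeeh mentions.

```python
def build_load_kwargs(task="super_glue", subset="copa", force_redownload=False):
    """Assemble kwargs for datasets.load_dataset(**kwargs)."""
    kwargs = {"path": task, "name": subset}
    if force_redownload:
        # Heavy-handed workaround: ignore the local cache and re-fetch the data,
        # so a stale cached split cannot shadow a changed sample-size setting.
        kwargs["download_mode"] = "force_redownload"
    return kwargs

# Usage (requires the `datasets` package and network access):
#   from datasets import load_dataset
#   dataset = load_dataset(**build_load_kwargs(force_redownload=True))
print(build_load_kwargs(force_redownload=True)["download_mode"])
```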

CaffreyR commented 2 years ago

Hi @rabeehk, may I ask how to predict the labels at test time?

rabeehk commented 2 years ago

@CaffreyR I think at this line https://github.com/rabeehk/compacter/blob/b210eef13f64ff6441186ee5a1cbf031b5918b94/seq2seq/third_party/trainers/trainer.py#L89 the Hugging Face evaluation loop also returns the predicted labels. You then need to make sure the predictions are returned as well at this line: https://github.com/rabeehk/compacter/blob/b210eef13f64ff6441186ee5a1cbf031b5918b94/seq2seq/third_party/trainers/trainer.py#L112 Currently I think it only returns metrics, but you can modify that line to return the predictions too.
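The suggested change can be sketched with a self-contained stand-in for the evaluation output. The function name `evaluate_with_predictions`, the `id2label` mapping, and the `SimpleNamespace` stand-in are all illustrative assumptions; in the actual trainer the prediction loop already yields an object carrying both `predictions` and `metrics`, and the fix is to surface both instead of the metrics alone.

```python
from types import SimpleNamespace

def evaluate_with_predictions(prediction_output, id2label):
    """Sketch: return predictions alongside metrics instead of metrics only."""
    metrics = prediction_output.metrics
    # Map predicted class ids back to human-readable labels.
    labels = [id2label[p] for p in prediction_output.predictions]
    return metrics, labels

# Toy stand-in for the object the evaluation loop produces.
out = SimpleNamespace(predictions=[0, 1, 1], metrics={"accuracy": 1.0})
metrics, labels = evaluate_with_predictions(out, {0: "choice1", 1: "choice2"})
print(labels)  # ['choice1', 'choice2', 'choice2']
```

For seq2seq models the predictions are token ids rather than class ids, so the mapping step would instead be a `tokenizer.batch_decode` call.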

CaffreyR commented 2 years ago

Many thanks!