princeton-nlp / CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
MIT License

Too low accuracy result compared with the expected result #52

Open · xtchon opened this issue 9 months ago

xtchon commented 9 months ago

Hi, thanks for your work. I'm trying to reproduce your results but am having difficulty matching the reported accuracy.

Below is the environment I created:

  • channels:
    • defaults
  • dependencies:
    • python=3.9.7
    • pip
    • pip:
      • transformers==4.17.0
      • scipy==1.7.3
      • datasets==2.0.0
      • scikit-learn==1.0.2
      • torch==1.10.2
      • black
      • wandb
      • matplotlib

I used datasets==2.0.0 because installing datasets==1.14.0 results in the following conflict:

    The conflict is caused by:
        transformers 4.17.0 depends on huggingface-hub<1.0 and >=0.1.0
        datasets 1.14.0 depends on huggingface-hub<0.1.0 and >=0.0.19

With datasets==2.0.0 I can run `python evaluation.py MNLI ../CoFi-MNLI-s95`, but the results seem wrong. What can I do to solve this problem? Thanks a lot!

`../CoFi-MNLI-s95` is the checkpoint downloaded from https://huggingface.co/princeton-nlp/CoFi-MNLI-s95. Results I obtained:

    Task: mnli
    Model path: ../CoFi-MNLI-s95
    Model size: 4330279
    Sparsity: 0.949
    accuracy: 0.091
    seconds/example: 0.000531

This is far too low compared to the expected result:

    Task: MNLI
    Model path: princeton-nlp/CoFi-MNLI-s95
    Model size: 4920106
    Sparsity: 0.943
    mnli/acc: 0.8055
    seconds/example: 0.010151
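For what it's worth, 0.091 is well below the ~0.33 chance level of a 3-way classification task, which usually points to a label-order mismatch between the checkpoint and the installed datasets version rather than a broken model. A minimal check along these lines (my own sketch; the repo does not ship this):

```python
# Hypothetical sanity check: compare the label order the checkpoint expects
# with the label order the installed datasets version yields for MNLI.
from datasets import load_dataset
from transformers import AutoConfig

config = AutoConfig.from_pretrained("princeton-nlp/CoFi-MNLI-s95")
print("model id2label:", config.id2label)

mnli = load_dataset("glue", "mnli", split="validation_matched")
print("dataset labels:", mnli.features["label"].names)
# If these two orderings disagree, predictions are effectively permuted
# and accuracy collapses even though the model itself is fine.
```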

xiamengzhou commented 9 months ago

I think using datasets==1.14.0 is necessary in this case to get the model performance right. Maybe you can skip the version conflict for now?
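One way to skip the conflict (a workaround sketch, not an officially tested path) is to bypass pip's resolver, e.g. `pip install datasets==1.14.0 --no-deps` after installing transformers==4.17.0, then pull in datasets' remaining dependencies by hand and confirm that the three packages import together:

```python
# Minimal probe, assuming datasets was force-installed with --no-deps:
# verify the trio imports cleanly and print the versions actually in use.
import datasets
import huggingface_hub
import transformers

print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("huggingface_hub:", huggingface_hub.__version__)
```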

SHUSHENGQIGUI commented 2 months ago

> Hi, thanks for your work. I'm trying to reproduce your results but am having difficulty matching the reported accuracy. […]

Hello, did you solve the problem? I'm running into the same issue.

xtchon commented 2 months ago

> […] Hello, did you solve the problem? I'm running into the same issue.

Actually, datasets==1.14.0 is not necessary; using datasets==2.14.6 solves this problem. After that there are a few more issues, and some of the code needs to be adjusted manually.
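The comment doesn't say which adjustments are needed; one change I'd expect when jumping to datasets>=2.14 (an assumption on my part, not confirmed in this thread) is that `datasets.load_metric` is deprecated there in favor of the separate `evaluate` package, so metric loading in evaluation.py would need a shim along these lines:

```python
# Hypothetical shim, assuming evaluation.py calls datasets.load_metric:
# newer datasets versions deprecate it in favor of the evaluate package.
try:
    from evaluate import load as load_metric  # pip install evaluate
except ImportError:  # older environments without the evaluate package
    from datasets import load_metric

metric = load_metric("glue", "mnli")
# Usage is unchanged: metric.compute(predictions=preds, references=refs)
```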

SHUSHENGQIGUI commented 1 month ago

> […] Actually, datasets==1.14.0 is not necessary; using datasets==2.14.6 solves this problem. After that there are a few more issues, and some of the code needs to be adjusted manually.

Thank you. Where is the issue occurring, and which code needs to be modified?

SHUSHENGQIGUI commented 1 month ago

> I think using datasets==1.14.0 is necessary in this case to get the model performance right. Maybe you can skip the version conflict for now?

Hi, when I set transformers==4.17.0 and datasets==1.14.0, what version of huggingface-hub should I use? I get a version conflict coming from huggingface-hub.

SHUSHENGQIGUI commented 1 month ago

All right, I finally found the key to this problem: I tested princeton-nlp/CoFi-MRPC-s95, and the result matches the table in the README. By the way, here is my setting: transformers==4.17.0, datasets==2.1.0, huggingface-hub==0.19.0. So I guess there are some bugs in evaluation.py when evaluating MNLI accuracy.
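If that MNLI-specific bug is the label-order mismatch suspected earlier in this thread (still an assumption, not something anyone here has confirmed), one workaround is to remap model predictions into the dataset's label space before scoring; a hypothetical sketch:

```python
import numpy as np

def remap_predictions(preds, model_id2label, dataset_label_names):
    """Map model output indices to the dataset's indices for the same
    label names (hypothetical helper, not part of evaluation.py)."""
    index_map = {
        model_idx: dataset_label_names.index(name)
        for model_idx, name in model_id2label.items()
    }
    return np.array([index_map[int(p)] for p in preds])

# Example: a checkpoint emitting [contradiction, entailment, neutral]
# scored against GLUE's [entailment, neutral, contradiction] ordering.
preds = remap_predictions(
    [0, 1, 2],
    {0: "contradiction", 1: "entailment", 2: "neutral"},
    ["entailment", "neutral", "contradiction"],
)
print(preds)  # -> [2 0 1]
```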