kevinduh / san_mrc

Stochastic Answer Networks (SAN) for Machine Reading Comprehension
BSD 3-Clause "New" or "Revised" License

Upload logs for Squad 2.0 #5

Closed LearningPytorch closed 5 years ago

LearningPytorch commented 5 years ago

Hi, thanks for the new code, it's great. Could you also upload the logs (san.log) for SQuAD 2.0? I want to make sure I'm getting scores similar to yours. Thank you again.

LearningPytorch commented 5 years ago

In particular, I want to check the vocab size, since you changed prepro.py:

Raw vocab size vs vocab in glove: 106415/90949
OOV rate:1.2000=262509/21875454
final vocab size: 90953
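
For anyone comparing their own preprocessing output against these numbers, the OOV figure in the log is consistent with a percentage of all tokens (262509 out of 21875454). The snippet below is only a sanity-check sketch: `oov_rate` is a hypothetical helper, not a function from prepro.py, and the counts are copied from the log above.

```python
def oov_rate(oov_tokens: int, total_tokens: int) -> float:
    """Return the out-of-vocabulary rate as a percentage of all tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * oov_tokens / total_tokens


# Counts copied from the log output quoted in this comment.
raw_vocab, glove_vocab = 106415, 90949
rate = oov_rate(262509, 21875454)
coverage = 100.0 * glove_vocab / raw_vocab

print(f"OOV rate: {rate:.4f}%")
print(f"Share of raw vocab found in GloVe: {coverage:.2f}%")
```

The final vocab size (90953) is 4 larger than the GloVe-covered count (90949), which would be consistent with a handful of special tokens being added, though the log itself does not say so.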

namisan commented 5 years ago

Sure. I'll release it soon.

LearningPytorch commented 5 years ago

Thanks a lot! I'll wait for it. It'll be very useful to compare with your performance on SQuAD 2.0. For example, what is your performance (EM, F1) on the SQuAD 2.0 dev set?

namisan commented 5 years ago

We got 69.x/72.x EM/F1 on dev. We're writing a tech report about our model and experiments and will publish it soon.
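
For readers comparing numbers in this thread: EM is the fraction of predictions that exactly match a gold answer after normalization, and F1 is the token-overlap F1 between prediction and gold answer. Below is a simplified sketch of the per-example metrics, modeled on the normalization steps of the public SQuAD evaluation script. It is not the official script (for instance, it scores against a single gold answer rather than taking the maximum over several), so use the official one for reported results.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a prediction and one gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # SQuAD 2.0 convention: an unanswerable question has an empty gold answer,
    # so the score is 1 only if the prediction is empty as well.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because an empty gold answer gives full credit only to an empty prediction, a model tuned for v1.1 that always returns a span can lose many points on the unanswerable questions in 2.0.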

LearningPytorch commented 5 years ago

Wow! That's much higher than what I got when I ran this package: best EM 62.x, F1 66.x. Did you see anything unusual in the vocab sizes I posted above? I'm not sure why my performance is ~6 points lower than yours. I was able to get almost the same numbers as you reported on 1.1 by running your system.

LearningPytorch commented 5 years ago

@namisan is there any way you could upload the updated code soon? Your code is good and I'm learning a lot about SQuAD 2.0 from it. I'm working on a course project that uses some of your code. I saw on the other open issue that you're going on vacation. I hope you can upload it before then. Thanks.

hackiey commented 5 years ago

I got similar results, EM 62.x and F1 66.x, so maybe something is wrong.

LearningPytorch commented 5 years ago

Hi @hackiey. Thanks for confirming that you got the same or similar results as me. The package reproduces the results reported in the README for SQuAD 1.1 but not for 2.0. @namisan @kevinduh maybe we're doing something wrong?

namisan commented 5 years ago

The current config is for v1.1, not for 2.0. As described in the attached tech report, using a lower dropout rate (e.g., 0.1) and a larger hidden size (300) could lead to a better result. Hope this helps. I'm currently on vacation and will check in the logs or models once I'm back.
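
As a rough illustration of the suggestion above, here is a minimal sketch of overriding two hyperparameters through argparse. The flag names `--dropout` and `--hidden_size` and their default values are placeholders invented for this example. They are not the repository's actual option names or defaults, so check config.py for the real ones.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags and defaults, for illustration only. The real
    # option names and values live in the repository's config.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dropout", type=float, default=0.4)
    parser.add_argument("--hidden_size", type=int, default=128)
    return parser


# Override the placeholder defaults with the values suggested in this comment.
args = build_parser().parse_args(["--dropout", "0.1", "--hidden_size", "300"])
print(vars(args))
```

If the model exposes several dropout and hidden-size options, each one would need its own override, and the comment above does not say which of them the suggested values apply to.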

LearningPytorch commented 5 years ago

I hope you will upload all the code that gives you the performance gains; that would be very useful, @namisan. Have a good vacation.

LearningPytorch commented 5 years ago

Are you back, @namisan?

hackerwei commented 5 years ago

@namisan could you update your hyperparameters for SQuAD 2.0? I have tried tuning dropout and hidden size, and the highest F1 I reached was 69.3.

LearningPytorch commented 5 years ago

@hackerwei could you please elaborate on which params you changed? There are so many dropout params and hidden size variables. It would be great if you could upload your config.py file. Thank you.

@namisan we are also waiting for you, since your params will let us reproduce the numbers reported in the tech report.
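
One lightweight way to share exactly which params changed, without uploading a whole config file, is to diff the parsed settings of two runs. The helper below is a hypothetical sketch, not part of this repository, and the keys and values in the example dictionaries are made up for illustration.

```python
def diff_configs(base: dict, tuned: dict) -> dict:
    """Map each key whose value differs to a (base_value, tuned_value) pair.

    A key missing from one side is reported with None for that side.
    """
    changed = {}
    for key in sorted(set(base) | set(tuned)):
        old, new = base.get(key), tuned.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


# Hypothetical settings; in practice these could come from vars(args)
# of the baseline run and the tuned run.
base_cfg = {"dropout": 0.4, "hidden_size": 128, "optimizer": "adamax"}
tuned_cfg = {"dropout": 0.1, "hidden_size": 300, "optimizer": "adamax"}

for name, (old, new) in diff_configs(base_cfg, tuned_cfg).items():
    print(f"{name}: {old} -> {new}")
```

Posting this kind of short diff alongside the resulting EM/F1 makes it easier for others to reproduce a run.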

namisan commented 5 years ago

I released the worksheets of the official submissions. I'm closing this issue.