I just ran BERT-base-cased on my 2080Ti card, and the F1 and F1_ign scores on the dev set are 61.297 and 59.383, respectively. Did you change the hyper-parameters or the code?
No, I did not change any code. I retrained ATLOP based on the bert-base-cased model today. The max F1 and F1_ign scores on the dev set are 59.15 and 57.47, respectively. The log file is as follows.
{'dev_F1': 40.57923317426362, 'dev_F1_ign': 39.82843679530871}
{'dev_F1': 39.317141851460754, 'dev_F1_ign': 38.731451321946714}
{'dev_F1': 53.72393247269116, 'dev_F1_ign': 51.82852703252537}
{'dev_F1': 50.58173356602676, 'dev_F1_ign': 49.36600222927602}
{'dev_F1': 54.298684454326484, 'dev_F1_ign': 52.64229255774235}
{'dev_F1': 54.41405573827246, 'dev_F1_ign': 52.561194513805845}
{'dev_F1': 56.93991810813069, 'dev_F1_ign': 54.88397008227478}
{'dev_F1': 54.64799323244666, 'dev_F1_ign': 53.01448866481678}
{'dev_F1': 56.89193710270993, 'dev_F1_ign': 54.925301639722946}
{'dev_F1': 57.38558280931162, 'dev_F1_ign': 55.43350340485963}
{'dev_F1': 56.18050709467318, 'dev_F1_ign': 54.589301003405225}
{'dev_F1': 57.89518826657518, 'dev_F1_ign': 56.024853693007415}
{'dev_F1': 57.11343898289909, 'dev_F1_ign': 55.445560509989875}
{'dev_F1': 57.63551002987624, 'dev_F1_ign': 55.70515125902471}
{'dev_F1': 57.50654189027365, 'dev_F1_ign': 55.80387633820646}
{'dev_F1': 57.94447976266157, 'dev_F1_ign': 55.972197209941086}
{'dev_F1': 57.557797584038376, 'dev_F1_ign': 55.86492287420353}
{'dev_F1': 57.17678220760835, 'dev_F1_ign': 55.54798422695539}
{'dev_F1': 58.03141174370874, 'dev_F1_ign': 56.29502460234226}
{'dev_F1': 58.73300397848992, 'dev_F1_ign': 56.95267370386069}
{'dev_F1': 58.220144646322105, 'dev_F1_ign': 56.45715804929671}
{'dev_F1': 58.603736479842674, 'dev_F1_ign': 56.95127679442925}
{'dev_F1': 58.500087001914046, 'dev_F1_ign': 56.69137973731373}
{'dev_F1': 58.70439973468936, 'dev_F1_ign': 57.01062416855641}
{'dev_F1': 58.969897337741436, 'dev_F1_ign': 57.169858528392126}
{'dev_F1': 58.66923987894206, 'dev_F1_ign': 56.86381213652116}
{'dev_F1': 58.953802542410415, 'dev_F1_ign': 57.21422910717455}
{'dev_F1': 58.94294401717044, 'dev_F1_ign': 57.26143976338728}
{'dev_F1': 59.153420253616716, 'dev_F1_ign': 57.47291552693614}
{'dev_F1': 59.078663793103445, 'dev_F1_ign': 57.428448957874856}
That's strange. My results are as follows:
{'dev_F1': 32.35294117647059, 'dev_F1_ign': 31.670065829378125}
{'dev_F1': 47.216821162107166, 'dev_F1_ign': 45.565177871922806}
{'dev_F1': 51.88936501753019, 'dev_F1_ign': 49.48743853637863}
{'dev_F1': 55.320524662848705, 'dev_F1_ign': 53.77233227315189}
{'dev_F1': 57.01049084922276, 'dev_F1_ign': 54.530963266015995}
{'dev_F1': 54.35577834981711, 'dev_F1_ign': 52.662905283790664}
{'dev_F1': 57.092442761232576, 'dev_F1_ign': 55.24124034331722}
{'dev_F1': 57.56786616161615, 'dev_F1_ign': 55.2671735761698}
{'dev_F1': 58.43957982519446, 'dev_F1_ign': 56.22291653053052}
{'dev_F1': 58.83133542708011, 'dev_F1_ign': 57.007184385478105}
{'dev_F1': 57.08765150184436, 'dev_F1_ign': 55.28044315673585}
{'dev_F1': 58.8809946714032, 'dev_F1_ign': 57.0958041868302}
{'dev_F1': 59.75548902195609, 'dev_F1_ign': 57.60126287714314}
{'dev_F1': 59.23188172906959, 'dev_F1_ign': 56.92129113624196}
{'dev_F1': 59.255494854529275, 'dev_F1_ign': 57.20487028708159}
{'dev_F1': 59.765804058840686, 'dev_F1_ign': 57.67510676877976}
{'dev_F1': 59.2059991177768, 'dev_F1_ign': 57.32650721214624}
{'dev_F1': 60.23192654222529, 'dev_F1_ign': 58.26144056626074}
{'dev_F1': 60.13117621337998, 'dev_F1_ign': 58.33816824514157}
{'dev_F1': 60.03118503118503, 'dev_F1_ign': 58.0990858563687}
{'dev_F1': 60.03288050532145, 'dev_F1_ign': 58.10443556634631}
{'dev_F1': 60.24964838255978, 'dev_F1_ign': 58.42950456826262}
{'dev_F1': 60.57595956955518, 'dev_F1_ign': 58.70558790059614}
{'dev_F1': 61.29674515235457, 'dev_F1_ign': 59.38345474249023}
{'dev_F1': 60.96293140178952, 'dev_F1_ign': 58.955539504003774}
{'dev_F1': 61.110871719739734, 'dev_F1_ign': 59.19516358180187}
{'dev_F1': 61.13008516709178, 'dev_F1_ign': 59.22814916058826}
{'dev_F1': 61.08768608553481, 'dev_F1_ign': 59.15149724251205}
{'dev_F1': 61.03140752229548, 'dev_F1_ign': 59.102518872245014}
{'dev_F1': 61.12309152074528, 'dev_F1_ign': 59.19453328918382}
I'm not sure what causes the problem. Could you change the random seed and retrain the model? Also, could you check which version of apex you are using?
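For reference, "changing the random seed" means reseeding every RNG source the run touches. A minimal sketch of the usual pattern (the helper name is illustrative, not necessarily what the repo's train.py defines; the seed value is arbitrary):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed every RNG source that affects a training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        # Fully deterministic GPU runs may additionally need
        # torch.backends.cudnn.deterministic = True.
        torch.cuda.manual_seed_all(seed)


set_seed(66)  # retrain and compare dev F1 across several seeds
```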
Thank you very much for your kind reply. I retrained ATLOP based on the bert-base-cased model with different random seeds (80, 8, 60, 0). However, the mean max F1 and F1_ign scores on the dev set are 58.90 and 57.16, and the best max F1 and F1_ign scores among these runs are 59.13 and 57.38.
Additionally, the apex on my machine was installed from https://github.com/NVIDIA/apex on November 4th, using the command `pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./`
Your apex version is the same as mine, so I still don't know what causes the performance degradation. The last things you can try are: (1) changing the "O1" in train.py line 76 to "O0"; (2) training the model on a single GPU card (not multi-GPU).
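For context, that string is apex's AMP optimization level: "O1" patches ops to mixed precision, while "O0" runs pure FP32 and so rules AMP in or out as the source of the score gap. A minimal sketch of the call being changed (toy model and optimizer as placeholders, not the actual objects in train.py):

```python
import torch
from apex import amp  # assumes apex built with --cpp_ext/--cuda_ext as above

# Placeholder model/optimizer standing in for the real ones in train.py.
model = torch.nn.Linear(768, 97).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# opt_level="O1" (mixed precision) is what the repo uses;
# "O0" disables AMP entirely.
model, optimizer = amp.initialize(model, optimizer, opt_level="O0")
```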
Thanks for your reply! I changed "O1" to "O0", but the max F1 and F1_ign scores on the dev set are 59.12 and 57.40. Moreover, I have always trained these models on a single GPU.
However, I see a warning before training starts. Is this normal?
/home/shared/opt/anaconda3/envs/torch1.4py3.7/lib/python3.7/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
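(For context, this warning is raised by torch.utils.checkpoint whenever none of the tensors passed to a checkpointed function have requires_grad=True. A tiny standalone reproduction, purely illustrative and not ATLOP's actual call site:)

```python
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)  # requires_grad defaults to False

# Emits the exact UserWarning above: with no differentiable inputs,
# checkpoint() cannot propagate gradients back through this call.
out = checkpoint(layer, x)
```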
You can ignore the warning. My server configuration is:
Ubuntu 18.04.4 LTS (4.15.0-122-generic)
CUDA 10.1.105
RTX 2080Ti
I cloned the repository from GitHub and ran training following the instructions. The results were about the same as the reported numbers; the worst F1 score I saw in my experiments was 60.8. I have no idea why your results are that low. For now, you can try training the model for more epochs or changing other hyper-parameters such as the learning rate and batch size. I will release the trained model for reproducibility.
Okay, thank you very much for your reply! I will try your suggestions.
Hello, I also retrained ATLOP based on the bert-base-cased model on the DocRED dataset. The F1 score is 59.1. Have you solved this problem now? @donghaozhang95
Hello everyone, I have also trained ATLOP based on the bert-base-cased model on DocRED without changing the code. The max F1 and F1_ign scores on the dev set are 60.5 and 58.6, respectively, which is pretty close to the results reported in the paper. The log file is as follows, just for your information.
{'dev_F1': 36.65239114917916, 'dev_F1_ign': 36.121458560258546}
{'dev_F1': 43.07136602451839, 'dev_F1_ign': 42.06632007842279}
{'dev_F1': 54.229757910962014, 'dev_F1_ign': 52.51460105446404}
{'dev_F1': 53.89318540713447, 'dev_F1_ign': 51.53765539594611}
{'dev_F1': 57.24284967215093, 'dev_F1_ign': 54.98606300328284}
{'dev_F1': 56.80841335162322, 'dev_F1_ign': 55.176713651487695}
{'dev_F1': 57.52753977968176, 'dev_F1_ign': 55.86522252630643}
{'dev_F1': 57.4955448341809, 'dev_F1_ign': 55.5335878449675}
{'dev_F1': 58.19615494045441, 'dev_F1_ign': 56.04413020051552}
{'dev_F1': 58.24517212426532, 'dev_F1_ign': 56.25502680745882}
{'dev_F1': 58.567708333333336, 'dev_F1_ign': 56.65573805709466}
{'dev_F1': 58.60706242298049, 'dev_F1_ign': 56.611860818252616}
{'dev_F1': 59.24065469462933, 'dev_F1_ign': 57.17471653369394}
{'dev_F1': 59.03996786503314, 'dev_F1_ign': 56.72884898393532}
{'dev_F1': 59.37617751727026, 'dev_F1_ign': 57.32280536267519}
{'dev_F1': 59.36046267968652, 'dev_F1_ign': 57.33412470016974}
{'dev_F1': 59.497487437185924, 'dev_F1_ign': 57.42492154966868}
{'dev_F1': 60.08782832049052, 'dev_F1_ign': 57.96277956728739}
{'dev_F1': 59.980517555376736, 'dev_F1_ign': 58.01558148108492}
{'dev_F1': 59.9073552967661, 'dev_F1_ign': 57.99505736173566}
{'dev_F1': 60.38136492971504, 'dev_F1_ign': 58.3279091408506}
{'dev_F1': 59.836862347322594, 'dev_F1_ign': 57.75315550651342}
{'dev_F1': 60.27444549926641, 'dev_F1_ign': 58.312281881548756}
{'dev_F1': 60.30333146091691, 'dev_F1_ign': 58.36353028446361}
{'dev_F1': 60.27954476610845, 'dev_F1_ign': 58.31284193241417}
{'dev_F1': 60.43480125209039, 'dev_F1_ign': 58.42965394045191}
{'dev_F1': 60.36529280364418, 'dev_F1_ign': 58.519804587361556}
{'dev_F1': 60.46187363834423, 'dev_F1_ign': 58.56983574266176}
{'dev_F1': 60.487168334058296, 'dev_F1_ign': 58.587996989630106}
{'dev_F1': 60.495753163459874, 'dev_F1_ign': 58.57504048883248}
Results for different BERT variants:
bert-base-cased:
{'dev_F1': 43.36642831421369, 'dev_F1_ign': 42.311393982538156, '_timestamp': 1646907640, '_runtime': 180}
{'dev_F1': 39.90423853310844, 'dev_F1_ign': 38.965148297569655, '_timestamp': 1646907762, '_runtime': 302}
{'dev_F1': 54.254814942479214, 'dev_F1_ign': 52.47619539726859, '_timestamp': 1646907882, '_runtime': 422}
{'dev_F1': 54.5346171480972, 'dev_F1_ign': 52.80796868576285, '_timestamp': 1646908003, '_runtime': 543}
{'dev_F1': 57.47482571649882, 'dev_F1_ign': 55.567495894485276, '_timestamp': 1646908128, '_runtime': 668}
{'dev_F1': 56.38849599245639, 'dev_F1_ign': 54.44618531261989, '_timestamp': 1646908254, '_runtime': 794}
{'dev_F1': 58.143113618876626, 'dev_F1_ign': 56.19162584345815, '_timestamp': 1646908375, '_runtime': 915}
{'dev_F1': 57.28268678160921, 'dev_F1_ign': 55.48641562056987, '_timestamp': 1646908499, '_runtime': 1039}
{'dev_F1': 58.27534572359728, 'dev_F1_ign': 56.40001339045444, '_timestamp': 1646908616, '_runtime': 1156}
{'dev_F1': 57.98842705066479, 'dev_F1_ign': 56.10895825707796, '_timestamp': 1646908740, '_runtime': 1280}
{'dev_F1': 58.285299689628765, 'dev_F1_ign': 55.888440951343895, '_timestamp': 1646908860, '_runtime': 1400}
{'dev_F1': 59.22443169563905, 'dev_F1_ign': 57.38874072912339, '_timestamp': 1646908984, '_runtime': 1524}
{'dev_F1': 59.03719368080607, 'dev_F1_ign': 56.90997189200957, '_timestamp': 1646909107, '_runtime': 1647}
{'dev_F1': 59.517169252222416, 'dev_F1_ign': 57.6411341084312, '_timestamp': 1646909225, '_runtime': 1765}
{'dev_F1': 58.91813507931626, 'dev_F1_ign': 57.0626886144678, '_timestamp': 1646909350, '_runtime': 1890}
{'dev_F1': 59.05381944444446, 'dev_F1_ign': 57.07794817861319, '_timestamp': 1646909468, '_runtime': 2008}
{'dev_F1': 59.2828476705636, 'dev_F1_ign': 57.386899737717535, '_timestamp': 1646909585, '_runtime': 2125}
{'dev_F1': 59.52523268721714, 'dev_F1_ign': 57.601044760283884, '_timestamp': 1646909706, '_runtime': 2246}
{'dev_F1': 59.70086013304521, 'dev_F1_ign': 57.64409739361579, '_timestamp': 1646909829, '_runtime': 2369}
{'dev_F1': 60.02119093028184, 'dev_F1_ign': 58.045208696823, '_timestamp': 1646909954, '_runtime': 2494}
{'dev_F1': 59.72007556242487, 'dev_F1_ign': 57.785743579487615, '_timestamp': 1646910097, '_runtime': 2637}
{'dev_F1': 59.885694641399176, 'dev_F1_ign': 57.93642141141729, '_timestamp': 1646910230, '_runtime': 2770}
{'dev_F1': 59.752844950213365, 'dev_F1_ign': 57.93683197807869, '_timestamp': 1646910366, '_runtime': 2906}
{'dev_F1': 60.023966978829165, 'dev_F1_ign': 58.23136307612826, '_timestamp': 1646910500, '_runtime': 3040}
{'dev_F1': 59.87725695988615, 'dev_F1_ign': 58.08563644558884, '_timestamp': 1646910626, '_runtime': 3166}
{'dev_F1': 60.410225144798154, 'dev_F1_ign': 58.49664850887586, '_timestamp': 1646910744, '_runtime': 3284}
{'dev_F1': 60.21013760056271, 'dev_F1_ign': 58.37608369769607, '_timestamp': 1646910869, '_runtime': 3409}
{'dev_F1': 60.449661243688766, 'dev_F1_ign': 58.5101389992577, '_timestamp': 1646910988, '_runtime': 3528}
{'dev_F1': 60.519514652415786, 'dev_F1_ign': 58.6733103214247, '_timestamp': 1646911113, '_runtime': 3653}
{'dev_F1': 60.34790949121507, 'dev_F1_ign': 58.454030608278764, '_timestamp': 1646911237, '_runtime': 3777}
bert-base-uncased:
{'dev_F1': 43.36642831421369, 'dev_F1_ign': 42.311393982538156, '_timestamp': 1646907640, '_runtime': 180}
{'dev_F1': 39.90423853310844, 'dev_F1_ign': 38.965148297569655, '_timestamp': 1646907762, '_runtime': 302}
{'dev_F1': 54.254814942479214, 'dev_F1_ign': 52.47619539726859, '_timestamp': 1646907882, '_runtime': 422}
{'dev_F1': 54.5346171480972, 'dev_F1_ign': 52.80796868576285, '_timestamp': 1646908003, '_runtime': 543}
{'dev_F1': 57.47482571649882, 'dev_F1_ign': 55.567495894485276, '_timestamp': 1646908128, '_runtime': 668}
{'dev_F1': 56.38849599245639, 'dev_F1_ign': 54.44618531261989, '_timestamp': 1646908254, '_runtime': 794}
{'dev_F1': 58.143113618876626, 'dev_F1_ign': 56.19162584345815, '_timestamp': 1646908375, '_runtime': 915}
{'dev_F1': 57.28268678160921, 'dev_F1_ign': 55.48641562056987, '_timestamp': 1646908499, '_runtime': 1039}
{'dev_F1': 58.27534572359728, 'dev_F1_ign': 56.40001339045444, '_timestamp': 1646908616, '_runtime': 1156}
{'dev_F1': 57.98842705066479, 'dev_F1_ign': 56.10895825707796, '_timestamp': 1646908740, '_runtime': 1280}
{'dev_F1': 58.285299689628765, 'dev_F1_ign': 55.888440951343895, '_timestamp': 1646908860, '_runtime': 1400}
{'dev_F1': 59.22443169563905, 'dev_F1_ign': 57.38874072912339, '_timestamp': 1646908984, '_runtime': 1524}
{'dev_F1': 59.03719368080607, 'dev_F1_ign': 56.90997189200957, '_timestamp': 1646909107, '_runtime': 1647}
{'dev_F1': 59.517169252222416, 'dev_F1_ign': 57.6411341084312, '_timestamp': 1646909225, '_runtime': 1765}
{'dev_F1': 58.91813507931626, 'dev_F1_ign': 57.0626886144678, '_timestamp': 1646909350, '_runtime': 1890}
{'dev_F1': 59.05381944444446, 'dev_F1_ign': 57.07794817861319, '_timestamp': 1646909468, '_runtime': 2008}
{'dev_F1': 59.2828476705636, 'dev_F1_ign': 57.386899737717535, '_timestamp': 1646909585, '_runtime': 2125}
{'dev_F1': 59.52523268721714, 'dev_F1_ign': 57.601044760283884, '_timestamp': 1646909706, '_runtime': 2246}
{'dev_F1': 59.70086013304521, 'dev_F1_ign': 57.64409739361579, '_timestamp': 1646909829, '_runtime': 2369}
{'dev_F1': 60.02119093028184, 'dev_F1_ign': 58.045208696823, '_timestamp': 1646909954, '_runtime': 2494}
{'dev_F1': 59.72007556242487, 'dev_F1_ign': 57.785743579487615, '_timestamp': 1646910097, '_runtime': 2637}
{'dev_F1': 59.885694641399176, 'dev_F1_ign': 57.93642141141729, '_timestamp': 1646910230, '_runtime': 2770}
{'dev_F1': 59.752844950213365, 'dev_F1_ign': 57.93683197807869, '_timestamp': 1646910366, '_runtime': 2906}
{'dev_F1': 60.023966978829165, 'dev_F1_ign': 58.23136307612826, '_timestamp': 1646910500, '_runtime': 3040}
{'dev_F1': 59.87725695988615, 'dev_F1_ign': 58.08563644558884, '_timestamp': 1646910626, '_runtime': 3166}
{'dev_F1': 60.410225144798154, 'dev_F1_ign': 58.49664850887586, '_timestamp': 1646910744, '_runtime': 3284}
{'dev_F1': 60.21013760056271, 'dev_F1_ign': 58.37608369769607, '_timestamp': 1646910869, '_runtime': 3409}
{'dev_F1': 60.449661243688766, 'dev_F1_ign': 58.5101389992577, '_timestamp': 1646910988, '_runtime': 3528}
{'dev_F1': 60.519514652415786, 'dev_F1_ign': 58.6733103214247, '_timestamp': 1646911113, '_runtime': 3653}
{'dev_F1': 60.34790949121507, 'dev_F1_ign': 58.454030608278764, '_timestamp': 1646911237, '_runtime': 3777}
{'dev_F1': 61.48782649056922, 'dev_F1_ign': 59.51406116915573, '_timestamp': 1647601090, '_runtime': 3610}
{'dev_F1': 61.51273475688191, 'dev_F1_ign': 59.60466564728384, '_timestamp': 1647601213, '_runtime': 3733}
{'dev_F1': 61.39841012052312, 'dev_F1_ign': 59.46397291088697, '_timestamp': 1647601335, '_runtime': 3855}
Do the authors actually use bert-base-uncased instead of bert-base-cased?
@taolusi @donghaozhang95 @wzhouad
Hello, I retrained ATLOP based on the bert-base-cased model on the DocRED dataset. However, the max F1 and F1_ign scores on the dev set are 58.81 and 57.09, respectively, which is much lower than the scores reported in your paper (61.09 and 59.22). Is the default model config correct? My environment is as follows:
Best regards