lichess-org / lila

♞ lichess.org: the forever free, adless and open source chess server ♞
https://lichess.org
GNU Affero General Public License v3.0

Cloud engine eval seemingly inflated #14050

Open h3h4ns563b opened 9 months ago

h3h4ns563b commented 9 months ago

Exact URL of where the bug happened

https://lichess.org/paste

Steps to reproduce the bug

  1. Go to https://lichess.org/paste

  2. Paste in the below game and request computer analysis.

     1. e4 e6 2. d4 d6 3. Nf3 Be7 4. Bc4 Nd7 5. Nc3 e5 6. dxe5 dxe5 7. Qd5 1-0

What did you expect to happen?

All the local engines and other sites I've tried evaluate the final position as roughly +5.

What happened instead?

The cloud engine evaluates it as +7.8

Operating system

Linux

Browser and version (or alternate access method)

Firefox 119

Additional information

This could be nothing, of course, since evaluations can differ depending on the engine used and so on, but I've now noticed a clear pattern of the cloud engine giving inflated evaluations for positions.

Note that it could be related to the fact that the game is imported. I haven't been playing games on lichess for comparison. I've also only started importing games for analysis recently so I'm not sure if this is new.

atomheartother commented 9 months ago

What depth did you go to? At depth 27 cloud analysis puts this position at +4.5 for me. Can you please check if you can still reproduce?

h3h4ns563b commented 9 months ago

Yes, it still reproduces for me. It may have changed as now I see the +7.8 on white's turn 7 but not black's. However, I suppose it's possible that I misread before, so this time I got a screenshot.

[Screenshot: cloud]

Note the graph showing the spike to +7.5

[Screenshot: local engine]

Now, on the same move, once I turn on the local engine, the eval drops to +4.5.

I've been getting this quite a lot. It's not ideal: if I take this position to really be +7.8, then explore variations and see the eval drop to +4.5, I assume the variation is a mistake.

lsap commented 9 months ago

Tangential (non-cloud), but this PGN has a spike from -3.6 on move 27 to -4.6 on the next move, and then progresses slowly to -6.2 on move 36. The general gist of the analysis from move 27 to 36 is still correct, though, and shows a fair advantage for Black. EDIT:

[Event "Casual Correspondence game"]
[Site "https://lichess.org/tt4SomrZ"]
[Date "2023.11.23"]
[White "Anonymous"]
[Black "lichess AI level 5"]
[Result "1-0"]
[WhiteElo "?"]
[BlackElo "?"]
[Variant "Standard"]
[TimeControl "-"]
[ECO "A00"]
[Opening "Mieses Opening"]
[Termination "Normal"]
[Annotator "lichess.org"]

1. d3 { A00 Mieses Opening } c5 2. e4 Nc6 3. Bf4 g6 4. Nf3 Bg7 5. Nc3 d6 6. Qd2 Nf6 7. Bg5 Nd7?! { (-0.73 → 0.19) Inaccuracy. h6 was best. } (7... h6 8. Be3) 8. Bh6 Bf6?! { (0.12 → 0.78) Inaccuracy. Bxh6 was best. } (8... Bxh6 9. Qxh6 Nf6 10. Be2 Bg4 11. O-O e6 12. Nd2 Bxe2 13. Nxe2 Qe7 14. c3 O-O-O 15. d4) 9. Bg5?! { (0.78 → 0.02) Inaccuracy. Be2 was best. } (9. Be2) 9... Qb6 10. Bxf6 exf6? { (-0.10 → 1.49) Mistake. Nxf6 was best. } (10... Nxf6 11. Be2) 11. Na4? { (1.49 → 0.15) Mistake. O-O-O was best. } (11. O-O-O) 11... Qc7?! { (0.15 → 0.82) Inaccuracy. Qb4 was best. } (11... Qb4) 12. Qh6?! { (0.82 → 0.12) Inaccuracy. Nc3 was best. } (12. Nc3) 12... Nb4?! { (0.12 → 1.16) Inaccuracy. Nd4 was best. } (12... Nd4 13. O-O-O Ne5 14. Qg7 Rf8 15. Nc3 Be6 16. Qxf6 Nexf3 17. gxf3 Qe7 18. Qxe7+ Kxe7 19. Bg2) 13. O-O-O?? { (1.16 → -1.06) Blunder. Kd2 was best. } (13. Kd2) 13... b5?? { (-1.06 → 1.51) Blunder. Nxa2+ was best. } (13... Nxa2+ 14. Kb1 b5 15. Kxa2 bxa4 16. d4 Rb8 17. d5 c4 18. Rb1 c3 19. Nd4 Qc5 20. Qg7) 14. Nc3 Ne5? { (1.57 → 3.46) Mistake. Qa5 was best. } (14... Qa5 15. Qg7) 15. Nxe5?! { (3.46 → 2.47) Inaccuracy. d4 was best. } (15. d4 Nxf3 16. gxf3 Be6 17. Bxb5+ Ke7 18. d5 Bd7 19. Bxd7 Qxd7 20. a3 a5 21. Qh4 g5) 15... dxe5 16. Qg7 Rf8 17. Qxf6?! { (2.96 → 2.17) Inaccuracy. a3 was best. } (17. a3 Be6 18. Qxf6 Nc6 19. d4 exd4 20. Nd5 Qd6 21. Bxb5 Kd7 22. Nb4 Rfc8 23. Nxc6 Rxc6) 17... Qe7 18. Qxe7+ Kxe7 19. a3 Nc6 20. Nd5+? { (2.17 → 0.96) Mistake. Nxb5 was best. } (20. Nxb5 Be6 21. Nc3 Nd4 22. Be2 Rab8 23. Rd2 Rb6 24. Bd1 Rfb8 25. Na4 Rc6 26. b3 f6) 20... Kd6 21. b4?! { (1.03 → 0.31) Inaccuracy. f4 was best. } (21. f4 exf4 22. Nxf4 Rb8 23. g3 a5 24. Bg2 Bg4 25. Rd2 Rfd8 26. Re1 b4 27. a4 h5) 21... Be6 22. bxc5+ Kxc5 23. Nc7 Rad8 24. Nxe6+ fxe6 25. c3?? { (-0.09 → -1.86) Blunder. f3 was best. } (25. f3 a5) 25... Rxf2 26. Rd2 Rdf8 27. Rxf2? { (-1.88 → -3.65) Mistake. Be2 was best. } (27. Be2) 27... Rxf2 28. Kd1 Ra2 29. Ke1 Rxa3 30. Be2 Rxc3 31. Kf2 Nd4 32. Ke3 Rc2 33. Bg4 Rxg2 34. Rc1+ Kb4 35. Rb1+ Kc3 36. Rc1+ Kb2 37. Rc5 Nc2+ 38. Kf3 Ne1+ 39. Ke3 Rxg4 40. Rxb5+ Kc1 41. Rxe5 h5 42. d4 Ng2+ 43. Kd3 Rh4 44. d5 Rh3+ 45. Kd4 exd5 46. Rxd5 Kb2 47. Rd6 Ra3 48. Rxg6 Nf4 49. Rf6 Ne2+ 50. Kc4 Re3 51. Ra6 Rxe4+ 52. Kc5 Re7 53. Kd6 Rg7 54. Ke5 Rf7 55. Ke6 Rb7 56. Kf5 h4 57. h3 Nd4+ 58. Kg5 Nf3+ 59. Kf5 Kb3 60. Kg4 Ne5+ 61. Kxh4 Nd3?! { (-3.57 → -2.59) Inaccuracy. Kb4 was best. } (61... Kb4 62. Kg5 Kb5 63. Ra1 a5 64. Kf5 Nf7 65. Rb1+ Ka6 66. Rg1 a4 67. Ke4 Ka5 68. Kd3) 62. Ra1 Rd7 63. Kg4 Rf7? { (-2.95 → -1.65) Mistake. Ne5+ was best. } (63... Ne5+ 64. Kg5 Nc4 65. Ra6 Kb4 66. h4 Kb5 67. Ra1 Rd5+ 68. Kg4 Ne5+ 69. Kf5 a5 70. Rb1+) 64. h4 Ne5+ 65. Kg5 Kb4 66. Rb1+ Kc4 67. Ra1 Nf3+ 68. Kg6 Rb7 69. h5 Kc5?! { (-1.10 → -0.01) Inaccuracy. Ne5+ was best. } (69... Ne5+ 70. Kf6 Nf7 71. Rc1+ Kd4 72. Ra1 Nh8 73. Rd1+ Kc3 74. Ra1 Rf7+ 75. Kg5 Rd7 76. Kh6) 70. Kf6 Rh7 71. Kg6 Rd7 72. Rc1+? { (0.00 → -1.12) Mistake. Kf6 was best. } (72. Kf6 Kb6 73. h6 Rd6+ 74. Kg7 Nd4 75. Rf1 Kc5 76. Rf6 Rd7+ 77. Kg6 Nc6 78. h7 Ne5+) 72... Kd5 73. Ra1 Rc7? { (-1.18 → -0.01) Mistake. Ne5+ was best. } (73... Ne5+ 74. Kf5 Rf7+ 75. Kg5 Rb7 76. Kf5 Nd7 77. h6 Nf8 78. Kf6 Rh7 79. Kg5 Rf7 80. Ra6) 74. Ra5+ Kd4 75. Kf6 Kc4 76. Ra4+ Kd3 77. Ra3+ Ke4 { Black resigns. } 1-0

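
The annotated PGN above records the evaluation before and after each flagged move in comments of the form `{ (a → b) Label. ... }`. For anyone who wants to list those jumps rather than read them off the move text, here is a minimal Python sketch. The function name `eval_swings` and the regular expression are illustrative assumptions based only on the comment format visible in this PGN; they are not part of lila, and mate scores are not handled.

```python
import re

# Matches annotation comments such as "(-0.73 → 0.19) Inaccuracy".
# Assumption: evals are plain decimal pawn values, as in the PGN in this thread.
ANNOTATION = re.compile(
    r"\(\s*(-?\d+(?:\.\d+)?)\s*→\s*(-?\d+(?:\.\d+)?)\s*\)\s*"
    r"(Inaccuracy|Mistake|Blunder)"
)

def eval_swings(movetext):
    """Return (before, after, swing, label) for each annotated move."""
    swings = []
    for before, after, label in ANNOTATION.findall(movetext):
        b, a = float(before), float(after)
        swings.append((b, a, round(a - b, 2), label))
    return swings

if __name__ == "__main__":
    sample = (
        "7. Bg5 Nd7?! { (-0.73 → 0.19) Inaccuracy. h6 was best. } (7... h6 8. Be3) "
        "8. Bh6 Bf6?! { (0.12 → 0.78) Inaccuracy. Bxh6 was best. }"
    )
    for row in eval_swings(sample):
        print(row)
```

Note that this only sees moves the analysis flagged. The graph values mentioned above for unflagged moves are not in the PGN comments.
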
dav1312 commented 8 months ago

As far as I know, the evaluations for the first few moves of an analyzed game are taken from "cloud" evaluations. This can, as you can see, be very confusing, because different engine versions can give different evaluations. In this case the engine was most likely Stockfish 14.
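
To check what the cloud currently stores for the position in the original report, one option is the public cloud-eval endpoint. The sketch below is a hedged illustration, not how lila itself consumes these evals. It rests on several assumptions: the FEN was derived by hand from the move list in the report (after 7. Qd5), the response fields (`depth`, `pvs`, `cp`, `mate`) follow the Lichess API documentation as I understand it, and the sample JSON is made up purely to exercise the parser.

```python
import json
from urllib.parse import urlencode

CLOUD_EVAL_ENDPOINT = "https://lichess.org/api/cloud-eval"

# Assumption: FEN derived by hand from
# 1. e4 e6 2. d4 d6 3. Nf3 Be7 4. Bc4 Nd7 5. Nc3 e5 6. dxe5 dxe5 7. Qd5
FINAL_FEN = "r1bqk1nr/pppnbppp/8/3Qp3/2B1P3/2N2N2/PPP2PPP/R1B1K2R b KQkq - 1 7"

def cloud_eval_url(fen, multi_pv=1):
    """Build the query URL for a stored cloud evaluation of `fen`."""
    return f"{CLOUD_EVAL_ENDPOINT}?{urlencode({'fen': fen, 'multiPv': multi_pv})}"

def summarize(response_text):
    """Return (depth, score) for the first principal variation.

    The score is formatted in pawns (e.g. "+4.50"), or "#n" for a mate score.
    """
    data = json.loads(response_text)
    best = data["pvs"][0]
    if "mate" in best:
        score = f"#{best['mate']}"
    else:
        score = f"{best['cp'] / 100:+.2f}"
    return data["depth"], score

if __name__ == "__main__":
    print(cloud_eval_url(FINAL_FEN))
    # Made-up response, used only to show the parsing; not real engine output.
    fake = '{"fen": "x", "knodes": 1, "depth": 27, "pvs": [{"moves": "g8h6", "cp": 450}]}'
    print(summarize(fake))
```

To query the live service, the URL can be fetched with `urllib.request.urlopen(cloud_eval_url(FINAL_FEN))` and the decoded body passed to `summarize`. Comparing the reported depth and score against a local engine run at a similar depth would show whether the stored value comes from an older or shallower analysis.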