xingchengxu opened 6 months ago
You are loading the model the wrong way. You're not supposed to guess what config to use; the checkpoint file itself contains the correct config. See, for example, the proper way to load the model: https://github.com/lightvector/KataGo/blob/master/python/load_model.py#L36-L55
This is because for any network architecture, there are many small options that can be configured, see https://github.com/lightvector/KataGo/blob/master/python/modelconfigs.py#L1401-L1504. Usually a network will be using some combination of these options, so you don't want to guess, you just want to trust the config inside the checkpoint itself.
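The pattern the linked load_model.py code follows boils down to: deserialize the checkpoint, read the config stored inside it, and build the model from that config. A minimal sketch of the idea, using a plain dict as a stand-in for the deserialized checkpoint (the "config" and "block_kind" keys here are illustrative, loosely modeled on the linked files, not an exact reproduction of KataGo's API):

```python
# Hypothetical sketch: trust the config embedded in the checkpoint instead
# of guessing one. A real KataGo checkpoint would be read with torch.load;
# here a plain dict stands in for the deserialized result.

def load_with_embedded_config(checkpoint, build_model):
    # The checkpoint carries its own model config, so architecture options
    # (block kind, intermediate heads, etc.) always match the saved weights.
    config = checkpoint["config"]
    return build_model(config)

# Stand-in for the result of torch.load("model.ckpt", map_location="cpu")
fake_checkpoint = {
    "config": {"block_kind": "bottlenest2", "has_intermediate_heads": True},
    "model": {},  # state-dict tensors would live here
}

model = load_with_embedded_config(fake_checkpoint, build_model=dict)
print(model["block_kind"])  # the block kind came from the checkpoint, not a guess
```

The point is simply that no hand-edited copy of the b18c384nbt config can go stale relative to the weights, because the config and the weights travel together in the same file.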
Great! Thank you very much for pointing out the correct approach to loading the model. Your guidance clarifies the importance of using the configuration embedded within the checkpoint file itself, rather than attempting to deduce or manually adjust the configuration.
version of KataGo: 1.14.0
version of torch: 1.12.1+cu113
I encountered a compatibility issue when loading the kata1-b18c384nbt-s9402410496-d4158172623/model.ckpt checkpoint with the original b18c384nbt model configuration. The initial setup did not align with the checkpoint, leading to discrepancies in the network's ResBlocks: conv1x1 was not initialized for 'bottlenest2', and the intermediate_value_head and intermediate_policy_head were missing.
To address this, I adjusted the model configuration as follows:

With those changes and some tweaks in model_pytorch.py, we were able to load the .ckpt and match all parameters to the model. Output:
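One way to confirm that "all parameters match" is to diff the checkpoint's parameter names against the model's before loading; torch's `load_state_dict(..., strict=True)` performs this check for real. A hedged sketch of the idea, with plain key lists standing in for the two state dicts (the parameter names below are hypothetical, chosen to mirror the missing-head symptom above):

```python
# Hypothetical sketch: compare checkpoint parameter names with the model's
# to surface config mismatches before any weights are copied.

def diff_state_dicts(ckpt_keys, model_keys):
    ckpt, model = set(ckpt_keys), set(model_keys)
    unexpected = sorted(ckpt - model)   # in checkpoint, absent from model
    missing = sorted(model - ckpt)      # in model, absent from checkpoint
    return unexpected, missing

ckpt_keys = ["blocks.0.conv1x1.weight", "intermediate_value_head.weight"]
model_keys = ["blocks.0.conv1x1.weight"]  # a guessed config without the head

unexpected, missing = diff_state_dicts(ckpt_keys, model_keys)
# intermediate_value_head.weight shows up as unexpected: the guessed config
# lacks a head the checkpoint was trained with.
```

An empty diff in both directions is exactly the condition `strict=True` enforces, so this check and a clean strict load are equivalent.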
However, when we play with the model to see whether it behaves sensibly, using the following code:
The model made an unexpected move ("H7") and displayed an overconfident policy prediction (output['policy1'][-1] was 1.0), indicating a potential misunderstanding of the game state.
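A degenerate policy like this is easy to flag automatically: a probability of exactly 1.0 on one move means the policy distribution has zero entropy, which on an ordinary position is a strong hint that the weights and architecture do not match. A small self-contained check (the probability vectors are made-up examples, not real KataGo output):

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a policy distribution, in nats.
    Zero means the policy puts all its mass on a single move."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A healthy opening policy spreads mass over several candidate moves...
spread = [0.25, 0.25, 0.25, 0.25]      # entropy = ln(4) ≈ 1.386 nats

# ...while a collapsed policy like output['policy1'][-1] == 1.0 has
# zero entropy, so it can be caught with a simple threshold.
collapsed = [1.0, 0.0, 0.0, 0.0]

def looks_degenerate(probs, min_entropy=0.01):
    return policy_entropy(probs) < min_entropy
```

Checking the entropy of the first few moves of a self-play game would catch this class of loading bug immediately, without needing to judge move quality by eye.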
I am seeking advice on resolving this issue and on identifying checkpoints compatible with model_pytorch.py, to ensure accurate model performance and behavior.