Open ShawnXuan opened 3 years ago
Wow thanks, I think you are absolutely right.
This raises the question, of course, of how to update old models. One option is to just not update them, which is probably okay: they were trained with this weird functional form and do well with it. Thankfully, both the scale and the base score have independent "sb2w" and "sb2scale2w" before them which do get used separately, so the scaling path and the basic score path are still meaningfully separately configurable by the net even now.
The other option is a little bit of careful migration. I think it should work to initialize a new model from the old one by replacing the sbscale3w weight values with the sb3w weight values, while simultaneously making the above change. This would keep the model behaving exactly the same as before, while introducing this new degree of freedom.
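To make the migration idea concrete, here is a minimal sketch in plain Python. Only the names sb3w and sbscale3w come from this thread; the toy `head_outputs` function, its `use_fixed` flag, the dict-of-lists weight layout, and the example numbers are assumptions for illustration and are not KataGo's actual model code.

```python
import copy


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def head_outputs(weights, score_feats, scale_feats, use_fixed):
    """Toy stand-in for the last layer of the score belief head.

    With use_fixed=False the scaling path reuses sb3w (the accidental
    behavior); with use_fixed=True it uses its own sbscale3w.
    """
    score = matvec(weights["sb3w"], score_feats)
    scale_w = weights["sbscale3w"] if use_fixed else weights["sb3w"]
    scale = matvec(scale_w, scale_feats)
    return score, scale


def migrate(weights):
    """Return a copy of the weights with sb3w copied into sbscale3w."""
    new_weights = copy.deepcopy(weights)
    new_weights["sbscale3w"] = copy.deepcopy(weights["sb3w"])
    return new_weights


# Hypothetical old model: sbscale3w exists but was never trained.
old = {
    "sb3w": [[0.5, -1.0], [2.0, 0.25]],
    "sbscale3w": [[0.0, 0.0], [0.0, 0.0]],
}
new = migrate(old)

x_score, x_scale = [1.0, 2.0], [3.0, -1.0]
before = head_outputs(old, x_score, x_scale, use_fixed=False)
after = head_outputs(new, x_score, x_scale, use_fixed=True)
assert before == after  # same outputs, but sbscale3w is now a separate matrix
```

In this toy setup the migrated model with the fixed path produces the same outputs as the old model with the accidental path, which is the property the migration is meant to preserve.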
I think I can write that migration, as well as the method to make it opt-in so that old models can stay as-is. I'll work on it later this week.
Okay, pushed this change to master: 260667c0
I think newly-trained neural nets should now be fixed.
Neural nets created before this fix will need manual migration. Summary of problem for those reading this thread later and trying to manually migrate: there were supposed to be two separate weight matrices "sb3w" and "sbscale3w" in the score belief head, but accidentally "sb3w" is used for both.
To properly use "sbscale3w" instead of just "sb3w", first take a backup of everything, then manually edit the model.config.json to have "use_fixed_sbscaling": true.
Doing this alone, however, will create a problem: "sbscale3w" will never have been trained to have useful weights, since it was omitted while "sb3w" was trained to perform the function of both. So also run the migrate_sbscale.py script on your TensorFlow checkpoint for the model. The script creates a new checkpoint in which sb3w has been copied into sbscale3w. Swap in this new checkpoint for the old one, in addition to editing the model.config.json, and you should be good to go. sbscale3w will now start from what sb3w used to be, so the model resumes exactly where it was, but going forward the two matrices can be tuned separately.
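For anyone who prefers to script the config edit rather than do it by hand, a minimal sketch follows. The helper name `enable_fixed_sbscaling`, the `.bak` backup suffix, and the `"version": 8` entry in the demo config are assumptions for illustration; only the file name model.config.json and the "use_fixed_sbscaling" key come from this thread. It covers the config step only, so the checkpoint still needs migrate_sbscale.py as described above.

```python
import json
import shutil
import tempfile
from pathlib import Path


def enable_fixed_sbscaling(config_path):
    """Back up the config file, then set "use_fixed_sbscaling" to true."""
    config_path = Path(config_path)
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)  # keep the original config

    config = json.loads(config_path.read_text())
    config["use_fixed_sbscaling"] = True
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return backup_path


# Demo on a throwaway config; the contents here are made up.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.config.json"
    path.write_text(json.dumps({"version": 8}))
    backup = enable_fixed_sbscaling(path)
    updated = json.loads(path.read_text())
    original = json.loads(backup.read_text())
```

The backup is written before anything else so the old config can be restored if the migrated checkpoint turns out not to load.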
Hi there,
In the following line: https://github.com/lightvector/KataGo/blob/5a017039adcffc6b9b5057fbe6b724f4fcf5a178/python/model.py#L1176
Should sb3w be replaced with sbscale3w?