Closed shanth96 closed 4 months ago
Hi @shanth96!
I believe this behavior change is coming from: https://github.com/vitessio/vitess/pull/14733
Specifically here, as the value is an empty string after reading the empty config file: https://github.com/vitessio/vitess/pull/14733/files#diff-ee5ffc675b719e69de7147068ab4ecf03f6d5db91a000d212a30f8ce61544691R424-R431
That is still in place on main: https://github.com/vitessio/vitess/blob/2e009e3e1d7a84b071926c18ad951f305ebf4cf9/go/vt/vttablet/tabletserver/tabletenv/config.go#L413-L420
On main:
❯ vttablet --version
vttablet version Version: 21.0.0-SNAPSHOT (Git revision 8a59865817029578efcac98abf1a502150d572c9 branch 'main') built on Mon Jul 15 19:04:54 EDT 2024 by matt@pslord.local using go1.22.5 darwin/arm64
❯ cat /tmp/tablet-config.yml
consolidator: enable
❯ cat bugtest.go
package main

import (
	"fmt"
	"os"

	"vitess.io/vitess/go/vt/vttablet/tabletserver/tabletenv"
	"vitess.io/vitess/go/yaml2"
)

var tabletConfig = "/tmp/tablet-config.yml"

func main() {
	config := tabletenv.NewDefaultConfig()
	gotBytes, _ := yaml2.Marshal(config)
	fmt.Printf("Config before unmarshaling:\n%s", gotBytes)
	bytes, err := os.ReadFile(tabletConfig)
	if err != nil {
		panic(err)
	}
	if err := yaml2.Unmarshal(bytes, config); err != nil {
		panic(err)
	}
	gotBytes, _ = yaml2.Marshal(config)
	fmt.Printf("Loaded config file %s successfully:\n%s", tabletConfig, gotBytes)
}
❯ go run bugtest.go | grep -i reload
schemaChangeReloadTimeout: 30s
schemaReloadIntervalSeconds: 30m0s
On v18:
git checkout release-18.0
make build
❯ go run bugtest.go | grep -i reload
schemaChangeReloadTimeout: 30s
schemaReloadIntervalSeconds: 30m0s
schemaChangeReloadTimeout: 30s
schemaReloadIntervalSeconds: 30m0s
Hi @mattlord! Thank you for the quick fix!
Overview of the Issue
We're trying to validate v19 to upgrade our clusters and we noticed that Vitess is unable to boot up and gets stuck in a loop with the following errors:
While digging into this, we noticed that the errors were happening because schemaChangeReloadTimeout was being set to 0s (even though the default is 30s and we don't explicitly configure it). This was causing the DEADLINE_EXCEEDED errors when trying to open the schema engine.
Upon further investigation, we noticed that the initConfig section of vttablet had some weird behaviour:
Reproduction Steps
The code below is a simple reproduction of the initConfig code.
Running this in v18 vs v19 produces two different results:
where tablet-config.yaml is a simple YAML file with any config:
On v19, we get the following output:
vs. on v18, we get:
Particularly, note how schemaChangeReloadTimeout is not present as a config variable in v19 after unmarshaling.
Binary Version
Operating System and Environment details
Log Fragments
No response