Closed fengzhao closed 3 years ago
@fengzhao Please show use the log of PD.
➜ log date Thu Dec 13 17:53:20 HKT 2018
➜ log date Thu Dec 13 17:54:00 HKT 2018 ➜ log ➜ log ➜ log tail -300f pd.log 2018/12/13 17:52:47.818 util.go:56: [info] Welcome to Placement Driver (PD). 2018/12/13 17:52:47.819 util.go:57: [info] Release Version: v2.1.0 2018/12/13 17:52:47.819 util.go:58: [info] Git Commit Hash: 757e91d4bf3df047a8953f239c7c2eb2aea6e7e1 2018/12/13 17:52:47.819 util.go:59: [info] Git Branch: release-2.1 2018/12/13 17:52:47.819 util.go:60: [info] UTC Build Time: 2018-11-30 11:36:10 2018/12/13 17:52:47.819 metricutil.go:79: [info] disable Prometheus push client 2018/12/13 17:52:47.819 server.go:104: [info] PD config - { "client-urls": "http://192.168.8.36:2379", "peer-urls": "http://192.168.8.36:2380", "advertise-client-urls": "http://192.168.8.36:2379", "advertise-peer-urls": "http://192.168.8.36:2380", "name": "pd1", "data-dir": "/home/tidb/deploy/data.pd", "initial-cluster": "pd1=http://192.168.8.36:2380", "initial-cluster-state": "new", "join": "", "lease": 1, "log": { "level": "info", "format": "text", "disable-timestamp": false, "file": { "filename": "/home/tidb/deploy/log/pd.log", "log-rotate": true, "max-size": 300, "max-days": 0, "max-backups": 0 } }, "log-file": "", "log-level": "", "tso-save-interval": "3s", "metric": { "job": "pd1", "address": "", "interval": "0s" }, "schedule": { "max-snapshot-count": 3, "max-pending-peer-count": 16, "max-merge-region-size": 0, "max-merge-region-keys": 0, "split-merge-interval": "1h0m0s", "patrol-region-interval": "100ms", "max-store-down-time": "30m0s", "leader-schedule-limit": 4, "region-schedule-limit": 4, "replica-schedule-limit": 8, "merge-schedule-limit": 8, "tolerant-size-ratio": 5, "low-space-ratio": 0.8, "high-space-ratio": 0.6, "disable-raft-learner": "false", "disable-remove-down-replica": "false", "disable-replace-offline-replica": "false", "disable-make-up-replica": "false", "disable-remove-extra-replica": "false", "disable-location-replacement": "false", "disable-namespace-relocation": "false", "schedulers-v2": [ { "type": "balance-region", "args": null, "disable": false }, { "type": "balance-leader", "args": null, "disable": false }, { "type": "hot-region", "args": null, "disable": false }, { "type": "label", "args": null, "disable": false } ] }, "replication": { "max-replicas": 3, "location-labels": "" }, "namespace": {}, "cluster-version": "0.0.0", "quota-backend-bytes": "0 B", "auto-compaction-mode": "periodic", "auto-compaction-retention-v2": "1h", "TickInterval": "500ms", "ElectionInterval": "3s", "PreVote": true, "security": { "cacert-path": "", "cert-path": "", "key-path": "" }, "label-property": null, "WarningMsgs": null, "namespace-classifier": "table" } 2018/12/13 17:52:47.821 server.go:136: [info] start embed etcd 2018/12/13 17:52:47.821 log.go:88: [info] embed: [pprof is enabled under /debug/pprof] 2018/12/13 17:52:47.822 log.go:88: [info] etcdserver: [name = pd1] 2018/12/13 17:52:47.822 log.go:88: [info] etcdserver: [data dir = /home/tidb/deploy/data.pd] 2018/12/13 17:52:47.822 log.go:88: [info] etcdserver: [member dir = /home/tidb/deploy/data.pd/member] 2018/12/13 17:52:47.822 log.go:88: [info] etcdserver: [heartbeat = 500ms] 2018/12/13 17:52:47.822 log.go:88: [info] etcdserver: [election = 3000ms] 2018/12/13 17:52:47.822 log.go:88: [info] etcdserver: [snapshot count = 100000] 2018/12/13 17:52:47.822 log.go:88: [info] etcdserver: [advertise client URLs = http://192.168.8.36:2379] 2018/12/13 17:52:47.823 systime_mon.go:24: [info] start system time monitor 2018/12/13 17:52:47.866 log.go:88: [info] etcdserver: [restarting member 61784f96623c1953 in cluster 7e6abe01b25b1b1c at commit index 16455] 2018/12/13 17:52:47.868 raft.go:656: [info] 61784f96623c1953 became follower at term 4 2018/12/13 17:52:47.868 raft.go:364: [info] newRaft 61784f96623c1953 [peers: [], term: 4, commit: 16455, applied: 0, lastindex: 16455, lastterm: 4] 2018/12/13 17:52:47.869 log.go:88: [info] mvcc: [restore compact to 15100] 2018/12/13 17:52:47.872 log.go:84: [warning] auth: [simple token is not cryptographically signed] 2018/12/13 17:52:47.872 log.go:88: [info] embed: [61784f96623c1953 starting with cors [""]] 2018/12/13 17:52:47.872 log.go:88: [info] embed: [61784f96623c1953 starting with host whitelist [""]] 2018/12/13 17:52:47.872 log.go:88: [info] etcdserver: [starting server... [version: 3.3.0+git, cluster version: to_be_decided]] 2018/12/13 17:52:47.875 log.go:88: [info] embed: [listening for peers on 192.168.8.36:2380] 2018/12/13 17:52:47.875 log.go:88: [info] etcdserver/membership: [added member 61784f96623c1953 [http://192.168.8.36:2380] to cluster 7e6abe01b25b1b1c] 2018/12/13 17:52:47.875 log.go:86: [info] etcdserver/membership: [set the initial cluster version to 3.3] 2018/12/13 17:52:47.875 log.go:88: [info] etcdserver/api: [enabled capabilities for version 3.3] 2018/12/13 17:52:52.868 raft.go:857: [info] 61784f96623c1953 is starting a new election at term 4 2018/12/13 17:52:52.868 raft.go:684: [info] 61784f96623c1953 became pre-candidate at term 4 2018/12/13 17:52:52.868 raft.go:755: [info] 61784f96623c1953 received MsgPreVoteResp from 61784f96623c1953 at term 4 2018/12/13 17:52:52.868 raft.go:669: [info] 61784f96623c1953 became candidate at term 5 2018/12/13 17:52:52.868 raft.go:755: [info] 61784f96623c1953 received MsgVoteResp from 61784f96623c1953 at term 5 2018/12/13 17:52:52.869 raft.go:712: [info] 61784f96623c1953 became leader at term 5 2018/12/13 17:52:52.869 node.go:306: [info] raft.node: 61784f96623c1953 elected leader 61784f96623c1953 at term 5 2018/12/13 17:52:52.869 server.go:166: [info] create etcd v3 client with endpoints [http://192.168.8.36:2379] 2018/12/13 17:52:52.869 log.go:88: [info] etcdserver: [published {Name:pd1 ClientURLs:[http://192.168.8.36:2379]} to cluster 7e6abe01b25b1b1c] 2018/12/13 17:52:52.869 log.go:88: [info] embed: [ready to serve client requests] 2018/12/13 17:52:52.870 log.go:86: [info] embed: [serving insecure client requests on 192.168.8.36:2379, this is strongly discouraged!] 2018/12/13 17:52:52.871 server.go:205: [info] init cluster id 6634201091088044800 2018/12/13 17:52:52.871 namespace_classifier.go:438: [info] load 0 namespacesInfo cost 283.89µs 2018/12/13 17:52:52.872 leader.go:95: [warning] leader is still name:"pd1" member_id:7023451126196083027 peer_urls:"http://192.168.8.36:2380" client_urls:"http://192.168.8.36:2379" , delete and campaign again 2018/12/13 17:52:52.875 tso.go:104: [info] sync and save timestamp: last 2018-12-13 17:50:52.573357585 +0800 HKT save 2018-12-13 17:52:55.874996105 +0800 HKT m=+8.067861500 next 2018-12-13 17:52:52.874996105 +0800 HKT m=+5.067861500 2018/12/13 17:52:52.878 leader.go:263: [info] cluster version is 0.0.0 2018/12/13 17:52:52.878 leader.go:264: [info] PD cluster leader pd1 is ready to serve
@shenli @lysu
@lysu could you help me slove the problem first ? hurry hurry hurry ! I made a test on my server using docker to deploy tidb . it works ! but when it doesn't work using ansible 。 thx
@fengzhao Hi, our ansible expert are look at this question please wait a mount.
Hi,May be a pd-server conflict, please confirm that the pd-server has been started before starting the cluster with ansible.
should i comment this code ? does it matters ? @lysu
Hi,May be a pd-server conflict, please confirm that the pd-server has been started before starting the cluster with ansible.
what does a pd-server conflict mean ? the ansbile start.yml defines the order of the server start . it seems the pd-server start OK . but DB is wating pd-server's http responce . but i use curl to make a request , it works . that's why i can't understand it 。
should i comment this code ? does it matters ? @lysu
it works . when i comment the wait for pd up services . it start OK and i can use mysql client to connect it and access the moniter using chrome 。I'm unfamiliar with the ansible。 but according to the code , it seems to make a http request and until "200 OK" in the response to be searched ?
but i don't konw how ansible make this reqesut . show u consider log the request clearly like the "curl -v " cammond ?
Login pd node ,execute command "ps -ef |grep pd-server" ,and then output result . What is the time that executed "ansible-playbook start.yml" ?
不用质疑时间的问题了,能不能关注一下重点,我发的日志只不过是因为我第二天一大早清空了pd头天的日志,然后在中控机又启动了一下,再贴上pd的日志的,在中控机执行启动命令之后,就去pd server里面看进程和日志了,都正常,然后中控机启动命令执行到等待连接pd的时候挂在那里,一直到连接超时退出,然后我在中控机手动curl向pd发http请求,连接正常。然后我把检查连接的语句注释再重启一下。就启动成功了。
Could you provide the output of the following commands, Thanks.
$ time curl -v "172.16.30.193:2379/pd/api/v1/members"
$ env | grep -i http
Could you provide the output of the following commands, Thanks.
$ time curl -v "172.16.30.193:2379/pd/api/v1/members" $ env | grep -i http
我curl访问pd的url是秒开,连接正常,我不知道你让我查看环境变量是想做什么
我对ansible不是很熟悉,我想问问,start.yml 文件中的这段代码,具体是用什么去发起http请求的。 `
- name: wait for PD up
wait_for: |
host={{ ansible_host }} port={{ pd_client_port }} state=present
send='GET /pd/api/v1/members HTTP/1.0\r\n\r\n' search_regex='200 OK'
when: not enable_tls|default(false)
- name: wait for PD up
uri:
url: "https://{{ ansible_host }}:{{ pd_client_port }}/pd/api/v1/members"
validate_certs: no
client_cert: "{{ pd_cert_dir }}/pd-server-{{ ansible_host }}.pem"
client_key: "{{ pd_cert_dir }}/pd-server-{{ ansible_host }}-key.pem"
register: result
until: result.status is defined and result.status == 200
retries: 10
delay: 5
when: enable_tls|default(false)
`
Ansible use wait_for module currently. I'd like to check if you have set any env about HTTP, like HTTP_PROXY.
We're testing uri module to check http interface and if we can reproduce this error on TiDB 2.1.
我就算有代理,也是在外层的网络设置的透明代理,而且也只是针对部分被墙的网站才走代理。我的节点都是vmware esxi中同一网段内的虚拟机,不会走代理的。真的很希望你们能重现这个问题。 thx
没必要造轮子啊,我觉得可以考虑在wait_for 中引requests这个库来发http请求啊。这个库是很优秀的库。我觉得你们可以考虑一下这个。
我把以前的老虚拟机删了,重新开了虚拟机,环境是这样的。 192.168.8.35 db 中控机 192.168.8.36 pd 192.168.8.37 kv
inventory.ini
## TiDB Cluster Part
[tidb_servers]
192.168.8.35
[tikv_servers]
192.168.8.37
[pd_servers]
192.168.8.36
[spark_master]
[spark_slaves]
[lightning_server]
[importer_server]
## Monitoring Part
# prometheus and pushgateway servers
[monitoring_servers]
192.168.8.35
[grafana_servers]
192.168.8.35
# node_exporter and blackbox_exporter servers
[monitored_servers]
192.168.8.35
[alertmanager_servers]
[kafka_exporter_servers]
在start.yml中注释了检查[wait for pd-server up]的任务,然后启动,这次都没到pd-server就已经在[wait for node_exporter up]连接超时。手动 curl 访问,postman,浏览器访问 。都是正常。下面是详细截图
Could you run following commands and check if there're any errors?
cd /home/tidb/tidb-ansible/library
mv wait_for.py wait_for.py.bak
cd /home/tidb/tidb-ansible/
mv start.yml start.yml.bak
wget https://raw.githubusercontent.com/LinuxGit/tidb-ansible/Louis/wait_for/start.yml
ansible-playbook start.yml -v
I have a commit, use uri module to check http interface, but not merged yet. And I could not reproduce it on TiDB 2.1.
Looks like this issue went stale and should be closed
closed as stale. Thank you for reporting this issue @fengzhao ! Now we're using TiUP to manage TiDB cluster. You can file an issue there if found any question.
TiUP也用过了,也是有些小问题,问题一堆。不太稳定。
@fengzhao as mentioned above, you can file an issue there if found any concrete question.
BTW, for user question, we host a Chinese forum https://asktug.com/ you may be interested in.
@fengzhao as mentioned above, you can file an issue there if found any concrete question.
BTW, for user question, we host a Chinese forum https://asktug.com/ you may be interested in.
anyway,我依然觉得pingcap是个牛逼的公司,你们在做的东西是技术含量很高的东西。持续关注,我这边中英文均可。
Bug Report
英文不太好,我就直接上中文了,不过你用中文或英文回复我都可以。谢谢! Please answer these questions before submitting your issue. Thanks!
What did you expect to see? 启动成功
What did you see instead? 连不上PD
What version of TiDB are you using (
tidb-server -V
or runselect tidb_version();
on TiDB)? Release Version: v2.1.0