Open smartcitydevops opened 5 years ago
Additional information and a correction: we found messages such as
time=2019-02-08T06:35:37.735Z | lvl=INFO | corr=bbfabe2c-2b6b-11e9-aa56-fa163e4141fd | trans=186f0c61-d712-4a89-86ff-e2eebffa4eef | srv=service| subsrv=/subservice | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[590] : Finishing internal transaction (bbfabe2c-2b6b-11e9-aa56-fa163e4141fd)
and
time=2019-02-08T06:37:22.746Z | lvl=INFO | corr=f68a3f22-2b6b-11e9-b25a-fa163e83ea20 | trans=3cc5cb07-bc74-4675-b0d6-0bf1f3465610 | srv=thinkingcity | subsrv=/dtsmartparking | op=persistAggregation | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSIMySQLSink[545] : [mysql-sink-smartdt] Persisting data at NGSIMySQLSink. [and a description of the database, table, data, etc.]
Please note that, for corr=f68a3f22-2b6b-11e9-b25a-fa163e83ea20, there was no "Finishing internal transaction" message.
Between the communication error log and the last log quoted in this comment there were five "Finishing internal transaction" messages. The one closest to the error differs from the others, because its corr, trans, srv and subsrv fields are all N/A:
time=2019-02-08T06:33:41.767Z | lvl=INFO | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[590] : Finishing internal transaction (633b1e80-2b6b-11e9-98df-fa163e4141fd)
There were seven "Persisting data at NGSIMySQLSink" messages up to and including the one quoted in this comment. The first five of them match the five "Finishing internal transaction" messages.
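The cross-check described above (which "Persisting data" messages never got a matching "Finishing internal transaction" message) could be automated with a small script. This is only a minimal sketch: the function name `unfinished_transactions` is hypothetical, and it assumes that the ID in parentheses in the "Finishing internal transaction" message is the correlator ID of the batch, as the log lines quoted in this comment suggest.

```python
import re

# A correlator ID as it appears in the "corr=" field of a Cygnus log line.
CORR_RE = re.compile(r"\bcorr=([0-9a-f-]{36})")
# The ID in parentheses at the end of a "Finishing internal transaction" message.
FINISH_RE = re.compile(r"Finishing internal transaction \(([0-9a-f-]{36})\)")


def unfinished_transactions(lines):
    """Return correlator IDs that were persisted but never reported as finished."""
    persisted = []   # correlator IDs seen in "Persisting data" lines, in order
    finished = set()  # IDs seen in "Finishing internal transaction" lines
    for line in lines:
        finish = FINISH_RE.search(line)
        if finish:
            # Use the ID in parentheses: the corr field itself may be N/A.
            finished.add(finish.group(1))
            continue
        if "Persisting data at NGSIMySQLSink" in line:
            corr = CORR_RE.search(line)
            if corr and corr.group(1) not in persisted:
                persisted.append(corr.group(1))
    return [corr for corr in persisted if corr not in finished]
```

Feeding it the lines of a Cygnus log file (for example `unfinished_transactions(open(path))`, where `path` is whatever log file is being inspected) lists the correlator IDs that would need a closer look.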
According to https://stackoverflow.com/questions/6865538/solving-a-communications-link-failure-with-jdbc-and-mysql there are some possible solutions based on tuning the MySQL configuration (/etc/mysql/my.cnf): https://stackoverflow.com/a/10772407/5485829
Here are the solutions:
Change the "bind-address" attribute
Uncomment the "bind-address" attribute or set it to one of the following IPs:
bind-address="127.0.0.1"
or
bind-address="0.0.0.0"
Comment out "skip-networking"
If there is a "skip-networking" line in your MySQL config file, comment it out by adding a "#" sign at the beginning of that line.
Change "wait_timeout", "interactive_timeout" and "connect_timeout"
Add these lines to the MySQL config file, replacing number with a timeout in seconds:
wait_timeout = number
interactive_timeout = number
connect_timeout = number
Make sure Java isn't translating 'localhost' to [::1] instead of [127.0.0.1]
This matters because MySQL recognizes 127.0.0.1 (IPv4) but not ::1 (IPv6).
It can be avoided with one of two approaches:
Option #1: In the connection string, use 127.0.0.1 instead of localhost, so that localhost is never translated to ::1.
Option #2: Run Java with the option -Djava.net.preferIPv4Stack=true to force Java to use IPv4 instead of IPv6. On Linux, this can also be achieved by running the following command (or placing it in /etc/profile):
export _JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true"
The answer lists further options. Applying these changes to the MySQL configuration may fix the issue.
Another suspicious comment: https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-common/src/main/java/com/telefonica/iot/cygnus/backends/mysql/MySQLBackendImpl.java#L399
@smartcitydevops, do you know how many connections to MySQL were open at that moment?
We think this issue could be closed. @smartcitydevops is that right?
Please let's wait until the latest Cygnus version has been deployed in the production environment where this issue was detected; after that, we would also need a reasonable amount of time to check that the issue does not happen again.
Thank you for raising this question; we will keep you informed.
We have two Cygnus instances in HA that persist data in a MySQL database outside our premises. Both Cygnus instances wrote the following logs within a few seconds of each other:
time=2019-02-08T06:33:29.697Z | lvl=ERROR | corr=4344c194-2b6b-11e9-9603-fa163e83ea20 | trans=92ce979e-ab26-45c3-b572-a958896394d9 | srv=service| subsrv=/subservice | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[569] : CygnusPersistenceError (SQLException). Connection error (Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.). Stack trace: [com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl$MySQLDriver.getConnection(MySQLBackendImpl.java:416), com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl.insertContextData(MySQLBackendImpl.java:147), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistAggregation(NGSIMySQLSink.java:556), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistBatch(NGSIMySQLSink.java:200), com.telefonica.iot.cygnus.sinks.NGSISink.processNewBatches(NGSISink.java:558), com.telefonica.iot.cygnus.sinks.NGSISink.process(NGSISink.java:370), org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68), org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147), java.lang.Thread.run(Thread.java:748)]
time=2019-02-08T06:33:24.424Z | lvl=ERROR | corr=451cb5bc-2b6b-11e9-80cb-fa163e4141fd | trans=8847d97a-cc90-4039-8952-a0044a995af6 | srv=service| subsrv=/subservice | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[569] : CygnusPersistenceError (SQLException). Connection error (Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.). Stack trace: [com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl$MySQLDriver.getConnection(MySQLBackendImpl.java:416), com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl.insertContextData(MySQLBackendImpl.java:147), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistAggregation(NGSIMySQLSink.java:556), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistBatch(NGSIMySQLSink.java:200), com.telefonica.iot.cygnus.sinks.NGSISink.processNewBatches(NGSISink.java:558), com.telefonica.iot.cygnus.sinks.NGSISink.process(NGSISink.java:370), org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68), org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147), java.lang.Thread.run(Thread.java:748)]
This is the second time we have noticed that, after these messages, both Cygnus instances stopped logging any "Persisting data" or "Finishing internal transaction" messages. We stopped and started both Cygnus instances; after the restart, both resumed normal operation.
We think that both Cygnus instances were unable to resume data persistence after a short, temporary communication problem with the sink.
If you need any additional information, please ask us for it.
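To illustrate the behaviour we would expect after a short outage, here is a minimal sketch of a retry loop with exponential backoff. It is not Cygnus code and says nothing about how Cygnus handles this internally; `persist_with_retry`, `persist`, `attempts` and `base_delay` are hypothetical names, and the standard Python `ConnectionError` stands in for the "Communications link failure" above.

```python
import time


def persist_with_retry(persist, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call persist() until it succeeds or the attempts run out.

    After each failed try the wait doubles: base_delay, 2 * base_delay, ...
    """
    for attempt in range(attempts):
        try:
            return persist()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

With a loop like this, a sink that loses its connection for a few seconds would pick up again on a later attempt instead of needing a restart.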