telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0

Data not persisted after "Communications link failure The last packet sent successfully to the server was 0 milliseconds ago" #1584

Open smartcitydevops opened 5 years ago

smartcitydevops commented 5 years ago

We have two Cygnus instances in HA persisting data in a MySQL database hosted outside our premises. Both Cygnus instances wrote the following logs within seconds of each other:

time=2019-02-08T06:33:29.697Z | lvl=ERROR | corr=4344c194-2b6b-11e9-9603-fa163e83ea20 | trans=92ce979e-ab26-45c3-b572-a958896394d9 | srv=service| subsrv=/subservice | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[569] : CygnusPersistenceError (SQLException). Connection error (Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.). Stack trace: [com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl$MySQLDriver.getConnection(MySQLBackendImpl.java:416), com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl.insertContextData(MySQLBackendImpl.java:147), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistAggregation(NGSIMySQLSink.java:556), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistBatch(NGSIMySQLSink.java:200), com.telefonica.iot.cygnus.sinks.NGSISink.processNewBatches(NGSISink.java:558), com.telefonica.iot.cygnus.sinks.NGSISink.process(NGSISink.java:370), org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68), org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147), java.lang.Thread.run(Thread.java:748)]

time=2019-02-08T06:33:24.424Z | lvl=ERROR | corr=451cb5bc-2b6b-11e9-80cb-fa163e4141fd | trans=8847d97a-cc90-4039-8952-a0044a995af6 | srv=service| subsrv=/subservice | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[569] : CygnusPersistenceError (SQLException). Connection error (Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.). Stack trace: [com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl$MySQLDriver.getConnection(MySQLBackendImpl.java:416), com.telefonica.iot.cygnus.backends.mysql.MySQLBackendImpl.insertContextData(MySQLBackendImpl.java:147), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistAggregation(NGSIMySQLSink.java:556), com.telefonica.iot.cygnus.sinks.NGSIMySQLSink.persistBatch(NGSIMySQLSink.java:200), com.telefonica.iot.cygnus.sinks.NGSISink.processNewBatches(NGSISink.java:558), com.telefonica.iot.cygnus.sinks.NGSISink.process(NGSISink.java:370), org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68), org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147), java.lang.Thread.run(Thread.java:748)]

This is the second time we have noticed that, after these messages, neither Cygnus instance logged any further "Persisting data" or "Finishing internal transaction" messages. We stopped and started both instances, and after the restart both resumed normal operation.

We think that both Cygnus instances failed to resume data persistence after a brief, temporary communication problem with the sink.

If you need any additional information, please ask us for it.

smartcitydevops commented 5 years ago

Some additional information and a correction: we did find some messages such as

time=2019-02-08T06:35:37.735Z | lvl=INFO | corr=bbfabe2c-2b6b-11e9-aa56-fa163e4141fd | trans=186f0c61-d712-4a89-86ff-e2eebffa4eef | srv=service| subsrv=/subservice | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[590] : Finishing internal transaction (bbfabe2c-2b6b-11e9-aa56-fa163e4141fd)

and

time=2019-02-08T06:37:22.746Z | lvl=INFO | corr=f68a3f22-2b6b-11e9-b25a-fa163e83ea20 | trans=3cc5cb07-bc74-4675-b0d6-0bf1f3465610 | srv=thinkingcity | subsrv=/dtsmartparking | op=persistAggregation | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSIMySQLSink[545] : [mysql-sink-smartdt] Persisting data at NGSIMySQLSink. [and a description of the database, table, data, etc.]

Please note that, for corr=f68a3f22-2b6b-11e9-b25a-fa163e83ea20, there was no "Finishing internal transaction" message.

There were five "Finishing internal transaction" messages between the communication problem log and the log quoted above in this comment. The one closest to the problem differs from the others in that its corr, trans, srv and subsrv fields are all N/A:

time=2019-02-08T06:33:41.767Z | lvl=INFO | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | op=processNewBatches | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.NGSISink[590] : Finishing internal transaction (633b1e80-2b6b-11e9-98df-fa163e4141fd)

There were seven "Persisting data at NGSIMySQLSink" messages up to the one quoted in this comment. The first five of them match the five "Finishing internal transaction" messages.
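The matching described above can be reproduced with a small script. This is only a sketch: it assumes the log format shown in this issue, and it assumes that the id in parentheses at the end of a "Finishing internal transaction" message is the correlator of the batch (which is what the first INFO line quoted above suggests, and which still works when the corr field itself is N/A). The function name and the log file path are hypothetical.

```python
import re

# Id printed in parentheses at the end of a "Finishing internal transaction" message.
FINISHED_ID = re.compile(r"Finishing internal transaction \(([0-9a-f-]+)\)")
# Value of the "corr=" field in a pipe-separated Cygnus log line.
CORR_FIELD = re.compile(r"\bcorr=([^\s|]+)")


def unfinished_correlators(lines):
    """Return correlators that logged "Persisting data" but never
    "Finishing internal transaction", in first-seen order."""
    persisted = []
    finished = set()
    for line in lines:
        if "Finishing internal transaction" in line:
            match = FINISHED_ID.search(line)
            if match:
                finished.add(match.group(1))
        elif "Persisting data at" in line:
            match = CORR_FIELD.search(line)
            if match and match.group(1) != "N/A" and match.group(1) not in persisted:
                persisted.append(match.group(1))
    return [corr for corr in persisted if corr not in finished]


if __name__ == "__main__":
    # Hypothetical path; point it at the real Cygnus log file.
    with open("cygnus.log", encoding="utf-8") as log_file:
        for corr in unfinished_correlators(log_file):
            print(corr)
```

Run against the log window that starts at the communication error, this would list the correlators whose batches started persisting but never finished, which is the set we are interested in.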

AlvaroVega commented 5 years ago

According to https://stackoverflow.com/questions/6865538/solving-a-communications-link-failure-with-jdbc-and-mysql there are some possible solutions based on tuning the MySQL configuration (/etc/mysql/my.cnf): https://stackoverflow.com/a/10772407/5485829

Here are the solutions:

    changing "bind-address" attribute

Uncomment "bind-address" attribute or change it to one of the following IPs:

bind-address="127.0.0.1"

or

bind-address="0.0.0.0"

    commenting out "skip-networking"

If there is a "skip-networking" line in your MySQL config file, comment it out by adding a "#" sign at the beginning of that line.

    change "wait_timeout" and "interactive_timeout"

Add these lines to the MySQL config file:

wait_timeout = number

interactive_timeout = number

connect_timeout = number

    Make sure Java isn't translating 'localhost' to [::1] instead of [127.0.0.1]

Since MySQL recognizes 127.0.0.1 (IPv4) but not ::1 (IPv6)

This could be avoided by using one of two approaches:

Option #1: In the connection string use 127.0.0.1 instead of localhost to avoid localhost being translated to ::1

Option #2: Run java with the option -Djava.net.preferIPv4Stack=true to force java to use IPv4 instead of IPv6. On Linux, this could also be achieved by running the following (or placing it inside /etc/profile):

export _JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true"

and more. Applying these changes to the MySQL configuration may fix the issue.
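To see whether the localhost point could apply on a given Cygnus host, a quick check of how the operating system resolver orders the addresses may help. This is only a sketch: the function names are hypothetical, port 3306 is just the MySQL default, and the JVM applies its own preferences (for example java.net.preferIPv4Stack), so the result is a hint about the host configuration, not proof of what the JDBC driver does.

```python
import socket


def first_family(addrinfos):
    """Return "IPv6", "IPv4" or None for the first entry of a getaddrinfo() result."""
    if not addrinfos:
        return None
    family = addrinfos[0][0]  # each entry is (family, type, proto, canonname, sockaddr)
    if family == socket.AF_INET6:
        return "IPv6"
    if family == socket.AF_INET:
        return "IPv4"
    return None


def check_host(host="localhost", port=3306):
    """Resolve host with the OS resolver and report the family of the first answer."""
    return first_family(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM))


if __name__ == "__main__":
    print(check_host())
```

If this prints IPv6 for the host name used in the Cygnus MySQL sink configuration, using the IPv4 address directly (Option #1) would be the simplest thing to try.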

AlvaroVega commented 5 years ago

Another suspicious comment: https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-common/src/main/java/com/telefonica/iot/cygnus/backends/mysql/MySQLBackendImpl.java#L399

@smartcitydevops, do you know how many connections to MySQL were open at that moment?

pmo-sdr commented 5 years ago

We think this issue could be closed. @smartcitydevops is that right?

smartcitydevops commented 5 years ago

Please let's wait until the latest Cygnus version has been deployed in the production environment where this issue was detected; after that, we will also need a reasonable amount of time to check that the issue does not happen again.

Thank you for raising this question; we will keep you informed.