opennetworkinglab / flowvisor

FlowVisor - A network hypervisor
Other
164 stars 67 forks source link

FlowVisor semi-stuck #253

Open jbsbbn opened 11 years ago

jbsbbn commented 11 years ago

On 2013-06-16, our production FlowVisor at BBN (1.0.8) got into a wonky state, whereby calls to listSlices and listFlowSpace were hanging, but some other calls (like listDevices) were working fine.

The FV stderr log:

[12:11:24] jbs at tulum:/home/jbs
+$ cat /var/log/flowvisor/flowvisor-stderr.log
Starting FlowVisor
--- Setting logging level to NOTE
Starting FlowVisor
--- Setting logging level to NOTE
Starting FlowVisor
--- Setting logging level to NOTE
Errors occurred. Please make sure that the database exists and/or no other FlowVisor is running.
Starting FlowVisor
--- Setting logging level to NOTE
Errors occurred. Please make sure that the database exists and/or no other FlowVisor is running.
Starting FlowVisor
--- Setting logging level to NOTE
Starting FlowVisor
--- Setting logging level to NOTE
Starting FlowVisor
--- Setting logging level to NOTE
[: 32: -server: unexpected operator
Starting FlowVisor
--- Setting logging level to NOTE
2013-05-27 09:43:45.201:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog
2013-05-27 09:43:45.248:INFO::jetty-7.0.2.v20100331
2013-05-27 09:43:45.693:INFO::Started SslSelectChannelConnector at 0.0.0.0:8081
[: 32: -server: unexpected operator
Starting FlowVisor
--- Setting logging level to NOTE
2013-06-03 08:09:58.838:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog
2013-06-03 08:09:58.867:INFO::jetty-7.0.2.v20100331
2013-06-03 08:09:59.119:INFO::Started SslSelectChannelConnector at 0.0.0.0:8081
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.flowvisor.config.ConfigurationEvent.invoke(ConfigurationEvent.java:46)
    at org.flowvisor.classifier.FVClassifier.processChange(FVClassifier.java:850)
    at org.flowvisor.config.FVConfigurationController.fireChange(FVConfigurationController.java:80)
    at org.flowvisor.config.FlowSpaceImpl.notify(FlowSpaceImpl.java:649)
    at org.flowvisor.config.FlowSpaceImpl.notifyChange(FlowSpaceImpl.java:338)
    at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.flowvisor.config.FVConfigProxy.invoke(FVConfigProxy.java:21)
    at sun.proxy.$Proxy4.notifyChange(Unknown Source)
    at org.flowvisor.api.FVUserAPIImpl.deleteSlice(FVUserAPIImpl.java:450)
    at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115)
    at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106)
    at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46)
    at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86)
    at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200)
    at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208)
    at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68)
Caused by: java.util.ConcurrentModificationException
    at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:390)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:409)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:408)
    at org.flowvisor.classifier.CookieTranslator.getCookieList(CookieTranslator.java:53)
    at org.flowvisor.classifier.FVClassifier.cleanUpFlowMods(FVClassifier.java:699)
    at org.flowvisor.classifier.FVClassifier.connectToControllers(FVClassifier.java:681)
    at org.flowvisor.classifier.FVClassifier.flowMapChanged(FVClassifier.java:857)
    ... 24 more
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.flowvisor.config.ConfigurationEvent.invoke(ConfigurationEvent.java:46)
    at org.flowvisor.classifier.FVClassifier.processChange(FVClassifier.java:850)
    at org.flowvisor.config.FVConfigurationController.fireChange(FVConfigurationController.java:80)
    at org.flowvisor.config.FlowSpaceImpl.notify(FlowSpaceImpl.java:649)
    at org.flowvisor.config.FlowSpaceImpl.notifyChange(FlowSpaceImpl.java:338)
    at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.flowvisor.config.FVConfigProxy.invoke(FVConfigProxy.java:21)
    at sun.proxy.$Proxy4.notifyChange(Unknown Source)
    at org.flowvisor.api.FVUserAPIImpl.deleteSlice(FVUserAPIImpl.java:450)
    at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115)
    at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106)
    at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46)
    at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86)
    at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200)
    at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208)
    at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68)
Caused by: java.util.ConcurrentModificationException
    at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:390)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:409)
    at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:408)
    at org.flowvisor.classifier.CookieTranslator.getCookieList(CookieTranslator.java:53)
    at org.flowvisor.classifier.FVClassifier.cleanUpFlowMods(FVClassifier.java:699)
    at org.flowvisor.classifier.FVClassifier.connectToControllers(FVClassifier.java:681)
    at org.flowvisor.classifier.FVClassifier.flowMapChanged(FVClassifier.java:857)
    ... 24 more

It's not entirely clear when that stacktrace is from -- the line before it has a timestamp, but that line doesn't, so it might have been later. The file hasn't been modified since June 3rd, though, so maybe this isn't related. (In which case I don't know what that stacktrace is about.)

Our syslog included

Jun 15 20:00:38 tulum puppetd[11396]: (//Node[tulum]/geni::lab::foam::server::production/flowvisor::server/Exec[flowvisor_check_config]/unless) Check "/usr/local/bin/flowvisor_check_config" exceeded timeout

which is a Puppet-spawned process to make sure that our FV configuration looks like it's supposed to; that calls 'fvctl-json --passwd-file=/etc/flowvisor.passwd save-config /home/flowvisor/config-check.json', so that was presumably hanging too (and uses the JSON fvctl, so it wasn't just the XMLRPC interface).

Before and after that error, there were a bunch of messages like

Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: -> 
Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: Request performed successfully 
Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: <- 
Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: -> 
Jun 15 20:00:18 tulum -1>Jun 15 20:00:18 flowvisor: DEBUG execute: Request performed successfully 
Jun 15 20:00:18 tulum -1>Jun 15 20:00:18 flowvisor: DEBUG execute: <- 
Jun 15 20:00:18 tulum -1>Jun 15 20:00:18 flowvisor: DEBUG execute: -> 

but we seem to get those pretty frequently.

We restarted FV (with 'sudo service flowvisor stop' and then 'sudo service flowvisor start'), and that seemed to fix the problem.

jbsbbn commented 11 years ago

This happened again today, with the I2 Flowvisor this time. Their flowvisor-stderr.log ended with:

--- Setting logging level to NOTE
Starting FlowVisor
--- Setting logging level to NOTE
Starting FlowVisor
--- Setting logging level to NOTE
2013-05-30 16:24:09.632:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog
2013-05-30 16:24:09.696:INFO::jetty-7.0.2.v20100331
2013-05-30 16:24:10.414:INFO::Started SslSelectChannelConnector@0.0.0.0:8081
2013-06-10 18:56:58.967:WARN::javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
2013-06-12 01:50:09.075:WARN::javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?

So, nothing at all recent. (All timestamps here are UTC, BTW.)

They didn't see anything obvious in syslog, but grep-ed for "starving", and found:

Jul  5 14:00:13 2013-07-05 14: 00:13,949 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@63baaa03
Jul  5 14:00:13 2013-07-05 14: 00:13,949 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@63baaa03
Jul  5 14:02:15 2013-07-05 14: 02:15,307 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 16ms: org.flowvisor.events.FVIOEvent@43248b61
Jul  5 14:02:15 2013-07-05 14: 02:15,307 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 16ms: org.flowvisor.events.FVIOEvent@43248b61
Jul  5 14:04:14 2013-07-05 14: 04:14,282 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:83:00:12:e2:22:63:38 : STARVING: handling event took 15ms: org.flowvisor.events.FVIOEvent@3c9ecdb9
Jul  5 14:04:14 2013-07-05 14: 04:14,282 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:83:00:12:e2:22:63:38 : STARVING: handling event took 15ms: org.flowvisor.events.FVIOEvent@3c9ecdb9
Jul  5 14:09:16 2013-07-05 14: 09:16,180 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:1d : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@432227e3
Jul  5 14:09:16 2013-07-05 14: 09:16,180 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:1d : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@432227e3
Jul  5 14:09:29 2013-07-05 14: 09:29,343 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 44ms: org.flowvisor.events.FVIOEvent@71c7064e
Jul  5 14:09:29 2013-07-05 14: 09:29,343 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 44ms: org.flowvisor.events.FVIOEvent@71c7064e
Jul  5 14:10:18 2013-07-05 14: 10:18,413 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 40ms: org.flowvisor.events.FVIOEvent@55a43a0e
Jul  5 14:10:18 2013-07-05 14: 10:18,413 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 40ms: org.flowvisor.events.FVIOEvent@55a43a0e
Jul  5 14:13:23 2013-07-05 14: 13:23,188 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) OFSwitchAcceptor : STARVING: handling event took 11ms: org.flowvisor.events.FVIOEvent@69ac8d63
Jul  5 14:13:23 2013-07-05 14: 13:23,188 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) OFSwitchAcceptor : STARVING: handling event took 11ms: org.flowvisor.events.FVIOEvent@69ac8d63

We think those may be before it hung, though, which we think was at around 14:17. There are no FV syslog messages at all from 14:15 - 14:27, which seems very odd. Then, at around 14;27, an I2 admin approved a FOAM sliver, which shows up in the logs. There's then nothing again between 14:29 and 14:35, at which point we see

Jul  5 14:35:56 2013-07-05 14: 35:56,479 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) 
Jul  5 14:35:56 2013-07-05 14: 35:56,479 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2760) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2760) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2734) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2734) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.toArray(ArrayList.java:275) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.toArray(ArrayList.java:275) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.<init>(ArrayList.java:131) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.<init>(ArrayList.java:131) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) 
Jul  5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.lang.reflect.Method.invoke(Method.java:597) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.lang.reflect.Method.invoke(Method.java:597) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) 
Jul  5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) 
Jul  5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) 
Jul  5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Remote host closed connection during handshake(exception = class java.io.EOFException) 
Jul  5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Remote host closed connection during handshake(exception = class java.io.EOFException) 
Jul  5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:333) 
Jul  5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:333) 
Jul  5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798) 
Jul  5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:753) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:753) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.io.BufferedInputStream.read(BufferedInputStream.java:237) 
Jul  5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.io.BufferedInputStream.read(BufferedInputStream.java:237) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.readLine(Connection.java:241) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.readLine(Connection.java:241) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.getRequestConfig(Connection.java:150) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.getRequestConfig(Connection.java:150) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.run(Connection.java:204) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.run(Connection.java:204) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) 
Jul  5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) 
Jul  5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) 
Jul  5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) 
Jul  5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2760) 
Jul  5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2760) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2734) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.Arrays.copyOf(Arrays.java:2734) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.toArray(ArrayList.java:275) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.toArray(ArrayList.java:275) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.<init>(ArrayList.java:131) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.util.ArrayList.<init>(ArrayList.java:131) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) 
Jul  5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.lang.reflect.Method.invoke(Method.java:597) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at java.lang.reflect.Method.invoke(Method.java:597) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) 
Jul  5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) 
Jul  5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) 
Jul  5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) 
Jul  5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) 
Jul  5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) 
Jul  5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) 
Jul  5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) 
Jul  5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) 
Jul  5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) 
Jul  5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) 
Jul  5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none :      at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) 

Traffic still wasn't flowing at this point, and the listDevices call is from an admin seeing if it was completely stuck. He restarted FV shortly after that, and it seems to be working fine now.

Note that at BBN, we did not get the OutOfMemoryError exceptions at any point... Although our initial hang was off-hours, so no one noticed or tried to test anything until several hours later.

mssumanth commented 11 years ago

Josh,

When FV hung for the first time in I2 with list-devices command was the response for list-devices > 64k too? And second time after the restart of FV, did they execute the same commands like list-devices and getting stats-reply msg from the switch and it didn't hung up?

In my testing over here with my mininet one strange thing which I noticed was, I did not get the error (or FV didn't hung up) even when the stats_reply message was between 150k Bytes to 250 k Bytes. I am not sure of how many individual stats rules were there, but the total file length of the stats-reply which was received from the switches were between 150k-250k bytes.

Anyway I am trying to figure out how to send the messages in segments instead of one large chunk in order to avoid these Buffer full exceptions.

Cheers Sumanth

On Fri, Jul 5, 2013 at 8:40 AM, Josh Smift notifications@github.com wrote:

This happened again today, with the I2 Flowvisor this time. Their flowvisor-stderr.log ended with:

--- Setting logging level to NOTE Starting FlowVisor --- Setting logging level to NOTE Starting FlowVisor --- Setting logging level to NOTE 2013-05-30 16:24:09.632:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog 2013-05-30 16:24:09.696:INFO::jetty-7.0.2.v20100331 2013-05-30 16:24:10.414:INFO::Started SslSelectChannelConnector@0.0.0.0:8081 2013-06-10 18:56:58.967:WARN::javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? 2013-06-12 01:50:09.075:WARN::javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?

So, nothing at all recent. (All timestamps here are UTC, BTW.)

They didn't see anything obvious in syslog, but grep-ed for "starving", and found:

Jul 5 14:00:13 2013-07-05 14: 00:13,949 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@63baaa03 Jul 5 14:00:13 2013-07-05 14: 00:13,949 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@63baaa03 Jul 5 14:02:15 2013-07-05 14: 02:15,307 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 16ms: org.flowvisor.events.FVIOEvent@43248b61 Jul 5 14:02:15 2013-07-05 14: 02:15,307 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 16ms: org.flowvisor.events.FVIOEvent@43248b61 Jul 5 14:04:14 2013-07-05 14: 04:14,282 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:83:00:12:e2:22:63:38 : STARVING: handling event took 15ms: org.flowvisor.events.FVIOEvent@3c9ecdb9 Jul 5 14:04:14 2013-07-05 14: 04:14,282 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:83:00:12:e2:22:63:38 : STARVING: handling event took 15ms: org.flowvisor.events.FVIOEvent@3c9ecdb9 Jul 5 14:09:16 2013-07-05 14: 09:16,180 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:1d : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@432227e3 Jul 5 14:09:16 2013-07-05 14: 09:16,180 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) slicer_44b81d5a-aa04-4c53-984b-2ee6ce76c2dc_dpid=0e:84:00:12:e2:22:63:1d : STARVING: handling event took 14ms: org.flowvisor.events.FVIOEvent@432227e3 Jul 5 14:09:29 2013-07-05 14: 09:29,343 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 44ms: org.flowvisor.events.FVIOEvent@71c7064e Jul 5 14:09:29 2013-07-05 14: 09:29,343 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:6e : STARVING: handling event took 44ms: org.flowvisor.events.FVIOEvent@71c7064e Jul 5 14:10:18 2013-07-05 14: 10:18,413 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 40ms: org.flowvisor.events.FVIOEvent@55a43a0e Jul 5 14:10:18 2013-07-05 14: 10:18,413 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) classifier-dpid=0e:84:00:12:e2:22:63:38 : STARVING: handling event took 40ms: org.flowvisor.events.FVIOEvent@55a43a0e Jul 5 14:13:23 2013-07-05 14: 13:23,188 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) OFSwitchAcceptor : STARVING: handling event took 11ms: org.flowvisor.events.FVIOEvent@69ac8d63 Jul 5 14:13:23 2013-07-05 14: 13:23,188 ERROR org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) OFSwitchAcceptor : STARVING: handling event took 11ms: org.flowvisor.events.FVIOEvent@69ac8d63

We think those may be before it hung, though, which we think was at around 14:17. There are no FV syslog messages at all from 14:15 - 14:27, which seems very odd. Then, at around 14;27, an I2 admin approved a FOAM sliver, which shows up in the logs. There's then nothing again between 14:29 and 14:35, at which point we see

Jul 5 14:35:56 2013-07-05 14: 35:56,479 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) Jul 5 14:35:56 2013-07-05 14: 35:56,479 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2760) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2760) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2734) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2734) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.toArray(ArrayList.java:275) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.toArray(ArrayList.java:275) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.(ArrayList.java:131) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.(ArrayList.java:131) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) Jul 5 14:35:56 2013-07-05 14: 35:56,480 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.lang.reflect.Method.invoke(Method.java:597) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.lang.reflect.Method.invoke(Method.java:597) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) Jul 5 14:35:56 2013-07-05 14: 35:56,481 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) Jul 5 14:35:56 2013-07-05 14: 35:56,482 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) Jul 5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Remote host closed connection during handshake(exception = class java.io.EOFException) Jul 5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Remote host closed connection during handshake(exception = class java.io.EOFException) Jul 5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:333) Jul 5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:333) Jul 5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798) Jul 5 14:36:51 2013-07-05 14: 36:51,883 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:753) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:753) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.io.BufferedInputStream.read(BufferedInputStream.java:237) Jul 5 14:36:51 2013-07-05 14: 36:51,884 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.io.BufferedInputStream.read(BufferedInputStream.java:237) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.readLine(Connection.java:241) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.readLine(Connection.java:241) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.getRequestConfig(Connection.java:150) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.getRequestConfig(Connection.java:150) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.run(Connection.java:204) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.run(Connection.java:204) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) Jul 5 14:36:51 2013-07-05 14: 36:51,885 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) Jul 5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) Jul 5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : Failed to invoke method listDevices in class org.flowvisor.api.FVUserAPIXMLRPCImpl: Java heap space(exception = class java.lang.OutOfMemoryError) Jul 5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2760) Jul 5 14:37:42 2013-07-05 14: 37:42,386 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2760) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2734) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.Arrays.copyOf(Arrays.java:2734) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.toArray(ArrayList.java:275) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.toArray(ArrayList.java:275) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.(ArrayList.java:131) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.util.ArrayList.(ArrayList.java:131) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.FlowVisor.getHandlersCopy(FlowVisor.java:369) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) Jul 5 14:37:42 2013-07-05 14: 37:42,387 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.flowvisor.api.FVUserAPIImpl.listDevices(FVUserAPIImpl.java:347) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.lang.reflect.Method.invoke(Method.java:597) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at java.lang.reflect.Method.invoke(Method.java:597) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) Jul 5 14:37:42 2013-07-05 14: 37:42,388 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) Jul 5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) Jul 5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) Jul 5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) Jul 5 14:37:42 2013-07-05 14: 37:42,389 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) Jul 5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) Jul 5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) Jul 5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) Jul 5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) Jul 5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) Jul 5 14:37:42 2013-07-05 14: 37:42,390 WARN org.flowvisor.log.AnyLogger.log(AnyLogger.java:42) none : at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68)

Traffic still wasn't flowing at this point, and the listDevices call is from an admin seeing if it was completely stuck. He restarted FV shortly after that, and it seems to be working fine now.

Note that at BBN, we did not get the OutOfMemoryError exceptions at any point... Although our initial hang was off-hours, so no one noticed or tried to test anything until several hours later.

— Reply to this email directly or view it on GitHubhttps://github.com/OPENNETWORKINGLAB/flowvisor/issues/253#issuecomment-20525156 .

jbsbbn commented 11 years ago

When FV hung for the first time in I2 with list-devices command was the response for list-devices > 64k too?

I don't think it would be.

And second time after the restart of FV, did they execute the same commands like list-devices and getting stats-reply msg from the switch and it didn't hung up?

They did either listDevices or listSlices soon after restart, not sure which; they tried both again just now, and both worked fine.

Note also that traffic stopped flowing some 20 minutes before he tried either of those commands, so we don't think those commands caused FV to hang.

In my testing over here with my mininet one strange thing which I noticed was, I did not get the error (or FV didn't hung up) even when the stats_reply message was between 150k Bytes to 250 k Bytes. I am not sure of how many individual stats rules were there, but the total file length of the stats-reply which was received from the switches were between 150k-250k bytes.

Just to clarify, we don't have any reason to think that the problem from this ticket has anything to do with the flowstats >64K issue -- I suppose it could, but I don't think we have any specific evidence of that yet. The error we get with the flowstats bug is very distinctive, anyway, and doesn't look like this. That bug is https://github.com/OPENNETWORKINGLAB/flowvisor/issues/252 .

alshabib commented 11 years ago

Two things worry me in this crash report:

Caused by: java.util.ConcurrentModificationException at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:390) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:409) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:408) at org.flowvisor.classifier.CookieTranslator.getCookieList(CookieTranslator.java:53) at org.flowvisor.classifier.FVClassifier.cleanUpFlowMods(FVClassifier.java:699) at org.flowvisor.classifier.FVClassifier.connectToControllers(FVClassifier.java:681) at org.flowvisor.classifier.FVClassifier.flowMapChanged(FVClassifier.java:857) ... 24 more

FV crashes on a concurrent modification exception resulting from a call to cleanUpFlowMods which is called when a slice is deleted of admin downed.

Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: -> Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: Request performed successfully Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: <- Jun 15 20:00:17 tulum -1>Jun 15 20:00:17 flowvisor: DEBUG execute: -> Jun 15 20:00:18 tulum -1>Jun 15 20:00:18 flowvisor: DEBUG execute: Request performed successfully Jun 15 20:00:18 tulum -1>Jun 15 20:00:18 flowvisor: DEBUG execute: <- Jun 15 20:00:18 tulum -1>Jun 15 20:00:18 flowvisor: DEBUG execute: ->

This calls are probably due to me forgetting to uncomment (or change) a debug statement (whooopsie). But in this case it helps because it seems that someone (presumably FOAM) was making call to FV at a relatively high rate. Is this something that is plausible, is there anyway to check what calls, if any, FOAM was making?

jbsbbn commented 11 years ago

Sure, we can see what FOAM's doing in general, although maybe not what it was doing at that point any more, if its logs have rotated. It's still happening now, though, e.g. in the past minute:

[10:26:49] jbs@tulum:/home/jbs
+$ sudo egrep "Jul  8 10:25.+flowvisor.+DEBUG" /var/log/syslog | wc -l
120

It's often 114:

[10:26:54] jbs@tulum:/home/jbs
+$ sudo egrep "Jul  8 10:24.+flowvisor.+DEBUG" /var/log/syslog | wc -l
114

[10:27:07] jbs@tulum:/home/jbs
+$ sudo egrep "Jul  8 10:23.+flowvisor.+DEBUG" /var/log/syslog | wc -l
114

[10:27:10] jbs@tulum:/home/jbs
+$ sudo egrep "Jul  8 10:22.+flowvisor.+DEBUG" /var/log/syslog | wc -l
114

So this is probably something automated, like monitoring software.

FOAM says:

[10:28:03] jbs@tulum:/home/jbs
+$ grep "2013-07-08 10:25" /opt/foam/log/foam.log
2013-07-08 10:25:02,019 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-version>
2013-07-08 10:25:02,141 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/list-slivers>
2013-07-08 10:25:02,309 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:02,434 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:02,557 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:02,681 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:17,072 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-version>
2013-07-08 10:25:17,197 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:17,322 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:17,449 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:17,577 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:17,770 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/get-config>
2013-07-08 10:25:17,950 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/list-slivers>
2013-07-08 10:25:18,121 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:18,257 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:18,393 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:18,529 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:18,661 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:18,793 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:18,933 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:19,066 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:19,197 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:19,329 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:19,461 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:19,594 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:19,726 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>
2013-07-08 10:25:19,857 [INFO] - [REQUEST] [127.0.0.1] <http://localhost:3626/core/admin/show-sliver>

And indeed I think this is tango-monitor-foam's data-gathering.

jbsbbn commented 11 years ago

Two takeaways from that previous update:

If there's some sort of memory leak, it's possible that the high rate of queries is exacerbating that. But I don't think the high rate per se is causing FV to get stuck.

jbsbbn commented 11 years ago

Any ETA on this one? It's a blocker for deploying FV 1.0.x in GENI, which we'd like to announce at GEC 17 that we're going to do after GEC, but we can't with this bug still open. If it can't be fixed by GEC, then it can't -- we just want to plan accordingly.

mssumanth commented 11 years ago

As this has happened randomly, not sure what is the reason behind it. Neither were we able to reproduce the scenario over here. But just by the stack trace we have a very small fix, not sure if that would fix the issue. Did it re-occur any time after your last report?

On Fri, Jul 12, 2013 at 8:39 AM, Josh Smift notifications@github.comwrote:

Any ETA on this one? It's a blocker for deploying FV 1.0.x in GENI, which we'd like to announce at GEC 17 that we're going to do after GEC, but we can't with this bug still open. If it can't be fixed by GEC, then it can't -- we just want to plan accordingly.

— Reply to this email directly or view it on GitHubhttps://github.com/OPENNETWORKINGLAB/flowvisor/issues/253#issuecomment-20884477 .

jbsbbn commented 11 years ago

Nope, this hasn't happened again since the last update, at least not yet.

jbsbbn commented 11 years ago

We've had a couple more incidents at I2 in the past few days, which we think might be the same as this bug. (All times UTC.)

On 2013-08-17 at around 02:30, CPU usage on the I2 FV server spiked, and FV seemed to stop working. They restarted it at around 03:20. There's nothing obvious in syslog, but in the stderr log, they saw

2013-07-28 16:02:56.554:INFO::Started SslSelectChannelConnector@0.0.0.0:8081
java.lang.NullPointerException
    at org.flowvisor.classifier.FVClassifier.isRateLimited(FVClassifier.java:982)
    at org.flowvisor.slicer.FVSlicer.handleIOEvent(FVSlicer.java:584)
    at org.flowvisor.slicer.FVSlicer.handleEvent(FVSlicer.java:465)
    at org.flowvisor.events.FVEventLoop.doEventLoop(FVEventLoop.java:168)
    at org.flowvisor.FlowVisor.run(FlowVisor.java:196)
    at org.flowvisor.FlowVisor.main(FlowVisor.java:229)
2013-08-15 02:44:10.054:WARN::javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
Starting FlowVisor
Embedded DB issue, exiting : Another instance of Derby may have already booted the database /usr/share/db/flowvisor/FlowVisorDB.
Starting FlowVisor
--- Setting logging level to NOTE
2013-08-17 03:19:17.795:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog
2013-08-17 03:19:17.859:INFO::jetty-7.0.2.v20100331
2013-08-17 03:19:18.590:INFO::Started SslSelectChannelConnector@0.0.0.0:8081

That NullPointerException must have been before 2013-08-15, and doesn't seem to correspond to a crash per se. However, another one like it showed up later -- the next thing in the file, in fact.

On 2013-08-20 at around 19:58, CPU usage on the I2 FV server spiked again, and FV seemed to stop working again. They restarted it at around 20:23. In stderr log:

java.lang.NullPointerException at org.flowvisor.classifier.FVClassifier.isRateLimited(FVClassifier.java:982) at org.flowvisor.slicer.FVSlicer.handleIOEvent(FVSlicer.java:584) at org.flowvisor.slicer.FVSlicer.handleEvent(FVSlicer.java:465) at org.flowvisor.events.FVEventLoop.doEventLoop(FVEventLoop.java:168) at org.flowvisor.FlowVisor.run(FlowVisor.java:196) at org.flowvisor.FlowVisor.main(FlowVisor.java:229) java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.flowvisor.config.ConfigurationEvent.invoke(ConfigurationEvent.java:46) at org.flowvisor.classifier.FVClassifier.processChange(FVClassifier.java:850) at org.flowvisor.config.FVConfigurationController.fireChange(FVConfigurationController.java:80) at org.flowvisor.config.FlowSpaceImpl.notify(FlowSpaceImpl.java:649) at org.flowvisor.config.FlowSpaceImpl.notifyChange(FlowSpaceImpl.java:338) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.flowvisor.config.FVConfigProxy.invoke(FVConfigProxy.java:21) at $Proxy1.notifyChange(Unknown Source) at org.flowvisor.api.FVUserAPIImpl.deleteSlice(FVUserAPIImpl.java:450) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.invoke(ReflectiveXmlRpcHandler.java:115) at org.apache.xmlrpc.server.ReflectiveXmlRpcHandler.execute(ReflectiveXmlRpcHandler.java:106) at org.apache.xmlrpc.server.XmlRpcServerWorker.execute(XmlRpcServerWorker.java:46) at org.apache.xmlrpc.server.XmlRpcServer.execute(XmlRpcServer.java:86) at org.apache.xmlrpc.server.XmlRpcStreamServer.execute(XmlRpcStreamServer.java:200) at org.apache.xmlrpc.webserver.Connection.run(Connection.java:208) at org.apache.xmlrpc.util.ThreadPool$Poolable$1.run(ThreadPool.java:68) Caused by: java.util.ConcurrentModificationException at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391) at org.flowvisor.classifier.CookieTranslator.getCookieList(CookieTranslator.java:53) at org.flowvisor.classifier.FVClassifier.cleanUpFlowMods(FVClassifier.java:699) at org.flowvisor.classifier.FVClassifier.connectToControllers(FVClassifier.java:681) at org.flowvisor.classifier.FVClassifier.flowMapChanged(FVClassifier.java:857) ... 26 more Starting FlowVisor --- Setting logging level to NOTE 2013-08-20 20:23:07.694:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog 2013-08-20 20:23:07.727:INFO::jetty-7.0.2.v20100331 2013-08-20 20:23:08.315:INFO::Started SslSelectChannelConnector@0.0.0.0:8081

So that stack trace came at some point between the 2013-08-17 restart and the 2013-08-20 restart, but we don't know for sure exactly when.

mssumanth commented 11 years ago

Hi Josh,

Were this traces obtained on the latest FV1.0-RC branch?

Thanks Sumanth

jbsbbn commented 11 years ago

No; we're not running RC versions on Internet2's production GENI FlowVisor. It's still 1.0.8.

mssumanth commented 11 years ago

Hi Josh,

1.0.11-RC2 the latest code from the branch 1.0-RC contains the fix for this. Is that being tested anywhere? Because, we were not entirely sure that 1.0.11-RC2 would fix it completely, but based on the exceptions and the stack trace info, that fix was provided and it was presumed that it would be run for several weeks to determine if the exceptions and crashes would still exist, which hopefully should not.

Thanks & Best Regards Sumanth

On Wed, Aug 21, 2013 at 4:02 AM, Josh Smift notifications@github.comwrote:

No; we're not running RC versions on Internet2's production GENI FlowVisor. It's still 1.0.8.

— Reply to this email directly or view it on GitHubhttps://github.com/OPENNETWORKINGLAB/flowvisor/issues/253#issuecomment-23009521 .