Closed ceocoder closed 13 years ago
little more background,
4 machine cluster with replicatin_level =1,
jvm set to 6G (I've tried with jvm at 12 and 18 to same effect) on boxes swappiness is set to 0
for disks I have 3PAR iSCSI mounts of 500GB each
Looks like you had a big GC that caused the reads to back up and the JVM was running out of memory.
What kind of writes per second were you seeing? How many solr cores?
Turning off row_cache completely and disabling swap I was able to ingest using one of the 4 nodes for about 15 hours and counting, and just started ingesting on a second node, and keep getting this error frequently -
10:09:26,663 INFO SolrCore:1324 - [htmlnew] webapp=/solandra path=/update params={commit=true} status=500 QTime=1112 10:09:26,664 ERROR SolrDispatchFilter:139 - java.lang.IllegalStateException: Unable to reserve an id at lucandra.cluster.CassandraIndexManager.getNextId(CassandraIndexManager.java:341) at solandra.SolandraIndexWriter.addDoc(SolandraIndexWriter.java:194) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:170) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:134) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:536) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:930) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:405) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:451)
This should be fixed. Please retest
Verified fix
On Client Side
xml error - The operation has timed out xml error - The server committed a protocol violation. Section=ResponseStatusLin e
On server side,
172.16.5.11
10:27:14,426 INFO GCInspector:184 - Lucandra.TermInfo 0,0 0/1000 0/10000000 10:27:14,432 INFO CassandraIndexManager:148 - ShardInfo for htmlsni has expired 10:27:14,436 INFO SSTableTracker:65 - saving system-LocationInfo-KeyCache for LocationInfo of system 10:27:46,829 INFO GCInspector:133 - GC for ConcurrentMarkSweep: 14490 ms, 84168 reclaimed leaving 6395477472 used; max is 6552551424 10:27:46,829 INFO GCInspector:146 - Pool Name Active Pending 10:27:46,830 INFO GCInspector:161 - ReadStage 0 0 10:27:46,830 INFO GCInspector:161 - RequestResponseStage 0 0 10:27:46,830 INFO GCInspector:161 - ReadRepair 0 0 10:27:46,831 INFO GCInspector:161 - MutationStage 18 20 10:27:46,831 INFO GCInspector:161 - GossipStage 0 0 10:27:46,831 INFO GCInspector:161 - AntiEntropyStage 0 0 10:29:17,128 INFO UpdateRequestProcessor:171 - {} 0 812628 10:29:17,131 INFO GCInspector:161 - MigrationStage 0 0 10:29:17,131 INFO GCInspector:161 - StreamStage 0 0 10:29:17,132 INFO GCInspector:161 - MemtablePostFlusher 1 1 10:29:17,132 INFO GCInspector:161 - FlushWriter 1 1 10:29:17,132 INFO GCInspector:161 - MiscStage 0 0 10:29:17,132 INFO GCInspector:161 - FlushSorter 0 0 10:29:17,132 INFO GCInspector:161 - InternalResponseStage 0 0 10:29:17,133 INFO GCInspector:161 - HintedHandoff 0 0 10:29:17,133 INFO GCInspector:165 - CompactionManager n/a 76 10:29:17,133 INFO GCInspector:177 - MessagingService n/a 141,202 10:29:17,133 INFO GCInspector:181 - ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap 10:29:17,133 INFO GCInspector:184 - system.LocationInfo 0,0 0/0 1/1 10:29:17,134 INFO GCInspector:184 - system.HintsColumnFamily 0,0 0/0 1/1 10:29:17,134 INFO GCInspector:184 - system.Migrations 0,0 0/0 1/1 10:29:17,134 INFO GCInspector:184 - system.Schema 0,0 0/0 1/1 10:29:17,134 INFO GCInspector:184 - system.IndexInfo 0,0 0/0 0/1 10:29:17,134 INFO GCInspector:184 - Skunk.MIDMeta 0,0 0/1000 0/10000000 10:29:17,135 INFO GCInspector:184 - Skunk.TimeMID 0,0 0/1000 0/10000000 10:29:17,135 INFO GCInspector:184 - Skunk.Emails 0,0 0/1000 0/100000 10:29:17,135 INFO GCInspector:184 - Skunk.MetaTags 0,0 0/1000 0/1000000 10:29:17,146 INFO GCInspector:184 - L.TI 364185,7283808 0/1000 0/10000000 10:29:17,146 INFO GCInspector:184 - L.Docs 2160,206878593 0/1000 0/10000000 10:29:17,147 INFO GCInspector:184 - L.SI 138623,6786771 1000/1000 20/100000 10:29:17,147 INFO GCInspector:184 - L.TL 1233280,37118783 0/1000 0/1000000 10:29:17,147 INFO GCInspector:184 - Lucandra.Documents 0,0 0/1000 0/10000000 10:29:17,147 INFO GCInspector:184 - Lucandra.TermInfo 0,0 0/1000 0/10000000 10:29:17,148 WARN MessagingService:545 - Dropped 424 MUTATION messages in the last 5000ms 10:29:17,156 WARN MessagingService:545 - Dropped 4 REQUEST_RESPONSE messages in the last 5000ms 10:29:17,160 INFO GCInspector:146 - Pool Name Active Pending 10:29:17,161 INFO GCInspector:161 - ReadStage 0 0 10:29:17,163 INFO GCInspector:161 - RequestResponseStage 0 0 10:29:17,164 INFO GCInspector:161 - ReadRepair 0 0 10:29:17,164 INFO GCInspector:161 - MutationStage 32 38 10:29:17,164 INFO GCInspector:161 - GossipStage 1 81 10:29:17,165 INFO GCInspector:161 - AntiEntropyStage 0 0 10:29:17,165 INFO GCInspector:161 - MigrationStage 0 0 10:29:17,165 INFO GCInspector:161 - StreamStage 0 0 10:29:17,165 INFO GCInspector:161 - MemtablePostFlusher 1 1 10:29:17,166 INFO GCInspector:161 - FlushWriter 1 1 10:29:17,166 INFO GCInspector:161 - MiscStage 0 0 10:29:17,166 INFO GCInspector:161 - FlushSorter 0 0 10:29:17,166 INFO GCInspector:161 - InternalResponseStage 0 0 10:29:17,167 INFO GCInspector:161 - HintedHandoff 0 0 10:29:17,167 INFO GCInspector:165 - CompactionManager n/a 76 10:29:17,167 INFO GCInspector:177 - MessagingService n/a 0,0 10:29:17,167 INFO GCInspector:181 - ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
ON 172.16.5.12
10:27:53,776 INFO SolrCore:1324 - [htmlsni] webapp=/solandra path=/update params={commit=true} status=500 QTime=290298 10:27:53,777 ERROR SolrDispatchFilter:139 - java.lang.RuntimeException: Read command failed after 100 attempts at lucandra.CassandraUtils.robustRead(CassandraUtils.java:390) at lucandra.cluster.CassandraIndexManager.getShardInfo(CassandraIndexManager.java:188) at lucandra.cluster.CassandraIndexManager.getNextId(CassandraIndexManager.java:328) at solandra.SolandraIndexWriter.addDoc(SolandraIndexWriter.java:194) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:170) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:134) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:536) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:930) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:405) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:451)
.... til this time
10:29:17,257 INFO HintedHandOffManager:192 - Started hinted handoff for endpoint /172.16.5.11 10:29:17,258 INFO Gossiper:569 - InetAddress /172.16.5.11 is now UP 10:29:17,260 INFO HintedHandOffManager:248 - Finished hinted handoff of 0 rows to endpoint /172.16.5.11 10:29:17,363 INFO CassandraIndexManager:148 - ShardInfo for htmlsni has expired 10:29:17,403 INFO CassandraIndexManager:226 - htmlsni has 29 shards