simlaudato / asterixdb

Automatically exported from code.google.com/p/asterixdb

OOM during creating ngram index #922

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Use the generated Twitter message dataset: 13.75 GB per partition, 4 partitions 
per machine, 6 GB heap size, and 256 MB in-memory components for each index.

Asterix Configuration
nc.java.opts                             :-Xmx6144m -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=15
cc.java.opts                             :-Xmx4096m -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=15
max.wait.active.cluster                  :60
storage.buffercache.pagesize             :131072
storage.buffercache.size                 :1073741824
storage.buffercache.maxopenfiles         :214748364
storage.memorycomponent.pagesize         :131072
storage.memorycomponent.numpages         :2048
storage.metadata.memorycomponent.numpages:64
storage.memorycomponent.numcomponents    :2
storage.memorycomponent.globalbudget     :2147483648
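
For context, the arithmetic behind those settings (assuming the two size 
parameters are byte counts, which matches the 256 MB figure above):

$ echo $((131072 * 2048 / 1048576))      # MB per in-memory component: 256
$ echo $((131072 * 2048 * 2 / 1048576))  # MB across one index's 2 components: 512
$ echo $((1073741824 / 1048576))         # buffer cache MB: 1024
$ echo $((2147483648 / 1048576))         # memory-component global budget MB: 2048, out of the 6144 MB heap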

DDL:
create index tNGramIdx on TwitterMessages(message-text) type ngram(2);

Error:
java.lang.OutOfMemoryError: unable to create new native thread
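
Note that this particular OutOfMemoryError means the JVM could not get a new 
native thread from the OS; it is a process/thread-limit problem rather than 
heap exhaustion. A rough way to watch the NC's native thread count while the 
build runs (a sketch; the NCDriver class name is an assumption about how the 
node controller appears in jps):

$ NC_PID=$(jps | grep NCDriver | awk '{print $1}')        # node controller PID
$ while true; do ps -o nlwp= -p "$NC_PID"; sleep 1; done  # live threads, sampled once per second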

Original issue reported on code.google.com by buyingyi@gmail.com on 8 Aug 2015 at 7:25

GoogleCodeExporter commented 8 years ago
Ah, interesting... Is this deterministic? It seems really similar to the issue 
that started to crop up on the CI server just as the feeds patch was merged. 

Original comment by parshim...@gmail.com on 9 Aug 2015 at 7:13

GoogleCodeExporter commented 8 years ago
It is deterministic in my setting...

Best,
Yingyi

Original comment by buyingyi@gmail.com on 9 Aug 2015 at 7:29

GoogleCodeExporter commented 8 years ago
Can this be reproduced on one machine? Otherwise, I can't reproduce it. 

Original comment by wangs...@gmail.com on 9 Aug 2015 at 3:51

GoogleCodeExporter commented 8 years ago
I think so. Just make the dataset larger than your memory.

Best,
Yingyi

Original comment by buyingyi@gmail.com on 9 Aug 2015 at 4:26

GoogleCodeExporter commented 8 years ago
"Unable to create new native thread" usually means the ulimit on the number of 
user processes (nproc) is set too low. Just a thought
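
For reference, checking and raising that limit would look roughly like this 
(the 4096 value is purely illustrative):

$ ulimit -u          # current max user processes (on Linux this also counts threads)
$ ulimit -u 4096     # raise it for the current shell
$ # to persist it, add a line such as "<user>  soft  nproc  4096" to /etc/security/limits.conf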

Original comment by zheilb...@gmail.com on 9 Aug 2015 at 6:06

GoogleCodeExporter commented 8 years ago
Is it specific to the ngram index?  How about a secondary B-tree?

Original comment by che...@srch2.com on 10 Aug 2015 at 1:22

GoogleCodeExporter commented 8 years ago
Yes. Both secondary B-Tree and R-Tree work fine.

Best,
Yingyi

Original comment by buyingyi@gmail.com on 10 Aug 2015 at 1:32

GoogleCodeExporter commented 8 years ago
Hmmm, I think Alex Behm implemented this ngram index.  Can we reproduce this 
issue using an earlier version of master?  

Original comment by che...@gmail.com on 10 Aug 2015 at 7:21

GoogleCodeExporter commented 8 years ago
Trying to reproduce the issue on my machine with five partitions, each 16 GB. 
Let's see what happens. 

Original comment by wangs...@gmail.com on 11 Aug 2015 at 4:26

GoogleCodeExporter commented 8 years ago
BTW, here is the "ulimit -a" output from my setup:

$ ulimit -a
core file size          (blocks, -c) 2000
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63280
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 40960
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Original comment by buyingyi@gmail.com on 11 Aug 2015 at 4:40

GoogleCodeExporter commented 8 years ago
I'm testing on two machines. They are independent (not on the cluster). 

machine #1:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127645
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127645
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

machine #2:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Original comment by wangs...@gmail.com on 11 Aug 2015 at 4:46

GoogleCodeExporter commented 8 years ago
So <1024 is probably a reasonable default lower bound for threads. We should 
also probably lower our current default allowance on open file handles, since 
again <1024 looks like a normal limit for a user. There's a question, though, 
of how far below 1024 we should go (i.e., how many other resources we should 
expect that user to be using).
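
For what it's worth, a running NC's proximity to both limits can be checked on 
Linux with something like the following (assuming the NC's pid is in $NC_PID):

$ ls /proc/$NC_PID/task | wc -l   # native threads held by the process
$ ls /proc/$NC_PID/fd | wc -l     # open file descriptors held by the process
$ ulimit -u; ulimit -n            # the user's process and open-file limits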

Original comment by ima...@uci.edu on 11 Aug 2015 at 5:23

GoogleCodeExporter commented 8 years ago
It is still creating ExternalSortRunGenerator?????.waf files. The count is at 
805 and still growing; the temporary files currently occupy about 
805 * 320 MB ≈ 257 GB. Let's see what happens.  
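
In case anyone else wants to watch this, the run files can be tracked with 
something like the following (the io-device directory is an assumption about 
your instance's configuration):

$ cd /path/to/iodevice/tmp                          # wherever the run files land (assumption)
$ ls ExternalSortRunGenerator*.waf | wc -l          # number of run files
$ du -ch ExternalSortRunGenerator*.waf | tail -n 1  # total size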

Original comment by wangs...@gmail.com on 11 Aug 2015 at 6:55

GoogleCodeExporter commented 8 years ago
It threw this exception, and then the ExternalSortRun*.waf files started being 
created again. 

edu.uci.ics.asterix.metadata.MetadataException: An index with name 'tNGramIdx' 
already exists.
Caused by: edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException: Failed to insert key since key already exists.

-- detailed message:
Aug 11, 2015 7:10:24 AM edu.uci.ics.asterix.api.http.servlet.RESTAPIServlet 
handleRequest
SEVERE: An index with name 'tNGramIdx' already exists.
edu.uci.ics.asterix.metadata.MetadataException: An index with name 'tNGramIdx' 
already exists.
    at edu.uci.ics.asterix.metadata.MetadataNode.addIndex(MetadataNode.java:219)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$2.run(Transport.java:202)
    at sun.rmi.transport.Transport$2.run(Transport.java:199)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:198)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:567)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(TCPTransport.java:619)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:684)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:681)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:681)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:275)
    at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:252)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
    at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:194)
    at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:148)
    at com.sun.proxy.$Proxy11.addIndex(Unknown Source)
    at edu.uci.ics.asterix.metadata.MetadataManager.addIndex(MetadataManager.java:458)
    at edu.uci.ics.asterix.aql.translator.AqlTranslator.handleCreateIndexStatement(AqlTranslator.java:968)
    at edu.uci.ics.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:282)
    at edu.uci.ics.asterix.api.http.servlet.RESTAPIServlet.handleRequest(RESTAPIServlet.java:156)
    at edu.uci.ics.asterix.api.http.servlet.DDLAPIServlet.handleRequest(DDLAPIServlet.java:25)
    at edu.uci.ics.asterix.api.http.servlet.RESTAPIServlet.doGet(RESTAPIServlet.java:129)
    at edu.uci.ics.asterix.api.http.servlet.DDLAPIServlet.doGet(DDLAPIServlet.java:25)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
    at org.eclipse.jetty.server.Server.handle(Server.java:347)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:907)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:562)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:214)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
    at java.lang.Thread.run(Thread.java:745)
Caused by: edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException: Failed to insert key since key already exists.
    at edu.uci.ics.hyracks.storage.am.lsm.btree.impls.LSMBTree.insert(LSMBTree.java:391)
    at edu.uci.ics.hyracks.storage.am.lsm.btree.impls.LSMBTree.modify(LSMBTree.java:362)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:331)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:314)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceInsert(LSMTreeIndexAccessor.java:153)
    at edu.uci.ics.asterix.metadata.MetadataNode.insertTupleIntoIndex(MetadataNode.java:309)
    at edu.uci.ics.asterix.metadata.MetadataNode.addIndex(MetadataNode.java:217)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$2.run(Transport.java:202)
    at sun.rmi.transport.Transport$2.run(Transport.java:199)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:198)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:567)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(TCPTransport.java:619)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:684)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:681)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:681)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more

Original comment by wangs...@gmail.com on 11 Aug 2015 at 4:16

GoogleCodeExporter commented 8 years ago
More than 100 threads are executing per second. The CC log shows the following 
message:

Aug 11, 2015 1:53:01 PM 
edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: GetResultPartitionLocations: JobId@JID:15 ResultSetId@RSID:0 
Known@[127.0.0.1:58800 SUCCESS (empty), 127.0.0.1:58800 SUCCESS (empty), 
127.0.0.1:58800 SUCCESS (empty), null, 127.0.0.1:58800 SUCCESS (empty)]

Original comment by wangs...@gmail.com on 11 Aug 2015 at 10:32