Closed zdYng closed 4 years ago
As time goes on, memory consumption is also increasing!
Do nothing, memory consumption is increasing!
please try to check the log size and change the log level
I think 300MB is not enough for the service. And I think the processes are OOM killed during starting. Can you provide more memory for the container? (About 4-8GB)
However, I have not added data yet, just simply start the service.
I will try again,
Sorry, my cloud server only has 2G of memory and has been tested. It only takes a lot of resources (including cpu and memory) during the time when the docker is started. After that, it will also stabilize, and tested the python-side example, everything works fine. It’s just that the port will be killed after a while.
For a while, nebula-storaged will be automatically killed, is the case?
I have given docker unlimited memory usage.
When docker automatically kills the nebula-storage port, it needs to be restarted to continue normal use.
For a while, nebula-storaged will be automatically killed, is the case?
I have given docker unlimited memory usage.
This is very strange. Can you provide storaged log (in /usr/local/nebula/log/).
BTW, is storaged killed by OOM, or crashed?
When docker automatically kills the nebula-storage port, it needs to be restarted to continue normal use.
Right.
Sometimes, the port is occupied so storaged can't restart. And You need to wait a few seconds to retry
[root@nebula logs]# ll
total 62428
lrwxrwxrwx 1 root root 63 Nov 21 08:12 nebula-graphd.ERROR -> nebula-graphd.nebula.invalid-user.log.ERROR.20191121-081220.128
lrwxrwxrwx 1 root root 62 Nov 21 08:09 nebula-graphd.INFO -> nebula-graphd.nebula.invalid-user.log.INFO.20191121-080939.128
lrwxrwxrwx 1 root root 65 Nov 21 08:12 nebula-graphd.WARNING -> nebula-graphd.nebula.invalid-user.log.WARNING.20191121-081220.128
-rw-r--r-- 1 root root 2461 Nov 21 09:05 nebula-graphd.nebula.invalid-user.log.ERROR.20191121-081220.128
-rw-r--r-- 1 root root 25818390 Nov 21 14:49 nebula-graphd.nebula.invalid-user.log.INFO.20191121-080939.128
-rw-r--r-- 1 root root 2461 Nov 21 09:05 nebula-graphd.nebula.invalid-user.log.WARNING.20191121-081220.128
lrwxrwxrwx 1 root root 61 Nov 21 08:09 nebula-metad.ERROR -> nebula-metad.nebula.invalid-user.log.ERROR.20191121-080939.56
lrwxrwxrwx 1 root root 60 Nov 21 08:09 nebula-metad.INFO -> nebula-metad.nebula.invalid-user.log.INFO.20191121-080939.56
lrwxrwxrwx 1 root root 63 Nov 21 08:09 nebula-metad.WARNING -> nebula-metad.nebula.invalid-user.log.WARNING.20191121-080939.56
-rw-r--r-- 1 root root 1195 Nov 21 09:04 nebula-metad.nebula.invalid-user.log.ERROR.20191121-080939.56
-rw-r--r-- 1 root root 43717 Nov 21 09:35 nebula-metad.nebula.invalid-user.log.INFO.20191121-080939.56
-rw-r--r-- 1 root root 1195 Nov 21 09:04 nebula-metad.nebula.invalid-user.log.WARNING.20191121-080939.56
lrwxrwxrwx 1 root root 65 Nov 21 09:08 nebula-storaged.ERROR -> nebula-storaged.nebula.invalid-user.log.ERROR.20191121-090803.711
lrwxrwxrwx 1 root root 64 Nov 21 09:07 nebula-storaged.INFO -> nebula-storaged.nebula.invalid-user.log.INFO.20191121-090759.711
lrwxrwxrwx 1 root root 67 Nov 21 09:08 nebula-storaged.WARNING -> nebula-storaged.nebula.invalid-user.log.WARNING.20191121-090803.711
-rw-r--r-- 1 root root 532745 Nov 21 08:12 nebula-storaged.nebula.invalid-user.log.ERROR.20191121-080940.159
-rw-r--r-- 1 root root 532616 Nov 21 09:04 nebula-storaged.nebula.invalid-user.log.ERROR.20191121-090304.455
-rw-r--r-- 1 root root 958600 Nov 21 09:08 nebula-storaged.nebula.invalid-user.log.ERROR.20191121-090803.711
-rw-r--r-- 1 root root 9586097 Nov 21 08:44 nebula-storaged.nebula.invalid-user.log.INFO.20191121-080939.159
-rw-r--r-- 1 root root 6772768 Nov 21 09:04 nebula-storaged.nebula.invalid-user.log.INFO.20191121-090303.455
-rw-r--r-- 1 root root 17519384 Nov 21 09:34 nebula-storaged.nebula.invalid-user.log.INFO.20191121-090759.711
-rw-r--r-- 1 root root 532842 Nov 21 08:12 nebula-storaged.nebula.invalid-user.log.WARNING.20191121-080939.159
-rw-r--r-- 1 root root 532616 Nov 21 09:04 nebula-storaged.nebula.invalid-user.log.WARNING.20191121-090304.455
-rw-r--r-- 1 root root 958600 Nov 21 09:08 nebula-storaged.nebula.invalid-user.log.WARNING.20191121-090803.711
-rw-r--r-- 1 root root 2325 Nov 21 09:05 stderr.log
-rw-r--r-- 1 root root 0 Nov 21 08:09 stdout.log
[root@nebula logs]# datetime
bash: datetime: command not found
[root@nebula logs]# date
Thu Nov 21 14:50:08 UTC 2019
One of the ERROR files:
[root@nebula logs]# cat nebula-graphd.ERROR
Log file created at: 2019/11/21 08:12:20
Running on machine: nebula
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1121 08:12:20.891650 264 InsertVertexExecutor.cpp:50] No schema found for person
E1121 09:00:30.294275 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1121 09:01:18.951679 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:02:29.934098 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:03:21.914382 141 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1121 09:04:39.400250 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1121 09:04:57.863484 141 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:04:57.863958 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:05:04.963557 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:05:04.966099 141 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:08:17.128510 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.128862 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.128975 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.129107 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.129220 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.129348 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.129559 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.129861 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.130076 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.130481 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.130692 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.130848 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.130954 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.131075 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.131217 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.131393 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.131539 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.131651 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.131775 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.131901 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
E1121 09:08:17.132025 769 Part.cpp:63] Cannot fetch the last committed log id from the storage engine
[root@nebula logs]# cat stderr.log
E1121 08:12:20.891650 264 InsertVertexExecutor.cpp:50] No schema found for person
E1121 09:00:30.294275 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1121 09:01:18.951679 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:02:29.934098 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:03:21.914382 141 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1121 09:04:39.400250 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1121 09:04:57.863484 141 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:04:57.863958 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:05:04.963557 202 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1121 09:05:04.966099 141 StorageClient.inl:102] Request to [0.0.0.0:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
My current time zone is CST,the docker 's time zone is UTC.
[root@nebula nebula]# scripts/nebula.service status all
[INFO] nebula-metad: Running as 56, Listening on 45500
[INFO] nebula-graphd: Running as 128, Listening on 3699
[INFO] nebula-storaged: Exited
[root@nebula nebula]# ps -aux |grep "nebula"
root 56 0.3 0.9 487748 16016 ? Ssl 08:09 1:22 /usr/local/nebula/bin/nebula-metad --flagfile /usr/local/nebula/etc/nebula-metad.conf
root 128 0.3 0.3 219140 5960 ? Ssl 08:09 1:29 /usr/local/nebula/bin/nebula-graphd --flagfile /usr/local/nebula/etc/nebula-graphd.conf
root 1117 0.0 0.0 9092 668 pts/5 S+ 14:59 0:00 grep --color=auto nebula
[root@nebula nebula]#
My container does not have any restrictions on resources such as memory. I think the possibility of OOM is relatively large, but I also had a situation where the docker occupied more resources than 500M, and there was no similar problem. Can I optimize the storage layer to reduce the memory usage? I set the container crash to restart automatically.
Thank you for your report.
Can we reach you by wechat (wexin)? Find and add friend NebulaGraphBot
BTW, if your partition_num = 1024 ( when you create a new graph space), it takes about 4GB memory. You can change it to 10 to reduce memory usage (for a functional test)
Yes, I have added NebulaGraphBot, I will modify partition_num to test it.
My WeChat is qzd1460786009
try partition_num = 1.
When I started the docker version of nebula,
the resources occupied by docker increased directly from 4MB to 300MB.
The memory takes up 4MB before booting, and the CPU takes up 0%:
When I start nebula, start the command:
Docker takes up memory 159MB,CPU 120%:
Sometimes there are times when the port cannot be fully started:
Start a few more times.
But sometimes it can't be started :