server stops suddenly / std::bad_alloc memory exhaustion

gits7r commented 9 years ago

Hi,

The server was running fine. I upgraded to the latest commits a few days ago. Now when I run electrum-server start, it appears to start (it takes some time), but when I run electrum-server getinfo about a minute later, it says the server is not running. There is nothing interesting in the log files except "starting TCP Server on ..." and "starting SSL server on ...".

bitcoind is working fine; I didn't touch it. I tried restarting bitcoind as well, and then the electrum server started and ran for a few hours, but it died again with nothing in the log file. How can I debug this?

abitfan commented 9 years ago

You can try to run run_electrum_server directly and see if it spits out more info.

gits7r commented 9 years ago

INFO:electrum:Starting Electrum server on 127.0.0.1
ERROR:electrum:db init
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 180, in __init__
    self.db_utxo = DB(self.dbpath, 'utxo', config.getint('leveldb', 'utxo_cache'))
  File "/usr/lib/python2.7/ConfigParser.py", line 359, in getint
    return self._get(section, int, option)
  File "/usr/lib/python2.7/ConfigParser.py", line 356, in _get
    return conv(self.get(section, option))
  File "/usr/lib/python2.7/ConfigParser.py", line 618, in get
    raise NoOptionError(option, section)
NoOptionError: No option 'utxo_cache' in section: 'leveldb'
INFO:electrum:Stopping Stratum
INFO:electrum:Initializing database
Traceback (most recent call last):
  File "/usr/local/bin/run_electrum_server", line 4, in <module>
    __import__('pkg_resources').run_script('electrum-server==1.0', 'run_electrum_server')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 534, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/electrum_server-1.0-py2.7.egg/EGG-INFO/scripts/run_electrum_server", line 256, in <module>
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 57, in __init__
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 195, in __init__
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 324, in put_node
AttributeError: 'Storage' object has no attribute 'db_utxo'

ecdsa commented 9 years ago

you have to run run_electrum_server.py, not run_electrum_server

gits7r commented 9 years ago

Maybe the database was corrupt. I deleted electrum's database and downloaded it again from foundry. It started fine and has been working for the last few hours under normal parameters, slowly catching up.

Could my database just get corrupted on the fly, without anyone doing anything wrong? I know how to start/stop the server and never kill -9 electrum.

gits7r commented 9 years ago

@ecdsa I have downloaded a fresh leveldb dump from foundry and started again, and unfortunately it died again. There is a bug here. I ran run_electrum_server.py in the console and here is what I get:

INFO:electrum:Starting Electrum server on 127.0.0.1
INFO:electrum:Database version 3.
INFO:electrum:Pruning limit for spent outputs is 10000.
INFO:electrum:Blockchain height 375506
INFO:electrum:UTXO tree root hash: c5e8dca8fefc2e5f8ab198aac02824d9b0b3e08c414cd249fa62bb0d0408221a
INFO:electrum:Coins in database: 1463486254755852
INFO:electrum:catching up missing headers: 375492 375506
INFO:electrum:TCP server started on 127.0.0.1:50001
INFO:electrum:SSL server started on 127.0.0.1:50002
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 763, in run
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 83, in do_catch_up
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 657, in catch_up
  File "build/bdist.linux-x86_64/egg/electrumserver/blockchain_processor.py", line 413, in import_block
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 625, in import_transaction
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 585, in set_spent
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 144, in get
  File "_plyvel.pyx", line 299, in plyvel._plyvel.DB.get (plyvel/_plyvel.cpp:4025)
  File "_plyvel.pyx", line 103, in plyvel._plyvel.db_get (plyvel/_plyvel.cpp:1891)
  File "_plyvel.pyx", line 80, in plyvel._plyvel.raise_for_status (plyvel/_plyvel.cpp:1698)
IOError: IO error: /home/bitnode/electrum-server/electrum-leveldb-utxo-10000/hist/30610918.ldb: Too many open files

What could be the issue? I have the correct limits set up in /etc/security/limits.conf for the user running electrum.

abitfan commented 9 years ago

If this is an Ubuntu install, you also need to edit /etc/pam.d/common-session and add:

session required pam_limits.so

To test that your changes are OK, log in as the user running electrum and run:

ulimit -n
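The same check can also be made from Python, which is useful when the limit seen in a login shell differs from the one a running process actually inherits. The following is a minimal sketch, assuming Linux with /proc mounted; the helper names nofile_limits and open_fd_count are made up for this example and are not part of electrum-server.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open file descriptors via /proc; return None if unavailable."""
    fd_dir = "/proc/%s/fd" % pid
    if not os.path.isdir(fd_dir):
        return None  # not Linux, or /proc is not mounted
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft nofile limit: %s" % soft)
    print("hard nofile limit: %s" % hard)
    print("descriptors currently open: %s" % open_fd_count())
```

Run it as the user that starts the server. A soft limit of 1024 here would be consistent with the "Too many open files" error above, since leveldb can keep many .ldb files open at once.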

gits7r commented 9 years ago

@abitfan I am on Debian Jessie. Unfortunately, here is what ulimit -n says when run as the user running electrum:

sudo -u bitnode -i ulimit -n
1024

I have appended the following to /etc/security/limits.conf:

bitnode hard nofile 65536
bitnode soft nofile 65536

abitfan commented 9 years ago

Actually, the common-session mod is required for Debian as well.

gits7r commented 9 years ago

@abitfan can you tell me step by step what I need to do to enable it? Thanks.

abitfan commented 9 years ago

As root:

echo "session required pam_limits.so" >> /etc/pam.d/common-session

gits7r commented 9 years ago

I have done that. Now the limit is 65536 for 'bitnode', which is the user I run electrum-server as. It still did not fix it. The server starts and dies with nothing relevant in electrum.log. Running from the console I get the following:

sudo -u bitnode -i run_electrum_server.py
INFO:electrum:Starting Electrum server on 127.0.0.1
ERROR:electrum:db init
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 180, in __init__
    self.db_utxo = DB(self.dbpath, 'utxo', config.getint('leveldb', 'utxo_cache'))
  File "build/bdist.linux-x86_64/egg/electrumserver/storage.py", line 129, in __init__
    self.db = plyvel.DB(os.path.join(path, name), create_if_missing=True, compression=None, lru_cache_size=cache_size)
  File "_plyvel.pyx", line 236, in plyvel._plyvel.DB.__init__ (plyvel/_plyvel.cpp:3129)
  File "_plyvel.pyx", line 80, in plyvel._plyvel.raise_for_status (plyvel/_plyvel.cpp:1698)
IOError: IO error: lock /home/bitnode/electrum-server/electrum-leveldb-utxo-10000/utxo/LOCK: Resource temporarily unavailable
INFO:electrum:Stopping Stratum

abitfan commented 9 years ago

Can you try this with a fresh DB?

gits7r commented 9 years ago

OK. I have tried with a fresh DB 10 times. The DBs are correct; I checked the hash and everything. I have set the limits properly as you said, and the user running electrum now has soft 65536 and hard 65536. It always dies like this after a few seconds:

INFO:electrum:Starting Electrum server on 127.0.0.1
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

I am on the latest commit. What could be wrong?

shsmith commented 9 years ago

bad_alloc sounds like memory exhaustion. Try allocating more swap space.
You could also reduce the cache sizes via your electrum.conf hist_cache, utxo_cache and addr_cache settings.
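Before tuning these values it can help to confirm that the options are present at all, since a missing utxo_cache is what produced the NoOptionError earlier in this thread. Below is a minimal sketch using Python 3's configparser (the tracebacks above come from Python 2.7's ConfigParser). It assumes all three options live in the [leveldb] section, which the thread only confirms for utxo_cache. The function name read_cache_settings and the sample values are hypothetical, and no claim is made here about the units of the values.

```python
import configparser

# Option names come from the thread; that all three belong in [leveldb]
# is an assumption based on the NoOptionError traceback for utxo_cache.
CACHE_OPTIONS = ("hist_cache", "utxo_cache", "addr_cache")


def read_cache_settings(conf_text, section="leveldb"):
    """Return (settings, missing) for the cache options found in conf_text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    settings, missing = {}, []
    for option in CACHE_OPTIONS:
        if parser.has_option(section, option):
            settings[option] = parser.getint(section, option)
        else:
            missing.append(option)
    return settings, missing


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = "[leveldb]\nhist_cache = 1000\nutxo_cache = 1000\n"
    settings, missing = read_cache_settings(sample)
    print("found:", settings)
    print("missing:", missing)
```

To check a real file, read its contents and pass the text in, for example read_cache_settings(open(path).read()), where path points at your own electrum.conf.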

gits7r commented 9 years ago

My allocated swap space looks empty. This machine used to run electrum very well. Can it suddenly require more swap space?

gits7r commented 9 years ago

@shsmith @ecdsa I have increased the RAM allocated to this virtual machine from 8GB to 16GB and the swap space from 5GB to 8GB, and this seems to have fixed it. Electrum is now catching up with bitcoind's height and updating leveldb.

Do we require more resources now to run electrum server?

gits7r commented 9 years ago

I tried 100 more times with different changes, and it still won't work. I think this is not related to electrum-server; it is probably a lack of available disk I/O, since the server doesn't have an SSD (it has normal SATA drives, no RAID). This is a virtual machine hosted on shared hardware. On the same hardware I have another electrum server plus many other things, so I guess the disk just can't keep up, and the hypervisor doesn't allocate more disk I/O to this virtual machine in order to protect the others.

We already know that leveldb uses the disk heavily and needs an SSD. So I will close this, since I don't see a bug in electrum-server. The last log message is:

[18/10/2015-05:06:38] block 379312 (410 401.10s) 457255bb18a4ba9e792ab8f3e2b4d5fd34f3dccf7b008ed5d278622c31f3e280 (4.49tx/s, 255.59s/block) (eta 11.3 hours, 112 blocks)

You can see that each block takes a long time. RAM, CPU and swap space are plentiful, but disk I/O is not.

EagleTM commented 9 years ago

I'm seeing the same issue here: a server with 4 GB RAM and 4 GB swap dies after around a week of running, with caches at half the size of the new lower default (so they are not the issue). The crash message is:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

It's definitely running out of memory and swap. I've noticed that the current git head uses about 90% of memory (with swap, that would be 8 GB on my system) on startup while catching up on blocks.

This might be related to recent changes like "writing once per block" or the "ordering of tx" stuff. Server versions from June 2015 don't have this issue.

Unless the memory footprint can be reduced, we should recommend at least 8 GB of RAM, and preferably 16 GB, for running electrum server.

lvets commented 8 years ago

Any update on this? I'm still seeing electrum-server taking 16GB of RAM + 4 GB swap on a server when processing blocks...

EagleTM commented 8 years ago

We're investigating the issue. Thomas' server is using less than 2 GB of RES RAM, while I'm at 11 GB. It might be the plyvel version. I've updated to 0.9 (from 0.2) recently, still running leveldb 1.9.x (2013) with it. Thomas is using leveldb 1.9.x and plyvel 0.8. Which plyvel versions are you using?

For now I get a stable running server with 16 GB RAM + 16 GB swap. Around 6 GB of swap gets used, so I can recommend 24 GB of RAM + swap combined.
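To compare figures like these between servers, resident and swapped memory for a single process can be read from /proc on Linux. This is a minimal sketch, not something from electrum-server: the function names are made up for the example, and it assumes a kernel that reports VmRSS and VmSwap in /proc/<pid>/status.

```python
def parse_vm_fields(status_text, fields=("VmRSS", "VmSwap")):
    """Extract selected fields (reported in kB) from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            # Lines look like "VmRSS:     123456 kB"; keep the number only.
            result[name] = int(rest.split()[0])
    return result


def process_memory_kb(pid="self"):
    """Read resident and swapped memory, in kB, for the given process id."""
    with open("/proc/%s/status" % pid) as handle:
        return parse_vm_fields(handle.read())


if __name__ == "__main__":
    # "self" reports on this script; pass the server's pid to inspect it.
    print(process_memory_kb())
```

Logging these two numbers periodically while the server catches up on blocks would show whether memory keeps growing or levels off.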

EagleTM commented 8 years ago

Sorry, no progress here currently. The RAM recommendations still stand. I've put them into the HOWTO for now.

spesmilo / electrum-server

server stops suddenly / std::bad_alloc memory exhaustion #126