gurnec / btcrecover

An open source Bitcoin wallet password and seed recovery tool designed for the case where you already know most of your password/seed, but need assistance in trying different possible combinations.
GNU General Public License v2.0

GPU acceleration error: clEnqueueNDRangeKernel failed: mem object allocation failure #61

Open TTN- opened 7 years ago

TTN- commented 7 years ago

Hi all, I'm not sure what the problem is, but the following crash happens when I attempt to run with GPU acceleration.

My GPU is a GTX 1060 with working CUDA drivers on Ubuntu 16.04. I'm not sure about the OpenCL drivers, though clinfo does print useful-looking information, which can be found here: https://pastebin.com/KGh6ygMy


Please enter the data from the extract script
> #redacted
(ERROR) ArmoryUtils.py:1174 - Error getting system details:
Traceback (most recent call last):
  File "/data/home-data/Projects/project_rooster/armoryengine/ArmoryUtils.py", line 1172, in <module>
    SystemSpecs = GetSystemDetails()
  File "/data/home-data/Projects/project_rooster/armoryengine/ArmoryUtils.py", line 1167, in GetSystemDetails
    out.HddAvailB = getHddSize(BTC_HOME_DIR)    / (1024**3)
  File "/data/home-data/Projects/project_rooster/armoryengine/ArmoryUtils.py", line 1164, in getHddSize
    s = os.statvfs(adir)
OSError: [Errno 2] No such file or directory: '/home/user/.bitcoin/'
(ERROR) ArmoryUtils.py:1175 - Skipping.
(ERROR) ArmoryUtils.py:3714 - Failed to import torrent downloader
Traceback (most recent call last):
  File "/data/home-data/Projects/project_rooster/armoryengine/ArmoryUtils.py", line 3711, in <module>
    import torrentDL
  File "/data/home-data/Projects/project_rooster/armoryengine/torrentDL.py", line 18, in <module>
    from BitTornado.download_bt1 import BT1Download, defaults, get_response
ImportError: No module named download_bt1
WARNING: an Armory private key, once decrypted, provides access to that key's Bitcoin
(ERROR) Traceback (most recent call last):
  File "btcrecover.py", line 36, in <module>
    (password_found, not_found_msg) = btcrpass.main()
  File "/data/home-data/Projects/project_rooster/btcrecover/btcrpass.py", line 5117, in main
    itertools.islice(itertools.ifilter(custom_final_checker, performance_generator), inner_iterations)))
  File "/data/home-data/Projects/project_rooster/btcrecover/btcrpass.py", line 443, in return_verified_password_or_false
    return self._return_verified_password_or_false_opencl(passwords) if hasattr(self, "_cl_devices") \
  File "/data/home-data/Projects/project_rooster/btcrecover/btcrpass.py", line 588, in _return_verified_password_or_false_opencl
    v_start, self._v_len_chunksize, self._cl_hashes_buffers[devnum], 0 == v_start == i))
  File "/usr/lib/python2.7/dist-packages/pyopencl/__init__.py", line 512, in kernel_call
    global_offset, wait_for, g_times_l=g_times_l)
MemoryError: clEnqueueNDRangeKernel failed: mem object allocation failure

Traceback (most recent call last):
  File "btcrecover.py", line 36, in <module>
    (password_found, not_found_msg) = btcrpass.main()
  File "/data/home-data/Projects/project_rooster/btcrecover/btcrpass.py", line 5117, in main
    itertools.islice(itertools.ifilter(custom_final_checker, performance_generator), inner_iterations)))
  File "/data/home-data/Projects/project_rooster/btcrecover/btcrpass.py", line 443, in return_verified_password_or_false
    return self._return_verified_password_or_false_opencl(passwords) if hasattr(self, "_cl_devices") \
  File "/data/home-data/Projects/project_rooster/btcrecover/btcrpass.py", line 588, in _return_verified_password_or_false_opencl
    v_start, self._v_len_chunksize, self._cl_hashes_buffers[devnum], 0 == v_start == i))
  File "/usr/lib/python2.7/dist-packages/pyopencl/__init__.py", line 512, in kernel_call
    global_offset, wait_for, g_times_l=g_times_l)
pyopencl.MemoryError: clEnqueueNDRangeKernel failed: mem object allocation failure

gurnec commented 7 years ago

This might be an out-of-memory issue with the particular command-line parameters you're starting btcrecover with (and the particular wallet file you're using).

What is the full command line you're passing to btcrecover? What output do you get when you add --calc-memory to that command line?

TTN- commented 7 years ago

The complete command I used is:

python btcrecover.py --tokenlist tokens_noSs.txt --wallet armory_S4bm2cp7_encrypt.wallet --max-eta 999999 --enable-gpu --global-ws 256 --local-ws 128 -d --no-eta

With the --global-ws and --local-ws arguments removed and --calc-memory added, the output is:

  ROMix V-table length:  262,144
  outer iteration count: 3
  with --mem-factor 1 (the default),
    memory per global worker: 16,384 KB

Details for GeForce GTX 1060 6GB
  global memory size:     6,071 MB
  with --mem-factor 1 (the default),
    est. max --global-ws: 352
    with --global-ws 4096 (the default),
      est. memory usage:  65,536 MB

Nvidia's proprietary app reports 10% of video memory used, yet as shown above, the est. max --global-ws is only 352. That seems very small considering there's 6 GB on that card.

I'm able to get it running with parameters --global-ws 256 --local-ws 128

^C1024  elapsed: 0:01:08  rate:  14.91  P/s                                     
Interrupted after finishing password # 1024

Not blisteringly fast, only 2x as fast as my CPU (Ryzen 5 1600).

gurnec commented 7 years ago

This is mostly on par with what I'd expect. Don't expect blisteringly fast performance with Armory wallets (even with your nice GPU): etotheipi chose the KDF for Armory wallets specifically to hinder brute-force attacks using GPUs.

That said, there's probably a little room for improvement.

The report says that each global worker requires 16 MB of VRAM. You specified 256 global workers (the --global-ws parameter), so that uses a total of 4 GB of VRAM, a good chunk of it. The report also says the max global workers you can probably specify is around 352. I'd start off by trying --global-ws 352 --local-ws 32 (uses 5.5 GB) and also --global-ws 320 --local-ws 64 (uses 5 GB) to see if either is any better.
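
The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration built from the numbers in the --calc-memory report earlier in the thread (16,384 KB per global worker, 6,071 MB of global memory). The function names are made up for the example, the 1/mem_factor scaling is inferred from the "double one, double the other" rule of thumb, and rounding the maximum down to a multiple of 32 is a guess that happens to reproduce the reported 352; none of this is confirmed from btcrecover's source.

```python
# Figures copied from the --calc-memory report earlier in this thread.
KB_PER_WORKER = 16384      # memory per global worker at --mem-factor 1
GPU_GLOBAL_MEM_MB = 6071   # GeForce GTX 1060 6GB global memory size


def est_memory_mb(global_ws, mem_factor=1):
    """Estimated VRAM in MB for a given --global-ws.

    Assumption: per-worker memory scales as 1/mem_factor.
    """
    return global_ws * KB_PER_WORKER / mem_factor / 1024


def est_max_global_ws(mem_factor=1, multiple=32):
    """Largest --global-ws that fits, rounded down to `multiple`.

    Assumption: the multiple-of-32 rounding is a guess, not taken
    from btcrecover itself.
    """
    workers = int(GPU_GLOBAL_MEM_MB * 1024 * mem_factor // KB_PER_WORKER)
    return workers // multiple * multiple


if __name__ == "__main__":
    for ws in (256, 320, 352, 4096):
        print("--global-ws %4d -> %6.0f MB" % (ws, est_memory_mb(ws)))
    print("est. max --global-ws:", est_max_global_ws())
```

Under these assumptions, 256 workers come to 4,096 MB and the default of 4,096 workers comes to 65,536 MB, which matches the report.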

Next would be to play around with the --mem-factor setting. The default is 1. Every time you double the --mem-factor, you can double the --global-ws at the same time. This should improve performance up to a point, after which it won't really help any further. I'd start off with multiplying each by 8, and then trying other multipliers (4, 16, etc.) in each direction until performance doesn't improve.
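
To make that search less tedious, the combinations can be generated up front. The sketch below scales a base --global-ws by each --mem-factor multiplier and prints the matching flags. The helper names and the base values (352 and 32, taken from the first suggestion above) are chosen purely for illustration, and only flags already mentioned in this thread are used. Timing each run is left to the reader.

```python
def sweep_settings(base_global_ws=352, local_ws=32,
                   factors=(1, 2, 4, 8, 16, 32)):
    """Yield (mem_factor, global_ws, local_ws) tuples to benchmark.

    --global-ws is scaled together with --mem-factor, following the
    rule of thumb that doubling one allows doubling the other.
    """
    for factor in factors:
        yield factor, base_global_ws * factor, local_ws


def as_flags(mem_factor, global_ws, local_ws):
    """Format one setting as btcrecover command-line flags."""
    return "--mem-factor %d --global-ws %d --local-ws %d" % (
        mem_factor, global_ws, local_ws)


if __name__ == "__main__":
    for setting in sweep_settings():
        print(as_flags(*setting))
```

Each printed line can be appended to the usual command line in place of the existing --global-ws and --local-ws values; recording the P/s for each run shows where the improvement levels off.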

FWIW I've been in your shoes before, and I know it's a painful process.... If you do try to work through this, could you let me know what the results were (the settings, the final P/s)? Thanks!

TTN- commented 7 years ago

Thanks for the explanation! I have tried various combinations. The fastest I found was --global-ws 320 and --local-ws 64 with a --mem-factor of 1. Any more just slows it down. I managed to get up to 18 passwords a second on the GTX 1060 6GB.

For comparison, the Ryzen 5 1600 does 15 a second. Armory certainly is a well hardened wallet against GPU attacks, that's for sure.

gurnec commented 7 years ago

Armory certainly is a well hardened wallet against GPU

Agreed. Thanks for the details.

TTN- commented 6 years ago

I'm revisiting cracking my old password. Bitcoin is up 7x in value since I last spent time on this, so it's time to give it another shot.

I've made a table of what speeds I'm able to obtain.

[Image: table of measured speeds]

36 passwords a second. A considerable improvement. This is on Windows 10 x64. The earlier test results were on Ubuntu 16.04, but I didn't test as wide a range then as I have now. The hardware is still the same.

gurnec commented 6 years ago

That's certainly better... when you run btcrecover, what does it tell you your wallet difficulty is?

Also FYI there's an update that's coming in a couple of days that should give another few percent speed boost, I'll post back here when it's ready & uploaded.

TTN- commented 6 years ago

Yes:

Wallet difficulty: 16 MiB, 3 iterations + ECC

Even a few points of speed increase would be fantastic. I've installed an old GPU to drive my display, because 5 seconds of input delay isn't that fun :-) so my GTX 1060 6GB is now dedicated to just cracking.

It now does 35.6 passwords/s, even though that graphics card is no longer driving the display. The performance appears to be very memory bound and goes up in discrete steps. I'll see how far I can push the memory now.

TTN- commented 6 years ago

It really doesn't like going over --global-ws 2048. Using 2560 for --global-ws and 128 for --local-ws, it still crashes. Even 2304 global, 128 local, with a 1x mem factor is a no go:

pyopencl.cffi_cl.MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE

This is a 6GB card btw.
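
One way to see what the OpenCL runtime itself reports for the card is to query it through pyopencl, the library that appears in the traceback. The sketch below lists each device with its total global memory and the largest single buffer the driver says it will allocate. The function name is made up for the example, and whether either limit is what triggers the failure here is not established; this is only a diagnostic aid.

```python
def list_opencl_devices():
    """Return (name, global_mem_mb, max_alloc_mb) for each OpenCL device.

    Returns an empty list if pyopencl is missing or no platform is found.
    """
    try:
        import pyopencl as cl
    except (ImportError, OSError):
        return []
    devices = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                devices.append((
                    device.name,
                    device.global_mem_size // 2**20,
                    device.max_mem_alloc_size // 2**20,
                ))
    except cl.Error:
        # No usable OpenCL platform or driver was found.
        return []
    return devices


if __name__ == "__main__":
    for name, total_mb, max_alloc_mb in list_opencl_devices():
        print("%s: %d MB total, %d MB max single allocation"
              % (name, total_mb, max_alloc_mb))
```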