spacemeshos / gpu-post

Spacemesh proof of space time gpu optimized setup
GNU General Public License v3.0

Unexpected number of computed hashes (CUDA) #42

Closed: avive closed this issue 3 years ago.

avive commented 3 years ago

When using a CUDA provider, if the library is asked via the options to compute both labels and a PoW solution, and a PoW solution is found in the same library call that computes the leaves, the total number of computed hashes returned by the library is 0 instead of the number of requested leaves. Reproduced with this release: https://github.com/spacemeshos/gpu-post/actions/runs/863441063

@moshababo FYI: I see this only with the CUDA provider, not with the CPU or Vulkan providers. It might be the issue you reported before. As a band-aid until this is fixed, set the PoW difficulty parameter D so that additional computation, just for PoW, is needed after the leaves are computed.
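
A rough model of why this band-aid works, under an assumption the thread does not confirm: suppose a PoW solution is any hash whose leading `bits` bits, read as an unsigned integer, fall below a threshold derived from D, so each hash succeeds independently with probability p = threshold / 2^bits. The chance that a batch of n leaf hashes already contains a solution is then 1 - (1 - p)^n. The function name `prob_solution_within` and this threshold encoding are hypothetical; the real D parameter may be encoded differently.

```python
import math


def prob_solution_within(n_hashes: int, threshold: int, bits: int = 64) -> float:
    """Probability that at least one of `n_hashes` uniformly random `bits`-bit
    hash values falls below `threshold`.

    `threshold` is a hypothetical integer encoding of the PoW difficulty D;
    a smaller threshold means a harder difficulty.
    """
    p = threshold / 2**bits
    if p >= 1.0:
        # Every hash succeeds, so any non-empty batch contains a solution.
        return 1.0 if n_hashes > 0 else 0.0
    # 1 - (1 - p)**n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_hashes * math.log1p(-p))
```

For example, with 1024 hashes per batch and bits=64, a threshold of 2**54 gives roughly a 63% chance that the solution lands inside the leaves batch, while 2**40 brings it down to about 0.006%, so the solution almost always comes from the later, PoW-only computation.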

avive commented 3 years ago

@AndrewAR2 I just reproduced this with the most recent build: https://github.com/spacemeshos/gpu-post/actions/runs/874060571

You can reproduce it by running make test in /home/avive/pos-server on the GC instance with the NVIDIA GPUs.

AndrewAR2 commented 3 years ago

Could you tell me how to reproduce this bug?

avive commented 3 years ago

Run make test in /home/avive/pos-server on the GC instance with the two NVIDIA GPUs.

avive commented 3 years ago

There's a check in the code, after each iteration, that the number of hashes computed equals the number of hashes requested. When a PoW solution is found in an iteration, 0 is reported as the number of hashes computed in that iteration, so the check fails.
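
The check, and the way the CUDA behavior trips it, can be modeled with a short sketch. This is a hypothetical Python simulation, not code from pos-server or gpu-post: `BatchResult`, `compute_batch`, `run` and their parameters are illustrative names, and `solution_at` simply marks the index where a PoW solution is assumed to occur.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchResult:
    # Hypothetical shape of one library call's result.
    hashes_computed: int
    pow_solution: Optional[int]  # index of the solution, or None if not found


def compute_batch(start: int, count: int, solution_at: Optional[int],
                  buggy: bool) -> BatchResult:
    """Simulate one call that computes `count` leaves starting at `start`.

    With `buggy=True` the result mimics the reported CUDA behavior:
    0 hashes are reported when the PoW solution falls inside this batch.
    """
    found = solution_at is not None and start <= solution_at < start + count
    reported = 0 if (buggy and found) else count
    return BatchResult(hashes_computed=reported,
                       pow_solution=solution_at if found else None)


def run(total: int, batch: int, solution_at: Optional[int], buggy: bool) -> list:
    """Compute `total` leaves in batches and return the start index of every
    batch whose reported hash count differs from the requested count."""
    failures = []
    for start in range(0, total, batch):
        requested = min(batch, total - start)
        result = compute_batch(start, requested, solution_at, buggy)
        # The per-iteration check: computed must equal requested.
        if result.hashes_computed != requested:
            failures.append(start)
    return failures
```

With these definitions, run(total=4096, batch=1024, solution_at=1500, buggy=True) returns [1024], the start of the only batch that contains the solution, while the same call with buggy=False returns [].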

AndrewAR2 commented 3 years ago

I don't have access to /home/avive/.cargo or .rustup.

avive commented 3 years ago

I think you'll need to clone the repo and possibly install Rust for your user, like this:

  1. git clone https://github.com/spacemeshos/pos-server.git
  2. Install Rust.
  3. Put the latest .so lib with CUDA support in the pos-compute crate's resources folder.
  4. Run make test.

AndrewAR2 commented 3 years ago

This error is due to an old version of libgpu-setup.so in /lib that does not match the current API.
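
One way to spot a stale copy like this is to look for every installed copy of the library and compare their contents. The helper below is a hypothetical sketch, not part of gpu-post or pos-server; the default directory list is an assumption about typical Linux locations, and the dynamic loader's real search order also depends on LD_LIBRARY_PATH, the binary's RPATH/RUNPATH, and the ld.so cache.

```python
import hashlib
import os
from typing import Dict, Iterable, List

# Assumed typical Linux library locations; adjust for the target system.
DEFAULT_DIRS = ("/lib", "/usr/lib", "/usr/local/lib", "/lib/x86_64-linux-gnu")


def find_library_copies(name: str,
                        dirs: Iterable[str] = DEFAULT_DIRS) -> Dict[str, List[str]]:
    """Map each SHA-256 digest to the paths of `name` that have that content.

    More than one key means differing copies are installed, so which one is
    loaded depends on the dynamic loader's search order.
    """
    copies: Dict[str, List[str]] = {}
    for d in dirs:
        path = os.path.join(d, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            copies.setdefault(digest, []).append(path)
    return copies
```

Calling find_library_copies("libgpu-setup.so") returns one entry per distinct file content; more than one entry means differing copies are installed. Grouping by digest rather than by path keeps systems where /lib is a symlink to /usr/lib from reporting the same file as a conflict.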

avive commented 3 years ago

Great news. Question: is it working as expected after updating the lib on that test box?

AndrewAR2 commented 3 years ago

Yes, it is working as expected after removing the old version of the library from /lib.

avive commented 3 years ago

Confirmed.