ifdefelse / ProgPOW

A Programmatic Proof-of-Work for Ethash. Forked from https://github.com/ethereum-mining/ethminer
GNU General Public License v3.0

CUDA Kernel produces 1.3~1.7% invalid shares on concurrent kernels #22

Closed AndreaLanfranchi closed 5 years ago

AndreaLanfranchi commented 5 years ago

Test Environment: Linux Ubuntu 16.04, CUDA 10, NVIDIA Driver 410.79, 6x GTX 1050 Ti

Software: https://github.com/AndreaLanfranchi/ethminer with ProgPoW 0.9.2 implemented

Running the GPUs on CUDA I get 33 to 35.5 Mh (overall) but keep getting 1.3 to 1.7% invalid shares (2-hour test batch). Running the same test on the same GPUs with OpenCL I get 30 to 32 Mh but 0% invalid shares (same 2-hour batch).

For the sake of precision: I am mining on a private node linked to the gangnam network, with a minimum difficulty of 430 Mh (nicehash index 0.1).

Am I missing something?

hackmod commented 5 years ago

I've tested your kernel and observed some invalid shares (GPU 0 gave incorrect result. Lower overclocking values if it happens frequently.)

But with my latest rebased work (based on ifdefelse's reference work, please see https://github.com/ifdefelse/ProgPOW/issues/17 or https://github.com/ethereum-mining/ethminer/pull/1766) no invalid shares were observed for some hours.

AndreaLanfranchi commented 5 years ago

Thanks hackmod ... but your implementation is ProgPoW only. Mine is mixed ethash/progpow.

I believe there is some symbol which overlaps among kernels.

ifdefelse commented 5 years ago

Our reference implementation is also ProgPoW-only. I'd think an overlapping symbol would cause a more consistent failure.

If you log the block number, header, nonce, and result of the failure can you reproduce it? The fact that OpenCL mining works indicates it's not an algorithm or GPU issue.

AndreaLanfranchi commented 5 years ago

Actually I can track down the block number, header and nonce. For the result I have to change the kernel slightly, as it's not returned in the result structure ... it'll take a little while.

The only relevant difference (I believe) in my implementation is that the DAG is generated using the "classic" ethash kernel.

AndreaLanfranchi commented 5 years ago

@ifdefelse

I report here the results of a 1-hour batch on the test rig described above. The work provider is a local geth node in sync with the gangnam test net.

 Block        : 150,194
 Header       : 0x54d4acc043e26aef4d5058f2e17cfb7c6a61a8bbeb840fcae0a2c29236d0aecc
 Boundary     : 0x00000009608c69d5693877f008f57b9ca1c81d66c21ebe31419416af9b8faab5
 Nonce        : 0x3f9ba7b85ae38c75
 From GPU     : 0x0000000347baac2b
 From CPU     : 0xbbeaf6959ab793bd19fad208bda17d176eb752f67d481347d9a48d0de6fb96a7

 Block        : 150,213
 Header       : 0xbb6a55f7f6105e15fef782278ef5b38435d1fc93db50dec632efc9e6e73388d1
 Boundary     : 0x00000009573192d60cda98190aa305015379bf2ed6018da2972cbba7ea9fede8
 Nonce        : 0x4b1e57b661ab6e5d
 From GPU     : 0x0000000259c8fff3
 From CPU     : 0x9d10dd29f2131f78a430e6413f660714a3489abdc4ec39653d5dff5b79d24f16

 Block        : 150,222
 Header       : 0x26293c8a7ca191422a63d0482b100c64321c112432f51dccbb2fa78fd6890017
 Boundary     : 0x000000095e370ee5f520ba622b2e9fbb489aeeceb8bc9d7ae0ec565be074723f
 Nonce        : 0x86c0dc4ee43258db
 From GPU     : 0x000000053a66fef7
 From CPU     : 0x4d332efb748b2dfa0e26792f0fe05c98fd4b60794ac1a2db25ce0b400e251b95

 Block        : 150,392
 Header       : 0xbc490e700adc7a5eec52497716f03a3ca4bade4decdc25151449d103b5f543d4
 Boundary     : 0x000000094a7c496170a7a58e9c454834b55fa263d4d3fc5e2090e16d30afcf9b
 Nonce        : 0x4e7d9d8d1633eb76
 From GPU     : 0x000000019c72872c
 From CPU     : 0xb5dd24b1d526939c0e19643477d6cb4d3fd0df86f3e4069fa2b6cb9f7f852017

 Block        : 150,407
 Header       : 0x061c3e604372f069c49fe2f987d3b481fe850543d87f1da8bb71eebc15e955d7
 Boundary     : 0x000000094dfb90cac5f4c53238c3c09d81065dc302114ebe15f45191e65eb4cd
 Nonce        : 0x4fc210f2bb498bc0
 From GPU     : 0x0000000607380634
 From CPU     : 0x5fcf766c9ca760d2b598295e44c684802590eb807694246adc61bd41c02a9718

 Block        : 150,426
 Header       : 0x264c85189949edf5ddc04769617f81fe14d4c4db9de31a5097c1693bd59cd796
 Boundary     : 0x000000094cd4ba1bb253a4a91c57ed279c24b794a95c2419eec963639c95efb2
 Nonce        : 0x3742fcb348196e4c
 From GPU     : 0x0000000112857af3
 From CPU     : 0x6f329b5d6f115d661cd2f8a87eb803770a7333b2e4f2be563e88584023a595cf

The result from the GPU is only the upper 64 bits. Apparently the GPU is right in returning a result which is below the target; nevertheless, CPU re-evaluation of the same nonce produces a completely different result.

What drives me mad is that those results are completely random. Note:

AndreaLanfranchi commented 5 years ago

Just for info: after a 14-hour batch without interruptions, the ratio of invalid shares is stable at around 1.6%.

 m 12:20:36 ethminer 14:12 A3205:F51 31.14 Mh { cu0 5.21 52C 70% | cu1 5.21 49C 70% | cu2 5.19 57C 70% | cu3 5.17 60C 70% | cu4 5.18 61C 70% | cu5 5.18 60C 70% }

I can't find any relevant data:

AndreaLanfranchi commented 5 years ago

@ifdefelse that's interesting.

Running CUDA (same environment as above) on only 1 stream (so removing kernel concurrency), the problem automagically disappears. Actually that is the most relevant difference between CUDA and OpenCL: the latter runs with only one queue, while the former uses 2 streams by default (increasable).

ifdefelse commented 5 years ago

You've restructured the order of some of the cuda calls, right? Sounds like there might be a race condition?

AndreaLanfranchi commented 5 years ago

Actually no. I didn't change the order of any call.

AndreaLanfranchi commented 5 years ago

The only relevant difference (I can think of) among the implementations is that streams in ethminer are created with the cudaStreamNonBlocking option, so they do not sync with stream 0 (the default stream). In your implementation they are instead created to sync with the default stream.

AndreaLanfranchi commented 5 years ago

More feedback. I managed to have the streams created without the non-blocking option, but I still get the same results.

More than one stream creates invalids. With only one stream it is rock solid.

If you could test/review my implementation, it would be highly appreciated.

ifdefelse commented 5 years ago

Will do, but it might take me a few days to get a chance. However I notice you have some general miner and ethash changes mixed along with the progpow changes. Is there a PR with just the progpow changes to review?

AndreaLanfranchi commented 5 years ago

Unfortunately not. But I will post here all the commits involving ProgPoW. Thanks for your support.

hackmod commented 5 years ago

The git bisect result indicates that invalid shares are observed after commit https://github.com/AndreaLanfranchi/ethminer/commit/a4e74a0518a

AndreaLanfranchi commented 5 years ago

@ifdefelse I believe there might be an error in your implementation's logic for concurrent streams, which may explain why I experience invalids while you don't.

Specifically, your logic appears to always have only 1 stream running, and thus no kernel concurrency.

When you first enter here https://github.com/ifdefelse/ProgPOW/blob/master/libethash-cuda/CUDAMiner.cpp#L605-L609 you have m_current_index == 0, which is immediately incremented to 1 and, as a consequence, evaluates stream_index == 1

Being in the first loop iteration, the test

if (m_current_index >= s_numStreams)

fails and you skip this https://github.com/ifdefelse/ProgPOW/blob/master/libethash-cuda/CUDAMiner.cpp#L616-L630 jumping to the kernel launch at https://github.com/ifdefelse/ProgPOW/blob/master/libethash-cuda/CUDAMiner.cpp#L631-L638

No other statement gets executed and we jump back to the beginning of the loop, where m_current_index gets incremented again, now giving m_current_index == 2 and stream_index == 0

As a consequence the test at https://github.com/ifdefelse/ProgPOW/blob/master/libethash-cuda/CUDAMiner.cpp#L616 succeeds and the loop waits for the return from

CUDA_SAFE_CALL(cudaStreamSynchronize(stream));

And here is the problem. Your streams are created without cudaStreamNonBlocking, so even if we're trying to sync a stream which has no running kernel, it nevertheless syncs with the default null stream, which in turn waits for completion of all other streams that (according to the CUDA manual) do not have the cudaStreamNonBlocking option set: for those streams, the sync with the null stream is implicit. In other words, syncing any "standard" stream is practically the same as calling cudaDeviceSynchronize.

Because of this, the 2nd kernel is not launched until the first has finished.

Unless I'm missing something huge, this loop never has two (or more) concurrent kernels, and the optional setting of more than 1 stream does not produce any effect.

AndreaLanfranchi commented 5 years ago

In my implementation, at the very beginning of the search, I immediately launch all the stream kernels (without any device or stream sync among them), so I really do have kernel concurrency. https://github.com/AndreaLanfranchi/ethminer/blob/master/libethash-cuda/CUDAMiner.cpp#L508-L524

All following stream synchronizations (the streams being cudaStreamNonBlocking) wait only for the work of the specific stream to complete, without implicit synchronization with the null stream.

AndreaLanfranchi commented 5 years ago

More feedback: I just cloned your repo and modified this line https://github.com/ifdefelse/ProgPOW/blob/824cd791634204c4cc7e31f84bb76c0c84895bd3/libethash-cuda/CUDAMiner.cpp#L406

From

CUDA_SAFE_CALL(cudaStreamCreate(&m_streams[i]));

To

CUDA_SAFE_CALL(cudaStreamCreateWithFlags(&m_streams[i], cudaStreamNonBlocking));

Built and run ... invalids appear immediately.

AndreaLanfranchi commented 5 years ago

Here is the log of a test run on your implementation with the cudaStreamNonBlocking flag set. 100% invalids.

  m  16:58:15|ethminer|  ethminer version 0.15.0.dev0
  m  16:58:15|ethminer|  Build: linux / release +git. 824cd79
 cu  16:58:15|ethminer|  Using grid size 1024 , block size 512
  m  16:58:15|ethminer|  not-connected
  ℹ  16:58:15|getwork |  Connected to 10.0.0.113:8545
  ℹ  16:58:15|getwork |  Spinning up miners...
  ℹ  16:58:15|cuda-0  |  No work.
  ℹ  16:58:15|cuda-0  |  No work.
[ ... ]
  ℹ  16:58:15|cuda-0  |  No work.
  ℹ  16:58:15|cuda-3  |  No work.
  ℹ  16:58:15|cuda-0  |  Initialising miner 0
  ℹ  16:58:15|cuda-3  |  Initialising miner 3
  ℹ  16:58:15|cuda-2  |  No work.
  ℹ  16:58:15|cuda-2  |  Initialising miner 2
  ℹ  16:58:15|cuda-5  |  No work.
  ℹ  16:58:15|cuda-1  |  No work.
  ℹ  16:58:15|cuda-1  |  Initialising miner 1
  ℹ  16:58:15|cuda-5  |  Initialising miner 5
  ℹ  16:58:15|getwork |  New pool difficulty:  740.8907 megahashes 
  ℹ  16:58:15|cuda-4  |  No work.
  ℹ  16:58:15|getwork |  Received new job #e0696191… from 10.0.0.113
  ℹ  16:58:15|cuda-4  |  Initialising miner 4
 cu  16:58:16|cuda-0  |  Using device: GeForce GTX 1050 Ti  (Compute 6.1)
 cu  16:58:16|cuda-0  |  Set Device to current
 cu  16:58:16|cuda-0  |  Resetting device
 cu  16:58:16|cuda-3  |  Using device: GeForce GTX 1050 Ti  (Compute 6.1)
 cu  16:58:16|cuda-3  |  Set Device to current
 cu  16:58:16|cuda-3  |  Resetting device
 cu  16:58:16|cuda-1  |  Using device: GeForce GTX 1050 Ti  (Compute 6.1)
 cu  16:58:16|cuda-1  |  Set Device to current
 cu  16:58:16|cuda-1  |  Resetting device
 cu  16:58:16|cuda-2  |  Using device: GeForce GTX 1050 Ti  (Compute 6.1)
 cu  16:58:16|cuda-2  |  Set Device to current
 cu  16:58:16|cuda-2  |  Resetting device
 cu  16:58:16|cuda-5  |  Using device: GeForce GTX 1050 Ti  (Compute 6.1)
 cu  16:58:16|cuda-5  |  Set Device to current
 cu  16:58:16|cuda-5  |  Resetting device
 cu  16:58:16|cuda-4  |  Using device: GeForce GTX 1050 Ti  (Compute 6.1)
 cu  16:58:16|cuda-4  |  Set Device to current
 cu  16:58:16|cuda-4  |  Resetting device
 cu  16:58:18|cuda-3  |  Allocating light with size: 17432512
 cu  16:58:18|cuda-1  |  Allocating light with size: 17432512
 cu  16:58:18|cuda-2  |  Allocating light with size: 17432512
 cu  16:58:18|cuda-0  |  Allocating light with size: 17432512
 cu  16:58:18|cuda-4  |  Allocating light with size: 17432512
 cu  16:58:18|cuda-5  |  Allocating light with size: 17432512
 cu  16:58:18|cuda-0  |  Generating mining buffers
 cu  16:58:18|cuda-3  |  Generating mining buffers
 cu  16:58:18|cuda-4  |  Generating mining buffers
 cu  16:58:18|cuda-2  |  Generating mining buffers
 cu  16:58:18|cuda-1  |  Generating mining buffers
 cu  16:58:18|cuda-5  |  Generating mining buffers
 cu  16:58:18|cuda-0  |  Generating DAG for GPU # 0  with dagBytes: 1115684224  gridSize: 1024
 cu  16:58:18|cuda-4  |  Generating DAG for GPU # 4  with dagBytes: 1115684224  gridSize: 1024
 cu  16:58:18|cuda-3  |  Generating DAG for GPU # 3  with dagBytes: 1115684224  gridSize: 1024
 cu  16:58:18|cuda-2  |  Generating DAG for GPU # 2  with dagBytes: 1115684224  gridSize: 1024
 cu  16:58:18|cuda-5  |  Generating DAG for GPU # 5  with dagBytes: 1115684224  gridSize: 1024
 cu  16:58:18|cuda-1  |  Generating DAG for GPU # 1  with dagBytes: 1115684224  gridSize: 1024
  m  16:58:20|ethminer|  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  gpu/2  0.00  gpu/3  0.00  gpu/4  0.00  gpu/5  0.00  [A0+0:R0+0:F0] Time: 00:00
  ℹ  16:58:21|getwork |  Received new job #9aa1366c… from 10.0.0.113
 cu  16:58:24|cuda-0  |  Finished DAG
 cu  16:58:24|cuda-2  |  Finished DAG
 cu  16:58:24|cuda-5  |  Finished DAG
 cu  16:58:24|cuda-1  |  Finished DAG
 cu  16:58:24|cuda-4  |  Finished DAG
 cu  16:58:24|cuda-3  |  Finished DAG
 cu  16:58:25|cuda-0  |  Compile log: 
 cu  16:58:25|cuda-0  |  JIT info: 
 ptxas info    : 0 bytes gmem, 96 bytes cmem[3]
ptxas info    : Compiling entry function '_Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb' for 'sm_61'
ptxas info    : Function properties for _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 64 registers, 16384 bytes smem, 385 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Function properties for _Z11keccak_f8008hash32_tyS_
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
 cu  16:58:25|cuda-0  |  JIT err: 

 cu  16:58:25|cuda-0  |  Mangled name: _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
 cu  16:58:25|cuda-0  |  done compiling
  m  16:58:25|ethminer|  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  gpu/2  0.00  gpu/3  0.00  gpu/4  0.00  gpu/5  0.00  [A0+0:R0+0:F0] Time: 00:00
 cu  16:58:25|cuda-2  |  Compile log: 
 cu  16:58:25|cuda-1  |  Compile log: 
 cu  16:58:25|cuda-5  |  Compile log: 
 cu  16:58:26|cuda-4  |  Compile log: 
 cu  16:58:26|cuda-3  |  Compile log: 
 cu  16:58:26|cuda-1  |  JIT info: 
 ptxas info    : 0 bytes gmem, 96 bytes cmem[3]
ptxas info    : Compiling entry function '_Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb' for 'sm_61'
ptxas info    : Function properties for _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 64 registers, 16384 bytes smem, 385 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Function properties for _Z11keccak_f8008hash32_tyS_
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
 cu  16:58:26|cuda-1  |  JIT err: 

 cu  16:58:26|cuda-5  |  JIT info: 
 ptxas info    : 0 bytes gmem, 96 bytes cmem[3]
ptxas info    : Compiling entry function '_Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb' for 'sm_61'
ptxas info    : Function properties for _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 64 registers, 16384 bytes smem, 385 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Function properties for _Z11keccak_f8008hash32_tyS_
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
 cu  16:58:26|cuda-1  |  Mangled name: _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
 cu  16:58:26|cuda-5  |  JIT err: 

 cu  16:58:26|cuda-5  |  Mangled name: _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
 cu  16:58:26|cuda-5  |  done compiling
 cu  16:58:26|cuda-4  |  JIT info: 
 ptxas info    : 0 bytes gmem, 96 bytes cmem[3]
ptxas info    : Compiling entry function '_Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb' for 'sm_61'
ptxas info    : Function properties for _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 64 registers, 16384 bytes smem, 385 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Function properties for _Z11keccak_f8008hash32_tyS_
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
 cu  16:58:26|cuda-1  |  done compiling
 cu  16:58:26|cuda-4  |  JIT err: 

 cu  16:58:26|cuda-4  |  Mangled name: _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
 cu  16:58:26|cuda-2  |  JIT info: 
 ptxas info    : 0 bytes gmem, 96 bytes cmem[3]
ptxas info    : Compiling entry function '_Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb' for 'sm_61'
ptxas info    : Function properties for _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 64 registers, 16384 bytes smem, 385 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Function properties for _Z11keccak_f8008hash32_tyS_
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
 cu  16:58:26|cuda-4  |  done compiling
 cu  16:58:26|cuda-2  |  JIT err: 

 cu  16:58:26|cuda-2  |  Mangled name: _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
 cu  16:58:26|cuda-2  |  done compiling
 cu  16:58:26|cuda-3  |  JIT info: 
 ptxas info    : 0 bytes gmem, 96 bytes cmem[3]
ptxas info    : Compiling entry function '_Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb' for 'sm_61'
ptxas info    : Function properties for _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 64 registers, 16384 bytes smem, 385 bytes cmem[0], 4 bytes cmem[2]
ptxas info    : Function properties for _Z11keccak_f8008hash32_tyS_
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
 cu  16:58:26|cuda-3  |  JIT err: 

 cu  16:58:26|cuda-3  |  Mangled name: _Z14progpow_searchy8hash32_tyPK5dag_tPV14search_resultsb
 cu  16:58:26|cuda-3  |  done compiling
  ✘  16:58:30|cuda-1  |  GPU gave incorrect result!
  m  16:58:30|ethminer|  Speed  24.38 Mh/s    gpu/0  4.72  gpu/1  3.93  gpu/2  3.93  gpu/3  3.93  gpu/4  3.93  gpu/5  3.93  [A0+0:R0+0:F1] Time: 00:00
  ℹ  16:58:33|getwork |  Received new job #8de10aec… from 10.0.0.113
  m  16:58:35|ethminer|  Speed  28.49 Mh/s    gpu/0  5.01  gpu/1  4.72  gpu/2  4.72  gpu/3  4.66  gpu/4  4.66  gpu/5  4.72  [A0+0:R0+0:F1] Time: 00:00
  m  16:58:40|ethminer|  Speed  32.09 Mh/s    gpu/0  5.35  gpu/1  5.40  gpu/2  5.35  gpu/3  5.30  gpu/4  5.35  gpu/5  5.35  [A0+0:R0+0:F1] Time: 00:00
  ℹ  16:58:42|getwork |  Received new job #7b36845f… from 10.0.0.113
  m  16:58:45|ethminer|  Speed  32.03 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F1] Time: 00:00
  m  16:58:50|ethminer|  Speed  31.98 Mh/s    gpu/0  5.34  gpu/1  5.39  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.29  [A0+0:R0+0:F1] Time: 00:00
  ✘  16:58:50|cuda-3  |  GPU gave incorrect result!
  ✘  16:58:51|cuda-3  |  GPU gave incorrect result!
  ℹ  16:58:54|getwork |  Received new job #ec20a51d… from 10.0.0.113
  m  16:58:55|ethminer|  Speed  32.08 Mh/s    gpu/0  5.39  gpu/1  5.39  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F3] Time: 00:00
  m  16:59:00|ethminer|  Speed  32.03 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F3] Time: 00:00
  ℹ  16:59:03|getwork |  Received new job #12c5beb2… from 10.0.0.113
  m  16:59:05|ethminer|  Speed  31.73 Mh/s    gpu/0  5.32  gpu/1  5.32  gpu/2  5.27  gpu/3  5.27  gpu/4  5.27  gpu/5  5.27  [A0+0:R0+0:F3] Time: 00:00
  m  16:59:10|ethminer|  Speed  32.03 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.34  gpu/4  5.29  gpu/5  5.34  [A0+0:R0+0:F3] Time: 00:00
  ℹ  16:59:12|getwork |  Received new job #bc1ad797… from 10.0.0.113
  m  16:59:15|ethminer|  Speed  32.13 Mh/s    gpu/0  5.39  gpu/1  5.39  gpu/2  5.34  gpu/3  5.34  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F3] Time: 00:01
  m  16:59:20|ethminer|  Speed  32.08 Mh/s    gpu/0  5.34  gpu/1  5.39  gpu/2  5.34  gpu/3  5.34  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F3] Time: 00:01
  ℹ  16:59:24|getwork |  Received new job #2450b2c7… from 10.0.0.113
  m  16:59:25|ethminer|  Speed  32.03 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.34  gpu/4  5.29  gpu/5  5.34  [A0+0:R0+0:F3] Time: 00:01
  m  16:59:30|ethminer|  Speed  32.03 Mh/s    gpu/0  5.34  gpu/1  5.39  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F3] Time: 00:01
  ℹ  16:59:33|getwork |  Received new job #8aada8e3… from 10.0.0.113
  m  16:59:35|ethminer|  Speed  31.76 Mh/s    gpu/0  5.29  gpu/1  5.34  gpu/2  5.29  gpu/3  5.24  gpu/4  5.29  gpu/5  5.29  [A0+0:R0+0:F3] Time: 00:01
  ✘  16:59:39|cuda-3  |  GPU gave incorrect result!
  m  16:59:40|ethminer|  Speed  31.98 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.29  [A0+0:R0+0:F4] Time: 00:01
  m  16:59:45|ethminer|  Speed  31.99 Mh/s    gpu/0  5.34  gpu/1  5.39  gpu/2  5.34  gpu/3  5.29  gpu/4  5.29  gpu/5  5.34  [A0+0:R0+0:F4] Time: 00:01
  ℹ  16:59:45|getwork |  Received new job #bd704477… from 10.0.0.113
  m  16:59:50|ethminer|  Speed  32.03 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F4] Time: 00:01
  ℹ  16:59:54|getwork |  Received new job #01dbb003… from 10.0.0.113
  m  16:59:55|ethminer|  Speed  31.67 Mh/s    gpu/0  5.35  gpu/1  5.30  gpu/2  5.24  gpu/3  5.24  gpu/4  5.30  gpu/5  5.24  [A0+0:R0+0:F4] Time: 00:01
  m  17:00:00|ethminer|  Speed  32.03 Mh/s    gpu/0  5.34  gpu/1  5.39  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F4] Time: 00:01
  ℹ  17:00:03|getwork |  Received new job #d8f4bc20… from 10.0.0.113
  m  17:00:05|ethminer|  Speed  31.63 Mh/s    gpu/0  5.31  gpu/1  5.31  gpu/2  5.26  gpu/3  5.21  gpu/4  5.26  gpu/5  5.26  [A0+0:R0+0:F4] Time: 00:01
  m  17:00:10|ethminer|  Speed  32.13 Mh/s    gpu/0  5.39  gpu/1  5.39  gpu/2  5.34  gpu/3  5.34  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F4] Time: 00:01
  ✘  17:00:15|cuda-2  |  GPU gave incorrect result!
  m  17:00:15|ethminer|  Speed  32.11 Mh/s    gpu/0  5.36  gpu/1  5.41  gpu/2  5.36  gpu/3  5.31  gpu/4  5.31  gpu/5  5.36  [A0+0:R0+0:F5] Time: 00:02
  ℹ  17:00:15|getwork |  Received new job #0da09b32… from 10.0.0.113
  m  17:00:20|ethminer|  Speed  32.08 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.34  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F5] Time: 00:02
  ℹ  17:00:24|getwork |  Received new job #1199cc91… from 10.0.0.113
  m  17:00:25|ethminer|  Speed  31.57 Mh/s    gpu/0  5.30  gpu/1  5.30  gpu/2  5.24  gpu/3  5.24  gpu/4  5.24  gpu/5  5.24  [A0+0:R0+0:F5] Time: 00:02
  m  17:00:30|ethminer|  Speed  31.98 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.29  [A0+0:R0+0:F5] Time: 00:02
  ✘  17:00:32|cuda-0  |  GPU gave incorrect result!
  ℹ  17:00:33|getwork |  Received new job #54e86a1b… from 10.0.0.113
  m  17:00:35|ethminer|  Speed  31.76 Mh/s    gpu/0  5.34  gpu/1  5.34  gpu/2  5.28  gpu/3  5.23  gpu/4  5.28  gpu/5  5.28  [A0+0:R0+0:F6] Time: 00:02
  m  17:00:40|ethminer|  Speed  32.03 Mh/s    gpu/0  5.34  gpu/1  5.39  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F6] Time: 00:02
  m  17:00:45|ethminer|  Speed  31.95 Mh/s    gpu/0  5.38  gpu/1  5.33  gpu/2  5.33  gpu/3  5.29  gpu/4  5.29  gpu/5  5.33  [A0+0:R0+0:F6] Time: 00:02
  ℹ  17:00:45|getwork |  Received new job #e648b278… from 10.0.0.113
  ✘  17:00:47|cuda-4  |  GPU gave incorrect result!
  m  17:00:50|ethminer|  Speed  32.03 Mh/s    gpu/0  5.39  gpu/1  5.39  gpu/2  5.34  gpu/3  5.29  gpu/4  5.34  gpu/5  5.29  [A0+0:R0+0:F7] Time: 00:02
  ✘  17:00:53|cuda-5  |  GPU gave incorrect result!
  ℹ  17:00:54|getwork |  Received new job #f9e290af… from 10.0.0.113
  m  17:00:55|ethminer|  Speed  31.73 Mh/s    gpu/0  5.30  gpu/1  5.30  gpu/2  5.30  gpu/3  5.24  gpu/4  5.30  gpu/5  5.30  [A0+0:R0+0:F8] Time: 00:02
  ✘  17:00:58|cuda-4  |  GPU gave incorrect result!
  m  17:01:00|ethminer|  Speed  31.98 Mh/s    gpu/0  5.39  gpu/1  5.34  gpu/2  5.29  gpu/3  5.29  gpu/4  5.34  gpu/5  5.34  [A0+0:R0+0:F9] Time: 00:02
  ✘  17:01:01|cuda-1  |  GPU gave incorrect result!
  m  17:01:05|ethminer|  Speed  32.03 Mh/s    gpu/0  5.34  gpu/1  5.39  gpu/2  5.34  gpu/3  5.34  gpu/4  5.29  gpu/5  5.34  [A0+0:R0+0:F10] Time: 00:02
  ℹ  17:01:06|getwork |  Received new job #5259b49c… from 10.0.0.113
  ℹ  17:01:07|ethminer|  Shutting down...
  ℹ  17:01:07|ethminer|  Disconnected from 10.0.0.113
  ℹ  17:01:07|ethminer|  Shutting down miners...

hackmod commented 5 years ago

@AndreaLanfranchi wrote: More feedback: I just cloned your repo and modified this line (ProgPOW/libethash-cuda/CUDAMiner.cpp, line 406 in 824cd79)

From

CUDA_SAFE_CALL(cudaStreamCreate(&m_streams[i]));

To

CUDA_SAFE_CALL(cudaStreamCreateWithFlags(&m_streams[i], cudaStreamNonBlocking));

Built and run ... invalids appear immediately.

The reference work didn't touch m_current_index https://github.com/ifdefelse/ProgPOW/blame/master/libethash-cuda/CUDAMiner.cpp#L605-L609

See also my rebased work (based on ifdefelse's work) https://github.com/hackmod/ethminer/blame/progpow-master-rebase-v0.9.2/libethash-cuda/CUDAMiner.cpp#L441-L457

In my rebased work, with CUDA_SAFE_CALL(cudaStreamCreateWithFlags(&m_streams[i], cudaStreamNonBlocking)); the problem was already solved by commit https://github.com/ethereum-mining/ethminer/commit/efc2e879db48d60c96bae270cd1767ff62c54d4c

hackmod commented 5 years ago

resolved by PR https://github.com/AndreaLanfranchi/ethminer/pull/44

ifdefelse commented 5 years ago

Excellent, thanks for the update! Closing this issue out.