stratum-mining / stratum

stratum
https://stratumprotocol.org
Other
221 stars 127 forks source link

`invalid-job-id` errors on `tProxy` don't result in rejection of shares #791

Closed plebhash closed 4 months ago

plebhash commented 8 months ago

I was trying a BitAxe Ultra (BM1366) and I noticed multiple errors like this:

ERROR translator_sv2::lib::proxy::bridge: Submit share error Ok("invalid-job-id")

It also happens with CPUminer.

It was reported here: https://github.com/stratum-mining/stratum/issues/788

as stated by @Fi3:

This happens because the miner keep sending shares the for the old job and the translator mark them as invalid

That explains the root cause of the error. Let's call these shares "delayed shares".

However, delayed shares are still accounted as "accepted" on the BitAxe side.

image

That is likely because the delayed shares only trigger an error event, but never really trigger some share rejection.

Fi3 commented 8 months ago

yep if I remember correctly the translator do not bother to send a response downstream for share error. I guess that this is mainly cause is supposed to be a proxy used by miners, so it is where metrics are collected and you use it as observable point. Said that returning errors for invalid shares would be a very nice thing to add. Also I don't know if the downstream need that message for stop sending shares and start working on the new job (don't think so but maybe is the case)

GitGab19 commented 8 months ago

Translator should be able to communicate to the miners the refused share whenever it's refused from upstream or it's not valid. What translator should do is to send downstream the following string: {"id": job_id, "result": false, "error": null} in those cases. Right now it always sends it with "result": true, I just checked.

@plebhash we can reinsert the label good-first-issue imo, what do you think?

plebhash commented 8 months ago

Cool, I'm putting the good-first-issue label back again.

We believe this is a good first issue because the fix is a relatively simple change, so it should provide some smooth intro for new contributors.

But it still requires that newcomers get familiar with SRI basics. A good preparation is to simply follow the Getting Started tutorial from https://stratumprotocol.org. It will help you deploy the different SRI roles and do some CPU mining on testnet.

A BitAxe is not a strict requirement for this issue, although it could be useful since the AxeOS UI provides a counter on accepted vs rejected shares, which was how I first identified this issue.

But the CPUminer logs should also provide some insight. For example:

[2024-03-14 08:51:38] accepted: 9/9 (100.00%), 31241 khash/s (yay!!!)
[2024-03-14 08:51:48] DEBUG: hash <= target
Hash:   000000069eee4d384c47fa4a355e704468a386c85b9cc8e6dcf5b24251723773
Target: 00000007fff80000000000000000000000000000000000000000000000000000
[2024-03-14 08:51:48] > {"method": "mining.submit", "params": ["", "33", "0000000000000000", "65f2e49a", "883c5acd"], "id":4}
[2024-03-14 08:51:48] < {"id":4,"error":null,"result":true}
[2024-03-14 08:51:48] accepted: 10/10 (100.00%), 31306 khash/s (yay!!!)

here, CPUminer believes 100% of the shares have been accepted while tProxy is actually showing invalid-job-id errors.

adoerr commented 8 months ago

I will give it a try. I'm going to start with working through the SRI tutorial and try to figure out how to hook up my BitAxe. This hopefully will allow me to reproduce the issue.

adoerr commented 7 months ago

What translator should do is to send downstream the following string: {"id": job_id, "result": false, "error": null} in those cases. Right now it always sends it with "result": true, I just checked.

I think I'm a bit stuck in terms of finding a way how to actually change result to true. The issue is that accepted is always true because handle_submit() always returns true.

handle_submit_shares() in turn only emits the invalid-job-id error message and doesn't return a useful error in this case either.

Any suggestions how to proceed without reworking error propagation itself :thinking:

Fi3 commented 7 months ago

share verification is not did at transloator::downstream level but at bridge level that is the one that have both sv2 and sv1 information (when you get a share if the share is above target you have to send it upstream). Downstream and bridge have differenet execution contexts. I guess that best thing to do, is to not send anything downs on share arrive as a transaltor::downstream, but send share responses from bridge to translator: https://github.com/stratum-mining/stratum/blob/95d89b29d596b1ba9d1596e0da9bf206e19a0187/roles/translator/src/lib/downstream_sv1/downstream.rs#L288 I think that you want to add a future to this select where you receive share responses

0xSaksham commented 7 months ago

I want to give this a shot.

adoerr commented 7 months ago

I want to give this a shot.

Please :+1: I got completely buried by other stuff :see_no_evil:

GitGab19 commented 7 months ago

@0xSaksham are you working on this?

0xSaksham commented 7 months ago

Hey, I'll need time. Please assign this to anyone else.

plebhash commented 6 months ago

@lorbax so when I suggested our informal MG workshop call, I think we could try to reproduce this issue as a guiding exercise

lorbax commented 6 months ago

@lorbax so when I suggested our informal MG workshop call, I think we could try to reproduce this issue as a guiding exercise

sure DM me on discord

plebhash commented 5 months ago

I'm adding a MG test against this bug via https://github.com/stratum-mining/stratum/pull/910

right now, the test is intentionally allowing the bug via commit bb9c643ab598c895fd7959be5c70e0ad834bc280, in order to avoid MG breaking CI on other PRs

on the future PR that fixes the bug described on this issue, we should also revert this commit

rockcoolsaint commented 4 months ago

Anyone working on this?

GitGab19 commented 4 months ago

Anyone working on this?

Yeah I'm pretty sure @plebhash already fixed it here: https://github.com/stratum-mining/stratum/pull/910

plebhash commented 4 months ago

Anyone working on this?

Yeah I'm pretty sure @plebhash already fixed it here: https://github.com/stratum-mining/stratum/pull/910

correct

@rockcoolsaint we welcome reviews on #910

rockcoolsaint commented 4 months ago

Anyone working on this?

Yeah I'm pretty sure @plebhash already fixed it here: #910

correct

@rockcoolsaint we welcome reviews on #910

Alright

plebhash commented 4 months ago

closed via #910