moopless / stratum-mining-litecoin

This fork is no longer maintained. Please use https://github.com/ahmedbodi/stratum-mining

Submit Shares with higher diff than sent to client #65

Open TheSerapher opened 10 years ago

TheSerapher commented 10 years ago

https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/service.py#L90

Reading this line, I would assume the following could happen:

I am not sure if my way of thinking is correct, but I have seen issues with clients that tend to re-target a lot: hashrates up to 50 times higher than they should be (5.6 MHash instead of 800 KHash). When checking the DB I saw a lot of high-diff shares being accepted right after a diff retarget from 16 to 480.

EDIT: Maybe this can be further tested when using a low re-target time? It might show a lot of weird diff shares being added in the DB that might be reported differently in the client?

EDIT2: More findings. Worker difficulties are stored per worker connection. They are created here: https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/service.py#L49 . Once created, that value is also used to access the mentioned worker's difficulty and store it inside the session when doing a re-target: https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/basic_share_limiter.py#L148 So unless shares are properly validated against their ORIGINAL difficulty, and not the one the session assumes, I would think this could lead to issues when retargets happen often or the re-target changes are very large (e.g. the mentioned 16 to 480).

EDIT3: Looked at how jobs work. Apparently difficulties are not stored on a per-share basis (each share does carry a stratum job-id); the job only references the block we wish to find: https://github.com/moopless/stratum-mining-litecoin/blob/master/lib/template_registry.py#L85

These are all assumptions! I have by no means any clue how stratum, templates and all that work together in the grand scheme of things. If anyone able and willing could shed some light on this, that would be much appreciated!

If all these assumptions are correct, I'd think we might need to find a way to 'cache' sent-out work together with its difficulty, and re-evaluate each share against the difficulty it was sent at once the client returns the properly calculated results. Otherwise shares could always be off.

Another idea: Slowly ramp up share difficulty for workers. Instead of assuming a diff that might match the share rate we aim for per worker, only increment by a percentage. To allow workers to hit their target share rate faster, we can lower the retarget time. Since we only increase by a certain percentage, even though we retarget more often we would still not submit shares that are completely off. And a good side effect: The higher the diff the higher you would increase the diff with the next retarget. Since higher diff means slower share rates already, the potential risk of shares being off diff would decrease.
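The percentage-ramp idea above could be sketched roughly like this (a hypothetical illustration, not code from this repo; `RAMP_FACTOR`, `MIN_DIFF` and `MAX_DIFF` are invented names):

```python
# Hypothetical sketch of the gradual-ramp retarget proposed above.
# Instead of jumping straight to the difficulty implied by the share
# rate, move at most RAMP_FACTOR per retarget interval.
RAMP_FACTOR = 1.25   # raise/lower by at most 25% per retarget
MIN_DIFF = 16
MAX_DIFF = 2 ** 16

def next_difficulty(current_diff, wanted_diff):
    """Step current_diff toward wanted_diff, capped at RAMP_FACTOR."""
    if wanted_diff > current_diff:
        stepped = min(wanted_diff, current_diff * RAMP_FACTOR)
    else:
        stepped = max(wanted_diff, current_diff / RAMP_FACTOR)
    return max(MIN_DIFF, min(MAX_DIFF, stepped))
```

With these numbers, a retarget from 16 toward 480 would only move to 20 on the first interval, so a burst of stale shares can never be wildly off the logged difficulty.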

flound1129 commented 10 years ago

The problem is that if you adjust difficulty up without allowing the old diff, you get a bunch of shares rejected by the pool for being above the target, because the miners have a local queue of work that they have to get through before they'll start working on the higher difficulty shares.

I looked at this code a while back and guessed that it could potentially be exploitable. Have a look at the generalfault version, I believe it lets you set maximum adjustments.

TheSerapher commented 10 years ago

Maximum adjustments are also just a workaround. The only good solution is the way eloipool does it: send work out, log it as a job, receive work back, and check what job information was stored for it. This way you won't reject shares and you can check them properly.
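The per-job flow described above might be sketched as follows (hypothetical; `JobCache` and its fields are invented for illustration and are not from eloipool or this repo):

```python
# Hypothetical sketch of per-job difficulty tracking: record the
# difficulty each job was sent at, and validate returned shares
# against that recorded value rather than the session's current
# difficulty.
class JobCache:
    def __init__(self):
        self._jobs = {}  # job_id -> difficulty the job was sent at

    def register(self, job_id, difficulty):
        self._jobs[job_id] = difficulty

    def difficulty_for(self, job_id):
        # None for unknown/stale jobs so the caller can reject the
        # share outright instead of guessing a difficulty.
        return self._jobs.get(job_id)

cache = JobCache()
cache.register("ab12", 16)   # work sent at diff 16
cache.register("ab13", 480)  # work sent at diff 480 after a retarget

# A share returned for job "ab12" is checked at diff 16, not 480:
assert cache.difficulty_for("ab12") == 16
```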

obigal commented 10 years ago

Here is a slightly different vardiff that steps more or less in powers of 2, if you would like to experiment. Actually it just doubles the initial value, e.g. 16, 32, 64, 128, 256, 512, 1024.

It only retargets one step at a time, up or down, no matter the difference asked for. It could probably be improved; I haven't messed with it for some time now.

925076c34852854c3f9792469c7d97b0dd025f3d
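The stepped behaviour described above might reduce to something like this (a hypothetical sketch, not the code in the linked commit; `INITIAL_DIFF` and `STEPS` are invented names):

```python
# Hypothetical sketch of the stepped vardiff described above: the
# ladder just doubles the initial pool target (16, 32, 64, ...) and a
# retarget moves exactly one rung, however large the requested change.
INITIAL_DIFF = 16
STEPS = [INITIAL_DIFF * 2 ** i for i in range(7)]  # 16 .. 1024

def retarget_one_step(current_diff, wanted_diff):
    """Move one rung toward wanted_diff, staying on the ladder."""
    idx = STEPS.index(current_diff)
    if wanted_diff > current_diff and idx < len(STEPS) - 1:
        return STEPS[idx + 1]
    if wanted_diff < current_diff and idx > 0:
        return STEPS[idx - 1]
    return current_diff
```

So a request to jump from 16 to 480 only moves the worker to 32, which bounds how far any queued work can be from the current session difficulty.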

M0nsieurChat commented 10 years ago

So it seems the moopless stratum fork is logging shares solely on the difficulty variable stored in the session (on a per-worker basis), while a client could still be sending shares at a previous (and lower) difficulty. Edit: wrong statement, please continue reading.. :)

I'm not really a developer, but alongside my internal tests on my pool, I check my users' share diffs using this formula, which returns the share diff from the hash_int:

Algebra tells us diff_to_target is the same as hash_to_diff:

share_diff = int(self.diff_to_target(hash_int))

https://github.com/moopless/stratum-mining-litecoin/blob/master/lib/template_registry.py#L245
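The "algebra" behind the formula above can be spelled out like this (a sketch: `DIFF1_TARGET` is assumed to be the usual `0x0000ffff * 2**208` pool diff-1 target, so check `lib/template_registry.py` for the exact constant this fork uses):

```python
# Sketch of the diff <-> target relationship used in the formula above.
# Assumed diff-1 target; the real constant lives in template_registry.
DIFF1_TARGET = 0x0000ffff * 2 ** 208

def diff_to_target(difficulty):
    """Highest hash value that still counts at `difficulty`."""
    return DIFF1_TARGET // difficulty

def share_diff(hash_int):
    """Actual difficulty achieved by a submitted hash: the same
    relation inverted, which is why the two functions coincide."""
    return DIFF1_TARGET // hash_int
```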

You can try enabling the display of this var at any time on a testnet pool.

So, what I'm trying to say is: keep using your vardiff engine with the session variable to manage the mining diff broadcast to your stratum clients, BUT log the share diff using the above formula instead of the session difficulty. Since you can easily know the price per share for a difficulty-1 share, it is then easy to fairly pay your miners from that computed share diff. I am not using this fork, so I don't know if I'm helping or whether logging the share difficulty with this formula instead of the session difficulty would be difficult.

Cheers.

NINJA EDIT: This already is the case. share_diff is used to log the share diff in the database; it is returned by the method in template_registry: https://github.com/moopless/stratum-mining-litecoin/blob/master/lib/template_registry.py#L281

So in service.py it is captured back: https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/service.py#L99 and rightly used to log healthy shares: Interfaces.share_manager.on_submit_share(worker_name, block_header, block_hash, difficulty, submit_time, True, ip, '', _sharediff)

BUT difficulty (which is session['difficulty']) is still used for bogus shares where template_registry threw an exception: Interfaces.share_manager.on_submit_share(worker_name, False, False, difficulty, submit_time, False, ip, e[0], 0)
https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/service.py#L103

(After thinking twice, it seems hard to use share_diff when template_registry threw an exception: we don't know how and where the template_registry check failed, which leaves the share_diff variable undefined.)
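The two logging paths described above can be caricatured like this (hypothetical names; the real calls are the `on_submit_share` lines quoted above):

```python
# Hypothetical reduction of the two paths: valid shares are logged
# with the true share_diff computed by template_registry, but when
# the submit raises, only the session difficulty is available, so
# bogus shares get logged with that instead.
def handle_submission(session_difficulty, submit):
    try:
        share_diff = submit()            # template_registry check
        return ("valid", share_diff)     # logged with the true diff
    except ValueError:
        # share_diff is undefined here; all we have is the session's
        # vardiff value.
        return ("bogus", session_difficulty)
```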

That is to say: this may not be related to the issue we are talking about, unless you are calculating the hashrate (or worse, paying a LTC reward) including the bogus shares where template_registry threw an exception. If I'm right and your CMS calculates the hashrate from all shares (including the bogus ones), maybe try to avoid including those (or don't log these shares at all if you don't need a history of bogus shares). If so, this seems more like a glitch than an exploit.

Final edit: Remember, generating a bogus share (which won't end up being paid, as our_result won't be "Y") doesn't take any effort. Still, that bogus share will be logged with session['difficulty'] and may skew hashrate calculation functions that count every share in the tables (seems to be the case here: https://github.com/TheSerapher/php-mpos/blob/next/public/include/classes/statistics.class.php#L213). flound1129 (in the next comment) points out a possible DoS (Denial of Service) at the stratum / DB layer from logging bogus shares without limit, which is what this moopless fork currently does until a commit / pull request is brought to the tree.

flound1129 commented 10 years ago

The logging of bogus shares (which can be submitted very quickly) also creates a potential for DoS at both the stratum and DB layer. I have added code to my own stratum fork to temporarily ban workers that submit more than a certain number of bogus shares in a minute.
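flound1129's banning code isn't shown in this thread, but a minimal limiter of that shape might look like the following (all names and thresholds invented for illustration):

```python
import time

# Hypothetical sketch of temporarily banning workers that flood the
# pool with bogus shares: count rejects per rolling minute and ban
# for BAN_SECONDS once the threshold is crossed.
MAX_BOGUS_PER_MINUTE = 30
BAN_SECONDS = 300

class BogusShareLimiter:
    def __init__(self, now=time.time):
        self._now = now
        self._rejects = {}   # worker -> list of recent reject timestamps
        self._banned = {}    # worker -> ban expiry time

    def record_bogus(self, worker):
        t = self._now()
        recent = [s for s in self._rejects.get(worker, []) if t - s < 60]
        recent.append(t)
        self._rejects[worker] = recent
        if len(recent) > MAX_BOGUS_PER_MINUTE:
            self._banned[worker] = t + BAN_SECONDS

    def is_banned(self, worker):
        return self._banned.get(worker, 0) > self._now()
```

The stratum layer would consult `is_banned()` before accepting (or even logging) a submission, which caps the DB write rate a hostile client can force.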

M0nsieurChat commented 10 years ago

Thanks for taking the time to read my comment, flound1129. That's exactly what I said in my final edit (regarding the feasibility of sending a large number of bogus shares in no time). I think logging bogus shares to the database should be tied to a param in config.py or to the loglevel param. (And it needs to be rate limited if the pool admin enables bogus-share logging.)

Waiting for TheSerapher to confirm whether the shares with a high diff are bogus ones (no blockhash, no solution). If not, I guess we'll need to continue our work.

flound1129 commented 10 years ago

The problem is that if you don't log bogus shares then you cannot calculate worker efficiencies, pool efficiency, etc. But yes it could be controlled by a config.py variable.

I think a better solution is to disconnect and temporarily ban workers that submit too many bogus shares.

M0nsieurChat commented 10 years ago

You are totally right, this info is needed for efficiency stats and so on. And this bogus-share logging needs to be rate limited to avoid abuse / DoSing. Putting a table index (to speed up processing) on the our_result column (or whatever your column identifying a valid share is called) and only taking valid shares (using a WHERE myValidShareColumn = 'Y') when calculating the hashrates sounds right to me as well. Why would a bogus share count toward a reported hashrate, which is your / the pool's VALID hashes generated per second?

flound1129 commented 10 years ago

Bogus shares don't (with most pools I'm aware of) and should not count toward a user's hashrate.

M0nsieurChat commented 10 years ago

Seems that the frontend used by a good bunch of pools uses every share in the table(s), including the bogus ones, to calculate this: https://github.com/TheSerapher/php-mpos/blob/next/public/include/classes/statistics.class.php#L213 I might be wrong, I'm no PHP/MySQL guru..

TheSerapher commented 10 years ago

It wasn't until vardiff was added. I will fix that :-) Thanks for the heads up.

A share rate limiter on invalid shares might make sense, especially if it is a ton in a very short time. I have seen my pool go nuts two times, which in the end caused it to crash, but until then a good amount of invalids had already been submitted.

Sadly this is not my project, nor do I have time to fork and work on it myself. Someone else needs to step up for this project to continue and be maintained :-)

flound1129 commented 10 years ago

I'd be happy to submit my temp banning code if someone can start maintaining this again.

TheSerapher commented 10 years ago

With all these findings in this issue, maybe we can say that (after I fix my pool frontend, with the above ticket closed) even with some shares being added at really strange diffs, or diffs that are off, this might be a non-issue as long as only invalid shares are affected?

Even after reading all that, it still seems very fishy that my friend was able to build a 20,000-share lead in less than a day. From my point of view, it inserted valid shares at the wrong diff, otherwise he would not have been able to catch up that fast. I have also heard that this issue was reproduced by other ops, who have seen the same strange behavior.

Hopefully this turns out to be a non issue after all. But until I am sure I'd not recommend turning VARDIFF on.

EDIT:

https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/service.py#L99-L100

The worker's own difficulty is indeed sent to submit_share in the template registry:

https://github.com/moopless/stratum-mining-litecoin/blob/master/lib/template_registry.py#L176

The only check done on this custom difficulty is rather simple:

https://github.com/moopless/stratum-mining-litecoin/blob/master/lib/template_registry.py#L233-L234

So MAYBE this is indeed a non-issue, but that would not explain the weird things I have been seeing with my friend and valid shares ... Strange.
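The "rather simple" check linked above presumably amounts to comparing the hash against the target derived from the session difficulty, something like this paraphrase (not the exact code; `DIFF1_TARGET` is an assumed constant):

```python
# Paraphrase of the simple check referenced above: the submitted hash
# only has to be below the target for the difficulty the pool
# *currently believes* the worker has. A share mined at an older,
# lower difficulty either passes trivially or is rejected; it is
# never re-checked against the difficulty it was actually sent at.
DIFF1_TARGET = 0x0000ffff * 2 ** 208  # assumed diff-1 target

def check_share(hash_int, session_difficulty):
    target = DIFF1_TARGET // session_difficulty
    if hash_int > target:
        raise ValueError("Share is above target")
    return True
```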

TheSerapher commented 10 years ago

Here is a sample after turning VARDIFF on again (columns: id, rem_host, username, our_result, upstream_result, reason, solution, difficulty, time):

| 302111 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 0000001f4cd5fffb1d957e0b7579293c6d4a260aa8d2d9815b3632ec142ca9cc |        384 | 2013-10-31 07:30:59 |
| 302110 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 0000002c7b7c10983aa34043eab1531fe790b9ac59795bcbbc736fb91b74ba2e |        384 | 2013-10-31 07:30:55 |
| 302109 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 000002c7dd05a2a54c84e4d5ff9e8fc7345371243c2267f7ccb34304c2b19342 |        384 | 2013-10-31 07:30:55 |
| 302108 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 00000180f53583de667bb3b0434ae27472a61f71f2a78dcaf7e265c80c87882f |        384 | 2013-10-31 07:30:53 |
| 302107 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 0000003404a15e182a9db1e8930dfe380c539f2fed6f95e6af576756eb9d1406 |         64 | 2013-10-31 07:30:53 |
[...]
| 302105 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 000003d169b15bc0f51f4a83d284872e52fa1cdb0d11d8eb9f46f37939efd6a2 |         64 | 2013-10-31 07:30:42 |
| 302104 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 0000012137c921e5a2fb478fa8512f2d96a839268a05a38b97930973d4a136fc |         64 | 2013-10-31 07:30:38 |
| 302103 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 00000121e2250c6bd5e24c6b194f223dcad3aab0e9904b4a47f14aeed7aaf34a |         64 | 2013-10-31 07:30:32 |
| 302102 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 0000039b3a36e1a684acc9473e0a55d35c6b1f5860ea5fafda891a998e8401c3 |         64 | 2013-10-31 07:30:28 |
| 302101 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 000002c5e29040c387c6cde73b8785398c946474740b7600273c43f75eea9128 |         64 | 2013-10-31 07:30:19 |
| 302100 | 10.0.2.2 | cptvir2l.amd        | Y          | N               |        | 0000009c7dcccd17bc4953e5ce47748c4ce32347bcdbc1f21cc463f37ca0575a |         64 | 2013-10-31 07:30:15 |

Long run of 64-diff shares (POOL_TARGET) followed by high-diff shares in very quick succession. One might say that was luck, but it happened twice with his re-targets:

| 302081 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000002a988259ca175d71aa774c88b34000410ad6d420df4dae2cc8330cc5c5b |         64 | 2013-10-31 07:28:58 |
| 302077 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000002806e1178568233bf7be4198f2f1e081b55239121272651b581f850ac95 |         64 | 2013-10-31 07:28:48 |
| 302070 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 0000005af7d649b434a1a218ca14de71cf0dd701b64bde97178049d8bc63ba82 |        480 | 2013-10-31 07:27:29 |
| 302068 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 0000005bcbdae576a7e1915135677a1909dab5a18aa589052ae8bffad4369506 |        480 | 2013-10-31 07:27:19 |
| 302065 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 00000047d0e62b9e69e0b60776aeef61290ef971b635aef46009151fadc4413e |        480 | 2013-10-31 07:27:07 |
| 302059 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000000f6fcbda6920952f51628b5119bc7f8e03b6e4443858ed656fb94985dba |         64 | 2013-10-31 07:26:47 |
| 302056 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 0000026e31f9a9d14fbc6ee5d367a7782d73cc9250285d0f25e6262c82f43348 |         64 | 2013-10-31 07:26:36 |
| 302053 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 00000199312b1bfa431efd7c253920483a42426844303ea2ee5b8e330f2a9c99 |         64 | 2013-10-31 07:26:34 |
| 302049 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000000f5d0efca72cccccff7b996efb1b551ac3ef9f3a1f9ef47a39294b9543f |         64 | 2013-10-31 07:26:33 |
| 302047 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 00000202e478aa853f7fc5e80950c2b0dc1f90c14539989a5ff89048e9e4a4d4 |         64 | 2013-10-31 07:26:30 |
| 302045 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 0000002c4e24d6c875a4b96bc983169005b324dd0cf4335494986fcd8e1d1e3c |         64 | 2013-10-31 07:26:28 |
| 302043 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000001f2fbbc8b7436f70a90ad0132c7b258a8e5f696961c5874234384c25986 |         64 | 2013-10-31 07:26:20 |
| 302042 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 0000031a500f40cc1df99d4cfb9220fbbe68efdf28ca15a8881896eab021c5a3 |         64 | 2013-10-31 07:26:16 |
| 302041 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000000bbf4a3e8732136c907d263e5af4c51d8891fffb59af5bb5fac146fbbae |         64 | 2013-10-31 07:26:16 |
| 302037 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000001b016754d261a489a62a81d2a9929cd6473c23914fe01f5d7347ec7e618 |         64 | 2013-10-31 07:26:09 |
| 302035 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000002a1c93caca44e69182e1a58238024a1eba8dbc1a2b94c548141aa859f01 |         64 | 2013-10-31 07:26:01 |
| 302034 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000003348d4b5c903e5e935c590f1e77034a2b72a9e8684c79430021df51f286 |         64 | 2013-10-31 07:25:59 |
| 302030 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000000735a15956f018c6f6ca7181f79762deb6b78e9c8e34303325d2007a726 |         64 | 2013-10-31 07:25:53 |
| 302029 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 0000025105213c5204b8c5e4afc78d77345a95be56a36eca2fad183baf2db30c |         64 | 2013-10-31 07:25:52 |
| 302028 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000000918ddd41783013f51ff8e614e3fd74295048bc3c4e06e885097b8356a5 |         64 | 2013-10-31 07:25:51 |
| 302027 | 10.0.2.2 | cptvir2l.amd | Y          | N               |                      | 000001f7402aef0bdda580a0bb2324f5dcbdb1c4ad89ec6a4ea8aeef9d68e211 |         64 | 2013-10-31 07:25:48 |

I cannot deny this seems strange to me. Maybe I have been barking up the wrong tree here, but I know that he isn't faking any shares here (since he wouldn't even know how :P)

TheSerapher commented 10 years ago

Also saw one of these. Can they happen randomly? He didn't change anything on his side that would cause shares above target:

| 302173 | 10.0.2.2 | cptvir2l.amd        | N          | N               | Share is above target | 0                                                                |        480 | 2013-10-31 07:37:21 |
TheSerapher commented 10 years ago

And here a screenshot of the past 10m hashrates. It works fine with VARDIFF disabled or with shares at 64 diff, but as soon as I enable VARDIFF his hashrate doubles. The worst I have seen was 5 MHash on valid shares!

[Screenshot: hashrate over the past 10 minutes, 2013-10-31 8:47:59 AM]

feeleep75 commented 10 years ago

I think a quite good solution would be to implement obigal's changes (I actually did, and it works very well):

https://github.com/moopless/stratum-mining-litecoin/commit/925076c34852854c3f9792469c7d97b0dd025f3d

and then we have a gradual diff raise. The next step would be to implement an initial diff for a pool worker based on the last value from the pool_worker table. This should minimize the risk of a simple exploit while still letting us benefit from vardiff.

M0nsieurChat commented 10 years ago

NEWS: https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/service.py#L109 Both difficulty (which is session['difficulty']) and share_diff (the true share diff at time of submit) are passed to the on_submit_share method.

The method import_shares (which logs shares in the database) uses the difficulty variable to log the share diff, not share_diff: https://github.com/moopless/stratum-mining-litecoin/blob/master/mining/DB_Mysql_Vardiff.py#L57 v[10] (which is share_diff) is never used in that function, even though it is a known value passed in as an argument.

So whatever template_registry returns as the true share diff (not session['difficulty']), the share is logged with session['difficulty']. For me, that explains everything.

I can see why moopless didn't use v[10] to log the share diff. The disadvantage of using v[10] (the share_diff fired up by template_registry) is that you would log shares with a different difficulty each time. For a worker with a stratum diff at 16, you'd end up logging shares with a share_diff ABOVE 16 (never less): it could be 16, 17, 18, 400... So you can't let it go as-is with a moving difficulty. If a worker solves a block, you would log one share with a huge difficulty that would eat the entire block reward (nothing left for other workers).

For me, the problem regarding your wrong share diff is right there. But I can't find any solution to log the closest stratum diff.

feeleep75 commented 10 years ago

@M0nsieurChat last comment

there is a way to partially solve the problem: we can log into the shares table the lower of share_diff and worker_diff
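That suggestion reduces to a one-liner at logging time (a sketch; here `worker_diff` stands for session['difficulty'] and `share_diff` for the value template_registry returns):

```python
# Sketch of the partial fix proposed above: never credit a share with
# more than the difficulty the vardiff engine currently assigns the
# worker, so a stale high-diff solve can't inflate the logged value.
def logged_difficulty(share_diff, worker_diff):
    return min(share_diff, worker_diff)
```

This caps the block-solve case from the previous comment too: a lucky hash worth diff 400 on a diff-16 worker is logged at 16.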

TheSerapher commented 10 years ago

I think the only real solution is logging work sent as a job and re-insert finished work with the information we stored in the job.

I will look at @obigal's solution. I have turned vardiff off since I was seeing the weirdness again.

ahmedbodi commented 10 years ago

As discussed with dcwsco, I've made my own stratum branch based on moopless for working with various coins. If anyone wishes to push a fix there, the repo link is: http://bitbucket.com/ahmedbodi/stratum-mining

TheSerapher, if you can confirm obigal's fix works well, I'll add the code in later tonight to all branches.

Neozonz commented 10 years ago

Here is the stratum-mining fix: https://github.com/Neozonz/stratum-mining-litecoin

I'll continue to maintain this code in moopless's absence; any and all help would be appreciated.

ahmedbodi commented 10 years ago

I'll help where I can, but I won't be focused on it since I'm mainly going to be running SHA pools, so I'll be working on the SHA branches in my code.

TheSerapher commented 10 years ago

s/fix/workaround/ ... but in the end you move the issue from the pool ops to the pool users. I won't go into details since we discussed this on IRC already; just posting here to ensure people know they still have to wait for a real fix :)

obigal commented 10 years ago

We could limit this even further until we have a permanent fix by just adding the pool target each retarget instead of doubling, where we get much bigger jumps between retargets as difficulty climbs.
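The additive idea above, side by side with doubling (a hypothetical sketch; `POOL_TARGET` is assumed to be the base difficulty):

```python
# Sketch of the two step policies discussed above. Doubling makes
# ever-larger absolute jumps as difficulty climbs (64 -> 128 is +64,
# 512 -> 1024 is +512), while adding POOL_TARGET each retarget keeps
# every jump the same absolute size.
POOL_TARGET = 64  # assumed base difficulty

def step_doubling(diff):
    return diff * 2

def step_additive(diff):
    return diff + POOL_TARGET
```

The trade-off is convergence speed: the additive policy needs many more retargets to reach a fast miner's proper difficulty, but the bound on how far any queued share can be from the session difficulty stays constant.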

M0nsieurChat commented 10 years ago

Maybe someone will understand this... I didn't test or try it yet. If you're willing to, note this is not the only file to update; there is service.py as well (to fetch back shareAtOldDiff and do something different if it happened).

I DON'T KNOW IF IT WORKS. THE LOGIC IS HERE THOUGH.

https://github.com/M0nsieurChat/stratum-mining-litecoin/blob/patch-1/lib/template_registry.py#L235

edit: the fork with the patch-1 branch works for my testnet pool (accepting shares && blocks); I need to test it against this issue now..

obigal commented 10 years ago

Keeping it simple, we can probably filter better than 95% of bogus shares without changing too much. This adds a delay (9b9f440542f34c17ee030b8435cc378d5a6bd536) before we start using the newly requested vardiff. It's not a perfect solution, but it works upwards and downwards with the requested difficulty, so much like mining itself it should average out over time.

obigal commented 10 years ago

A little update (14c5be169b8a25a8a75630372c68178db8750326): looks like doubling the difficulty (x2) is a little more predictable time-wise. My results suggest somewhere in the 25-30 second range works pretty well; it still depends on how many shares a user might have queued, and of course luck.

obigal commented 10 years ago

Curious if anybody has tried using the target delay and, if so, what their results were. I took a look at the way eloipool does it and, correct me if I'm wrong, it doesn't look like it's doing anything different: all users receive the same job, and when a share comes in there is nothing specific that ties it to the requested difficulty other than the job id, which is the same for every user.

I also have added some temporary worker banning code to my branch if anybody is interested.

M0nsieurChat commented 10 years ago

Thank you.

Still waiting for someone to review this patch: https://github.com/M0nsieurChat/stratum-mining-litecoin/tree/patch-1 I quickly tried it on my testnet pool. Shares are no longer being accepted at a higher diff but are logged with the previous diff.

ahmedbodi commented 10 years ago

I'll try both of your changes on my branch under branch issue-4 (it will include the changes from branch issue-3 so it works with all coins, SHA256/Scrypt, POS/POW, and coins with tx messages like B0C, OSC etc., as well as the force loading of shares when a block is found (hopefully))