nopara73 / WasabiVsSamourai


Include JoinMarket transactions #2

Open MaxHillebrand opened 4 years ago

MaxHillebrand commented 4 years ago

Can you figure out which transactions are coordinated in JoinMarket, and include them in the volume analysis?

nopara73 commented 4 years ago

Yes, however I think I would need to check the input values too, and Bitcoin transactions don't contain them. Right now this code runs within 10 minutes; if I add an RPC call for every input, it'd take days.

nopara73 commented 4 years ago

Wasabi txs are easy to identify from the outputs, based on the coordinator address. Samourai transactions are also easy, because they use very specific amounts and numbers of inputs and outputs. But CJ can be more varied, so there, in order not to create fake statistics, I'd need the inputs too.

kristapsk commented 4 years ago

I think this pattern could be tried for hunting down JM tx'es without checking input amounts:

But I'm not sure how many false positives it will give from, for example, batch payouts from exchanges.

nopara73 commented 4 years ago

"number of equally sized outputs" > 2 (could be 2 in theory too, but will give a lot of false positives likely and are rare)

I disagree. That was my most common tx type in 2016, because larger ones were failing all the time. Also, it'd get into misleading territory if we didn't include them :/

equally sized output amount above or equal to 0.001 BTC (or even 0.01 BTC, which is current default in joinmarket.cfg)

What is this?

"number of inputs" >= "number of outputs"

Is this really correct? (1, 2.1, 3.2) -> (1, 1, 1, 1.1, 2.2)

Anyhow, this is still way too broad; it'd be full of false positives. It doesn't look at txchains or tx inputs, so it should be easy to add, but IMO it'd mislead more than it'd help :/

kristapsk commented 4 years ago

equally sized output amount above or equal to 0.001 BTC (or even 0.01 BTC, which is current default in joinmarket.cfg)

What is this?

I meant that the value of the equally sized outputs should not be below that.

Is this really correct? (1, 2.1, 3.2) -> (1, 1, 1, 1.1, 2.2)

Yes, when the taker is doing a sweep (sendpayment.py with 0 amount specified, which means all, so no change for himself). That example assumes zero cjfees; irl 2.1/1.1 and 3.2/2.2 would not be exact matches.
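To make the arithmetic of the sweep example concrete, here is a minimal sketch. It assumes zero cjfees and zero miner fee, exactly as in the example, and `sweep_outputs` is a hypothetical helper name, not something from JoinMarket:

```python
from decimal import Decimal

def sweep_outputs(taker_input, maker_inputs):
    """Outputs of a fee-less sweep coinjoin: one equal output per
    participant, plus one change output per maker (none for the taker)."""
    cj_amount = taker_input  # the taker sweeps everything, so no taker change
    equal_outputs = [cj_amount] * (1 + len(maker_inputs))
    maker_change = [value - cj_amount for value in maker_inputs]
    return equal_outputs + maker_change

inputs = [Decimal("1"), Decimal("2.1"), Decimal("3.2")]
outputs = sweep_outputs(inputs[0], inputs[1:])
print(outputs)  # three equal outputs of 1, then change of 1.1 and 2.2
```

So 3 inputs legitimately produce 5 outputs here, and the total value on both sides is 6.3.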

nopara73 commented 4 years ago

@kristapsk for the record, I've implemented your heuristics and ran them over a few thousand blocks from before the JoinMarket launch, and they did not find a single false positive, even though I set the parameters to your weakest possible suggestions:

"number of equally sized outputs" > 2 (could be 2 in theory too, but will give a lot of false positives likely and are rare)

I used 2 here.

equally sized output amount above or equal to 0.001 BTC (or even 0.01 BTC, which is current default in joinmarket.cfg)

I used 0.001 here.

no more than one output with different address type than others and that must be one of equally sized ones and only following combinations then are allowed: n P2PKH, n P2PKH + 1 P2SH, n P2SH, n P2SH + 1 P2PKH, n P2SH + 1 bech32

I didn't bother with this.

also, number of equally sized outputs could be limited to 20 or 30, as more aren't practical, due to IRC server rate limits for privmsg's

I used 30 here.
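For reference, the output-only check with the parameters listed above (2, 0.001 BTC, 30) could be sketched like this. It is only an illustration, not the Dumplings code; the function and constant names are made up, amounts are assumed to be in satoshis, and the address-type and input-count rules are left out:

```python
from collections import Counter

MIN_EQUAL_OUTPUTS = 2        # weakest suggested setting
MAX_EQUAL_OUTPUTS = 30       # more is impractical due to IRC rate limits
MIN_CJ_VALUE_SATS = 100_000  # 0.001 BTC

def looks_like_jm(output_values_sats):
    """True if some output value of at least 0.001 BTC is repeated
    between 2 and 30 times among the transaction's outputs."""
    counts = Counter(output_values_sats)
    return any(
        value >= MIN_CJ_VALUE_SATS
        and MIN_EQUAL_OUTPUTS <= n <= MAX_EQUAL_OUTPUTS
        for value, n in counts.items()
    )

# The sweep example from earlier in the thread, in satoshis.
print(looks_like_jm([100_000_000] * 3 + [110_000_000, 220_000_000]))
```

Note that an exchange batch payout that happens to pay two customers the same amount passes this check too, which is the false-positive concern raised above.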

nopara73 commented 4 years ago

Aaah, I forgot to turn off the flag that says to only look for txs after the JM launch, so my results above are not relevant. In fact, I got a bunch of false positives after I turned off the flag.

2020-05-07 17:00:17 INFO        Scanner (173)   Block 390004, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:18 INFO        Scanner (173)   Block 390005, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:22 INFO        Scanner (173)   Block 390007, JM: 2, WW: 0, SW: 0
2020-05-07 17:00:28 INFO        Scanner (173)   Block 390011, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:31 INFO        Scanner (173)   Block 390013, JM: 2, WW: 0, SW: 0
2020-05-07 17:00:41 INFO        Scanner (173)   Block 390022, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:50 INFO        Scanner (173)   Block 390029, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:56 INFO        Scanner (173)   Block 390033, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:00 INFO        Scanner (173)   Block 390036, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:10 INFO        Scanner (173)   Block 390043, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:36 INFO        Scanner (173)   Block 390065, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:55 INFO        Scanner (173)   Block 390082, JM: 1, WW: 0, SW: 0
2020-05-07 17:02:05 INFO        Scanner (173)   Block 390087, JM: 1, WW: 0, SW: 0

kristapsk commented 4 years ago

I have also written some cj tx detection code in bash recently, but it also gives some false positives. The biggest problem is the same as yours: I don't see amounts and types of inputs. But it's OK for me to monitor cj activity in recent blocks manually (there's a script that allows me to do ./listpossiblecjtxids.sh $(bitcoin-cli getblockcount) for the most recent block, for example). https://github.com/kristapsk/bitcoin-scripts/blob/03f27cd1ae00813232ddc0f9d36a015c267fca33/inc.common.sh#L257

nopara73 commented 4 years ago

I don't see amounts and types of inputs

I do, I use Bitcoin Knots.

nopara73 commented 4 years ago

n P2PKH, n P2PKH + 1 P2SH, n P2SH, n P2SH + 1 P2PKH, n P2SH + 1 bech32

Forgive my ignorance, but why? Can't there be other combinations like n bech32, n witness script hash, n bech32 + 1 witness script hash and similar combinations?

nopara73 commented 4 years ago

Btw so far my progress is this:

https://github.com/nopara73/Dumplings/blob/b8fbbad0c9174a1cab2539669dc72047d3104546/Dumplings/Scanning/Scanner.cs#L173-L246

FTR everything after https://github.com/nopara73/Dumplings/blob/master/Dumplings/Scanning/Scanner.cs#L185 is this idea of yours:

n P2PKH, n P2PKH + 1 P2SH, n P2SH, n P2SH + 1 P2PKH, n P2SH + 1 bech32

As you can see, it's pretty complex, and I don't feel confident about it and especially don't think it's future proof. Anyhow, I'll review your code and incorporate things if there is anything that you didn't already write in your first comment.
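For comparison, the address-type rule quoted above can be written fairly compactly. This is a rough sketch and not the Scanner.cs logic; it assumes each output is already reduced to a `(value_sats, type)` pair with hypothetical type labels `"p2pkh"`, `"p2sh"` and `"bech32"`, and that the equal output value `cj_value` is known:

```python
from collections import Counter

# Majority type -> types allowed for the single odd output.
ALLOWED_ODD_TYPE = {"p2pkh": {"p2sh"}, "p2sh": {"p2pkh", "bech32"}}

def address_types_look_like_jm(outputs, cj_value):
    """outputs: list of (value_sats, address_type) pairs."""
    counts = Counter(addr_type for _, addr_type in outputs)
    if len(counts) == 1:
        (only_type,) = counts
        return only_type in ALLOWED_ODD_TYPE  # n P2PKH or n P2SH
    if len(counts) != 2:
        return False
    for odd_type, n in counts.items():
        if n != 1:
            continue
        (major_type,) = set(counts) - {odd_type}
        odd_is_equal_output = any(
            value == cj_value and addr_type == odd_type
            for value, addr_type in outputs
        )
        if odd_type in ALLOWED_ODD_TYPE.get(major_type, ()) and odd_is_equal_output:
            return True
    return False
```

Because the allowed combinations live in one table, extending the rule when JoinMarket adds new wallet types would only mean editing `ALLOWED_ODD_TYPE`.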

kristapsk commented 4 years ago

I don't see amounts and types of inputs

I do, I use Bitcoin Knots.

Without -txindex?

Can't there be other combinations like n bech32, n witness script hash, n bech32 + 1 witness script hash and similar combinations?

Not currently for JM, as it does not support equal value output coinjoins from native segwit wallets. So bech32 can only be the destination address of a taker. This may, and likely will, change in the future.

Anyhow, I'll review your code and incorporate things if there is anything that you didn't already write in your first comment.

First comment was about JoinMarket, my script tries to catch all coinjoins, not only JM. It's not directly related to this issue. :)

nopara73 commented 4 years ago

Without -txindex?

I don't know if it's needed. It's getblock with verbosity 3.

Btw, I just found this thing: https://content.sciendo.com/view/journals/popets/2018/4/article-p179.xml

It describes an algo for identifying JM txs in the appendix. I hope I find something interesting.
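As a rough illustration of why the verbose getblock output mentioned above helps: if each input in the decoded block carries its previous output, input amounts can be read without a separate RPC call per input. The sketch below assumes every non-coinbase `vin` entry has a `prevout` object with a `value` field in BTC, as in the verbosity 3 output of recent Bitcoin Core releases; the exact layout in Bitcoin Knots at the time may have differed, and `input_values` is a hypothetical helper:

```python
from decimal import Decimal

def input_values(tx):
    """Collect input amounts (in BTC) from one decoded transaction
    of a verbose getblock result. Field names are assumptions."""
    values = []
    for vin in tx["vin"]:
        prevout = vin.get("prevout")
        if prevout is None:
            # Coinbase inputs spend no previous output.
            continue
        # Go through str() to avoid carrying float noise into Decimal.
        values.append(Decimal(str(prevout["value"])))
    return values

# Hand-written sample shaped like a decoded transaction, not real chain data.
sample_tx = {
    "vin": [
        {"txid": "aa", "vout": 0, "prevout": {"value": 2.1}},
        {"txid": "bb", "vout": 1, "prevout": {"value": 3.2}},
    ],
    "vout": [],
}
print(input_values(sample_tx))
```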

nopara73 commented 4 years ago

Now I'm even doing subsetsum, but there are a bunch of txs where you cannot even tell with your own eyes if they're JM txs or not, like this: https://www.smartbit.com.au/tx/5282615a41ef480f87f04c2558c70f451b4675f75c482c45dc7efbc82d4a626b

(This obviously isn't as it happened in 2011.)

I'm afraid one would need to spend weeks to make sure to catch all the JM transactions (without monitoring the orderbook of course.)
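For readers unfamiliar with the subset-sum idea: one way to use it is to check, for each participant, whether some subset of the inputs adds up to that participant's equal output plus change, within a tolerance for fees. The brute-force sketch below is only an illustration of that idea under those assumptions, not the Dumplings implementation; `find_input_subset` and `tolerance` are made-up names, and the search is exponential in the number of inputs:

```python
from itertools import combinations

def find_input_subset(inputs_sats, target_sats, tolerance=0):
    """Return the indices of the first subset of inputs whose sum is
    within `tolerance` of the target, or None if there is none."""
    for size in range(1, len(inputs_sats) + 1):
        for combo in combinations(range(len(inputs_sats)), size):
            total = sum(inputs_sats[i] for i in combo)
            if abs(total - target_sats) <= tolerance:
                return combo
    return None

# Sweep example from earlier in the thread: a maker with 1 BTC out
# plus 1.1 BTC change should be explained by the 2.1 BTC input.
inputs = [100_000_000, 210_000_000, 320_000_000]
print(find_input_subset(inputs, 100_000_000 + 110_000_000))
```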

chris-belcher commented 3 years ago

A good way to detect JoinMarket coinjoins is to check that they are directly connected to other JoinMarket coinjoins. This is the method used in the 2016 paper "Join Me on a Market for Anonymity" by Malte Möser and Rainer Böhme.

This works because you'll only very very rarely see a JoinMarket coinjoin on its own not connected to any other coinjoin. The software creates coinjoins sequentially, and makers are generally incentivized to keep their bots running and repeatedly take part in many coinjoins.
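A minimal sketch of that connectivity filter, applied on top of any candidate set (for example the output of the heuristics discussed earlier). It is an interpretation of the idea, not the paper's exact algorithm; `connected_candidates` is a hypothetical name, and the input is assumed to map each candidate txid to the set of txids whose outputs it spends:

```python
def connected_candidates(candidates):
    """Keep only candidates that spend from, or are spent by,
    at least one other candidate."""
    ids = set(candidates)
    connected = set()
    for txid, parent_txids in candidates.items():
        cj_parents = parent_txids & ids
        if cj_parents:
            connected.add(txid)         # spends from another candidate
            connected |= cj_parents     # and that candidate is spent by it
    return connected

candidates = {
    "a": {"x"},        # spends a non-candidate tx
    "b": {"a", "y"},   # spends candidate "a"
    "c": {"z"},        # isolated candidate, likely a false positive
}
print(sorted(connected_candidates(candidates)))  # ['a', 'b']
```

Isolated candidates such as `"c"` are dropped, which is where lookalikes such as exchange batch payouts would mostly end up.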