Closed zcashbug closed 6 years ago
Confirmed via debug log: When there are incoming blocks between joinsplits in a multi-joinsplit tx, the later joinsplit anchors on the wrong block.
Here is a redacted portion of my debug.log. It is very difficult to redact this and keep it useful without revealing any private/linkable information. See below for detailed notes about what I changed. The significant part: look at the "spending note" lines with "height=p, confirmations=q". I set these at appropriate offsets so that they make sense: The later note spend shows height+confirmations=22, the earlier note spends both show height+confirmations=20. (The difference of the sums is the same in the real log.)
Note, I do not know why this shows only this much time passing from start to failure, and only this much attempted joinsplit. My CPU was spinning for much longer than this before I learned of the failure. I show only what is in the debug.log. I selected the portion by egrep for (opid-xxxxxxxx|UpdateTip), then I trimmed the irrelevant UpdateTip lines before/after.
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: z_sendmany initialized (params={"fromaddress":"zc0","amounts":[{"amount":4.00010000,"address":"zc0","memo":"58585800"}],"minconf":1,"fee":0.0001})
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: found unspent note (txid=0000000000, vjoinsplit=0, ciphertext=0, amount=1.00010000, memo=f600000000)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: found unspent note (txid=0000000001, vjoinsplit=0, ciphertext=0, amount=1.00, memo=f600000000)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: found unspent note (txid=0000000002, vjoinsplit=0, ciphertext=0, amount=1.00, memo=f600000000)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: found unspent note (txid=0000000003, vjoinsplit=0, ciphertext=0, amount=1.00, memo=f600000000)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: spending 4.00010000 to send 4.00000000 with fee 0.0001
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: transparent input: 0.00 (to choose from)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: private input: 4.00010000 (to choose from)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: transparent output: 0.00
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: private output: 4.00000000
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: fee: 0.0001
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: spending note (txid=0000000003, vjoinsplit=0, ciphertext=0, amount=1.00, height=10, confirmations=10)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: spending note (txid=0000000001, vjoinsplit=0, ciphertext=0, amount=1.00, height=10, confirmations=10)
2017-11-03 00:00:00 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: creating joinsplit at index 0 (vpub_old=0.00, vpub_new=0.00, in[0]=1.00, in[1]=1.00, out[0]=2.00, out[1]=0.00)
2017-11-03 00:02:58 UpdateTip: new best=0000000000000000000000000000000000000000000000000000000000000000 height=21 log2_work=256.0 tx=100 date=2017-11-03 00:02:43 progress=0.999999 cache=0.0MiB(0tx)
2017-11-03 00:05:30 UpdateTip: new best=0000000000000000000000000000000000000000000000000000000000000000 height=22 log2_work=256.0 tx=100 date=2017-11-03 00:05:15 progress=0.999999 cache=0.0MiB(0tx)
2017-11-03 00:06:05 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: spending note (txid=0000000000, vjoinsplit=0, ciphertext=0, amount=1.00010000, height=10, confirmations=12)
2017-11-03 00:06:05 opid-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: z_sendmany finished (status=failed, error=Selected input notes do not share the same anchor)
What I changed here:
Times were shifted to start at midnight, with offsets from the starting point approximately real.
txids are consistently set to the same constants. So when you see txid=0000000003, it refers to the same tx on both lines where it is shown.
height and confirmations (the most important parts here) are set so that they show exactly the same issue as in the real debug.log.
All input amounts set to 1.0, except for one 1.0001 to allow for the fee. Output amount set to whatever that adds up to. In other words, I made it plausible (but it is unimportant for this bug). All memos normalized. Again, it is not relevant here.
Nothing was changed with vpub_old, vpub_new, etc. There were no transparent inputs/outputs.
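The height+confirmations comparison described above can be checked mechanically. The following is a minimal sketch, assuming "spending note" lines shaped like the redacted excerpt; the helper names `implied_tips` and `notes_disagree` are made up for illustration and are not part of zcashd.

```python
import re

# Matches the "spending note" lines as they appear in the redacted excerpt.
SPEND_RE = re.compile(
    r"spending note \(txid=(?P<txid>\w+),.*?"
    r"height=(?P<height>\d+), confirmations=(?P<conf>\d+)\)"
)

def implied_tips(log_lines):
    """Map each spent note's txid to height + confirmations."""
    tips = {}
    for line in log_lines:
        m = SPEND_RE.search(line)
        if m:
            tips[m.group("txid")] = int(m.group("height")) + int(m.group("conf"))
    return tips

def notes_disagree(log_lines):
    """True if the spent notes do not all imply the same height + confirmations."""
    return len(set(implied_tips(log_lines).values())) > 1

# The three "spending note" lines from the excerpt, with the prefix trimmed.
sample = [
    "spending note (txid=0000000003, vjoinsplit=0, ciphertext=0, amount=1.00, height=10, confirmations=10)",
    "spending note (txid=0000000001, vjoinsplit=0, ciphertext=0, amount=1.00, height=10, confirmations=10)",
    "spending note (txid=0000000000, vjoinsplit=0, ciphertext=0, amount=1.00010000, height=10, confirmations=12)",
]
print(implied_tips(sample))    # the two early notes sum to 20, the late one to 22
print(notes_disagree(sample))  # True
```

This only restates the arithmetic in the report; it does not by itself show which anchor zcashd chose.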
Please, is there any reasonable workaround for this? "Reasonable" meaning: not sending lots of transactions to my own addresses (wasting fees) just to merge my notes. That is all I can think of. Meanwhile, money is stuck. Thank you.
I know that this isn't an answer you probably want to hear, but use a faster computer and/or one with more RAM so that the joinsplits execute faster
Thanks @radix42, but that will not solve the bug. Note merging is desirable, that's why there is #2493 and #1961 both opened by @nathan-at-least. And the CPU speed is a red herring, a faster CPU will only change it to a race condition: Even if you have Pentium Xeon Core17 i9 at 42 GHz, a new block could still come in during execution. It's plain broken, if it tries to anchor wrongly in any circumstance. Serious logic error in zcashd.
Bisect info: Looking through my records, I did exactly the same procedure in v1.0.7-1. Multiple joinsplits, execution_secs >1140 seconds, crunched balance into 2 notes. It worked perfectly. It fails on v1.0.12.
I don't know which version first broke it. I do not do this often, because it costs a lot of CPU time plus the tx fee.
So that narrows to 422 commits. Hope it helps some.
1140 seconds is 19 minutes, so it's unlikely that no block arrived in that time.
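A rough way to put a number on "unlikely", as a sketch only: assume blocks arrive as a Poisson process at Zcash's target spacing of 150 seconds (the target at the time of this report). Real block times vary, so treat the result as an order-of-magnitude estimate.

```python
import math

TARGET_SPACING_SECS = 150  # assumption: target block interval at the time
PROVING_SECS = 1140        # execution time quoted in the report above

def prob_no_block(duration_secs, spacing_secs=TARGET_SPACING_SECS):
    """P(no block arrives in the window) under a Poisson arrival model."""
    return math.exp(-duration_secs / spacing_secs)

print(PROVING_SECS / 60)            # 19.0 minutes
print(prob_no_block(PROVING_SECS))  # well under 0.1%
```

The redacted log is consistent with this spacing: its two UpdateTip lines are about two and a half minutes apart.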
Related to #1614.
It's obviously a rather ugly workaround, but you could try disabling your network connection while the transaction is being generated.
I may have misinterpreted what's happening here; it looks like different anchors are being used for input notes within a given JoinSplit. @str4d and I looked briefly at the code recently and thought that the problem was insufficient locking; that may be correct but I'm no longer sure.
I am sorry, my diagnosis was incorrect. I apologize for the time wasted looking at anchor selection timing.
The problem is much worse: I have a poison note, completely unspendable. This is a "LOSS OF FUNDS" bug. Any transaction with this note will fail, even with 1 joinsplit -- even if it is the only note selected for the transaction. It is a small change note created by v1.0.12. It fails with different messages, seemingly at random, I don't know why:
I could not know this without trying tx both with and without this note. I found it out when, in a panic, I dug most of my ZEC out of that address. With no manual note selection, this took me half the night with many transactions (successful and failed) and many tx fees. Now I have two small notes I can't touch: The poison note, and another smaller note which is never selected no matter what I do. (Even when I try to send the exact amount of that note minus the 0.0001 tx fee, zcashd selects the poison note instead, tries to generate a tx with change, and fails.) This is lost money right now.
Please tell me, should I gather details on this issue and change its title, or close it (INVALID) and open a new issue? What is easiest for you?
Possibly related questions/issues:
Thank you for your attention in debugging this matter!
The condition that causes the anchor error occurs when a note is being spent and is unlikely to be related to the note's validity.
Please let us know what happens when you start zcashd with the -reindex flag.
The error message was added in #1911 (which was fixing #1823). That's presumably where the bug was introduced or not-fixed.
We're keeping track of this same issue in other tickets.
We're closing this ticket for now. Feel free to reopen.
I am nullius. To my recollection, this is the first time that I have ever acknowledged an alternate nym.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I acknowledge that I am “zcashbug”, the reporter of Zcash Issue #2707 on Github:
https://github.com/zcash/zcash/issues/2707
-----BEGIN PGP SIGNATURE-----
iHUEARYKAB0WIQSNOMR84IlYpr/EF5vEJ5MVn575SQUCYs30HgAKCRDEJ5MVn575
Sf/XAP4p0znHQDaKg16ZHJwG/UGE9v11RBA6N11LRo6E4IzYhwEApdJeLkhvd2Sa
KkginMzTMyZDzLoYKt9BHrkwfAJd+AY=
=ayUl
-----END PGP SIGNATURE-----
I independently discovered a workaround for this issue, which restored access to my funds in January 2018. It involved reindexing, as @ioptio suggested in February 2018 (https://github.com/zcash/zcash/issues/2707#issuecomment-365034380). My ZEC were inaccessible from November 2017 to January 2018—including during the wild market movements of December 2017. This was a significant problem to me, since I had put most of my money into ZEC; and it did cause me financial losses.
The problem reoccurred thereafter, but I developed further workarounds as described below. This was solely a Sprout issue. I do not recall if the Overwinter upgrades made any difference. I forgot about this after I moved my funds through the turnstile to the Sapling pool—a process that I started right after Sapling was activated. Sapling was always steady as a rock, plus so much faster and less hungry for RAM!
Although I have no wish to spam the Zcash bugtracker, I update this for the historical record. I also wish to thank @daira for trying to help me restore my money, when I knew that I could not provide adequate debugging information without risking compromise to my privacy. I regret having disappeared here after I regained access to my funds; I should have updated this in January 2018, with information that may then have helped others.
For my part: I am an unusual Zcash user. My pain tolerance for using ZEC is almost unbounded. I had been yearning for zero-knowledge proof privacy ever since I first heard of Zerocoin for Bitcoin in 2013. This incident did not stop me from using ZEC! It did, however, urge me to keep a healthier balance with BTC; and it entirely stopped me from promoting Zcash to others, until I had some experience with Sapling’s reliability. After I had accrued significant hands-on experience with Sapling, I started to recommend Zcash to my friends; and I even gave out some nontrivial free ZEC (to close personal friends only) as an inducement to create a shielded wallet.
It was also one of several different major factors in an oddity that some people have inquired about: The nearly five-year gap in Nullian activities on the Zcash Forum, between 2017-10-20 and 2022-07-02. (In the interim, I did have some activity under other nyms; I will not identify or acknowledge them, save to note that I never got in trouble with the moderators under any nym. I just like my privacy.)
I suggest that maintainers should “lock and limit” this issue. I will be linking to this comment from elsewhere. I do not wish to incite hostile trolling in the Zcash bugtracker over a problem that has not existed since 2018. At the time, this was frustrating and, frankly, quite scary for me. Years later, it is something that I can laugh about as a bleeding-edge Zcash early adopter. Early Bitcoiners also sometimes have some harrowing war stories.
In January of 2018, I did a reindex. It took me several weeks—on-and-off letting the node spin my machine and saturate my IOPs for dozens of hours. I noted earlier that a Sprout send took me over 19 minutes for multiple JoinSplits (https://github.com/zcash/zcash/issues/2707#issuecomment-341867946); just imagine the time to reindex the blockchain! AFAICT now, I finished the reindex and regained my funds shortly before 2018-01-28/block 260946; I would need to dig through debug logs that I saved to ascertain more precise information.
Fortuitously, when I regained access to my funds, I moved and merged my shielded notes into a new shielded note at a new address. When the issue recurred, instead of reindexing, I deleted my wallet, created a new wallet, and used z_importkey with a startHeight parameter a few blocks before the merge.
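A sketch of what such an import request might look like as a JSON-RPC body follows. The `z_importkey` positional parameters (key, rescan mode, startHeight) follow the RPC as documented; the helper name, the 10-block margin, and the height 250000 are made-up values for illustration, not taken from my records.

```python
import json

def z_importkey_request(spending_key, merge_height, margin=10):
    """Build a JSON-RPC body for z_importkey, rescanning from shortly before merge_height."""
    # "A few blocks before the merge": the margin is an assumption.
    start_height = max(0, merge_height - margin)
    return {
        "jsonrpc": "1.0",
        "id": "restore",
        "method": "z_importkey",
        # Positional params: spending key, rescan mode, startHeight.
        "params": [spending_key, "yes", start_height],
    }

# Placeholder key and a hypothetical merge height.
req = z_importkey_request("<spending key>", merge_height=250000)
print(json.dumps(req["params"][1:]))  # prints the rescan mode and start height only
```

The body is built in memory and the key is deliberately not printed, so it never appears in a command line or in terminal scrollback.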
As an ongoing workaround, I soon developed some ugly shell scripts for using the z_mergetoaddress RPC (then experimental) to consolidate my funds to a new note at ever-increasing block heights, at the same address. (Moving to a new address was not necessary for this.) This effectually made a series of checkpoints from which I could restore unspent funds; moving the latest checkpoint ever-higher cut the rescan time for z_importkey with a specified starting height, an important issue on my underpowered hardware. I also developed some ugly shell scripts for switching wallets around.
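The checkpoint idea can be illustrated with a small calculation. This is a sketch, not the original shell scripts: the function names, the margin, and all heights are hypothetical, and the merge request shows only the first two `z_mergetoaddress` parameters (source addresses and destination).

```python
def merge_request(zaddr):
    """JSON-RPC body that merges notes at zaddr back into the same address."""
    return {
        "jsonrpc": "1.0",
        "id": "checkpoint",
        "method": "z_mergetoaddress",
        "params": [[zaddr], zaddr],  # from-addresses list, then destination
    }

def rescan_span(tip_height, checkpoint_heights, margin=10):
    """Return (start height, blocks to rescan) using the newest checkpoint."""
    start = max(0, max(checkpoint_heights) - margin)
    return start, tip_height - start

# Hypothetical heights: each new merge moves the restore point closer to the tip.
checkpoints = [250000, 255000, 260000]
print(rescan_span(260900, checkpoints[:1]))  # (249990, 10910)
print(rescan_span(260900, checkpoints))      # (259990, 910)
```

With these made-up numbers, restoring from the newest checkpoint rescans about a twelfth of the blocks that the oldest one would require.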
Security note: Thank you for the -stdin argument to zcash-cli! The notion of passing keymat in argv makes my skin crawl.
The ultimate result was that my current wallet was always ignorant of my transaction history. That did not matter to me. Whenever I lost access to funds, I simply threw out my wallet, and made a new one starting at a recent height—fast, easy, and I could keep using ZEC! 😺
(The foregoing is written partly from memory, and partly from a cursory review of files that I still have lying around. I do not guarantee that no errors crept in from bitflips in my robotic AI, that is, the lability of human recollection. I will not take the time to examine all my old files to reconstruct the exact events surrounding a long-fixed bug.)
Yes, I am terrifically stubborn; and as I said above, my pain tolerance for using ZEC is almost unbounded.
I vehemently refuse to carry a cattle-tracking location beacon with two-way Orwellian telescreen, a.k.a. a so-called “smartphone”. I use Tor for all of my ordinary day-to-day websurfing, even though it is slow, and even though much of the Web is broken for me, especially because I usually disable JavaScript. I started using Zcash on hardware so inadequate that to do a Sprout shielded send, I needed to shut down almost all programs and services except tor and zcashd, and let zcashd hog my whole system for a minimum of 6–8 minutes for a single-JoinSplit transaction.
An insistence on continuing to use Zcash after it almost ate money I couldn’t afford to lose is just “typical nullius”.
I am proud of that. Now that zero-knowledge proofs are starting to take over the world, I can laugh and tell people, “I told you so.”
Describe the issue
For the purpose of note merging (see #2493), I send a z-addr's whole balance to the same z-addr (minus the 0.0001 ZEC transaction fee). This is important to cut latency; multi-joinsplits take many minutes on my computer. I did this before, the last time at least 6 months ago. Now there is a regression that makes the transaction fail!
My hypothesis, according to the spec: While the multi-joinsplit is calculated over many minutes, new blocks are coming in. It seems it may be anchoring the later joinsplits to newer blocks than the earlier joinsplits in the same transaction. But I don't know how to test this hypothesis.
I think this also means I can't send my whole balance at once to any address. So my money is locked up. Urgent problem. Need workaround.
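For illustration only, the self-send described above corresponds to `z_sendmany` parameters of roughly the following shape. This is a sketch, not the exact command that was run; the helper name is made up, and `zc0` is the redacted placeholder address used in the debug log excerpt.

```python
from decimal import Decimal

FEE = Decimal("0.0001")  # the transaction fee quoted in this report

def self_send_params(zaddr, balance, fee=FEE, minconf=1):
    """z_sendmany params sending (balance - fee) from zaddr back to itself."""
    amount = Decimal(balance) - fee
    if amount <= 0:
        raise ValueError("balance does not cover the fee")
    # Order: fromaddress, amounts array, minconf, fee.
    return [zaddr, [{"address": zaddr, "amount": amount}], minconf, fee]

params = self_send_params("zc0", "1.23456789")
print(params[1][0]["amount"])  # 1.23446789
```

Decimal arithmetic is used so that the amount plus the fee adds up to the balance exactly, which binary floats do not guarantee.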
Can you reliably reproduce the issue?
If current balance is (for example) 1.23456789 ZEC, I do this:
Expected behaviour
A successful transaction.
Actual behaviour + errors
After almost 20 minutes (expected time), I get this (actual output, redacted for privacy):
The version of Zcash you were using:
1.0.12 (official zcash .deb)
Machine specs:
Any extra information that might be useful in the debugging process.
I run this with debug log; but the log is big. I don't even know what to grep for.
I know of no supported RPC to select input notes and construct a multi-joinsplit manually.
Maybe I could somehow disconnect all peers when constructing the joinsplits, to avoid getting new blocks (if that is the problem)??
It is difficult to provide useful info for you. I will not reveal anything that could compromise privacy of my wallet. I cannot post debug log, unless I can find relevant lines and carefully sanitize them. Sorry, I know, difficulties of producing privacy software. If this could help, please say what info I can add without revealing any private stuff?
I do not have a sufficient computer to build zcash if a patch is produced (I can barely do joinsplits). Any workaround I try will tie up my whole computer for almost 20 minutes.
Note selection is 100% under the control of zcashd. This should never happen. Money stuck. Even before/until fix, please help with a workaround!!
Thanks.