lbrynaut closed this 5 years ago
Addresses #65
WIP regarding travis/dependency addition.
Updated.
Updated.
I opened up a separate issue to discuss the issue of "how to do character normalization" here: https://github.com/lbryio/lbrycrd/issues/204 .
Seems like the method in this PR may not be optimal but we can discuss that separately since I think it will not affect the bulk of the changes here.
My first attempt at compiling this branch gave me this error:
../libtool: line 7485: cd: auto/lib: No such file or directory
libtool: error: cannot determine absolute directory name of 'auto/lib'
I am using system-installed boost (with the OpenSSL 1.1 PR) as I haven't been able to make the reproducible_build.sh work for me since July (on Bionic). After getting the above error, I validated that "auto" was the path for ICU, googled a bit, and read the configure.ac a bit. I decided to add this to the configure command:
--with-icu=`pkg-config icu-i18n --variable=prefix`
That did alleviate the issue. It seems that configure.ac is not correctly falling back to pkgconfig in the situation where --with-icu is omitted.
That did alleviate the issue. It seems that configure.ac is not correctly falling back to pkgconfig in the situation where --with-icu is omitted.
Hrm, odd. It should be a separate case in the configure. I was focused on getting the reproducible builds going because we don't want boost or icu shared linkage (at least for the distributed binaries), as it's error-prone and system-dependent.
@BrannonKing reported a performance issue, and it's true, this is slower than it needs to be. Correctness first, then performance is how I've always operated ;-)
Anyway, I've already got a partial fix, but will be updating when I think it's ready.
My experience with this branch thus far (with fork height at 1M):
main.cpp:2124: bool DisconnectBlock(const CBlock&, CValidationState&, const CBlockIndex*, CCoinsViewCache&, CClaimTrieCache&, bool*): Assertion 'pindex->GetBlockHash() == trieCache.getBestBlock()' failed.
My experience with this branch thus far (with fork height at 1M):
@BrannonKing I dusted off the webs, please test and find more (fork related) issues.
The reindex issue is resolved. However, I was unable to run past the fork height. I modified the fork height to occur in the next few minutes. However, it never went past the block height one before my fork height. I waited about 20min. This is on mainnet. I'm seeing a lot of this error:
EXCEPTION: 15dbwrapper_error
Database I/O error
lbrycrd in ProcessMessages()
The reindex issue is resolved
Yes, same for me. The previous PR told me I needed a reindex when starting with mainnet data.
I modified the fork height to occur in the next few minutes. However, it never went past the block height one before my fork height. I waited about 20min. This is on mainnet.
Brannon that sounds like expected behavior, because the fork happened on your codebase but not on mainnet.
If I try to run on mainnet with a fork height that is past I crash with this error:
EXCEPTION: 15dbwrapper_error
Database I/O error
lbrycrd in AppInit()
I think we should discuss what happens to bytes that are invalid UTF-8. Because UTF-8 is variable-width, not all byte sequences are valid UTF-8. It seems like the normalization function we have now will parse invalid bytes without throwing an error. I'm not sure exactly what the logic is, but if I try passing an invalid byte like 0xFF to the normalization function, it returns 0xFF back because it hits the line below in the normalization function:
return (normalized.size() >= name.size()) ? normalized : name;
The variable "normalized" was actually empty when I inspected it. I created this unit test gist for testing purposes; you can check it out: https://gist.github.com/kaykurokawa/2c35843eb09da2bc8f31ffac46de4099
So, two questions. First: should we attempt to normalize invalid bytes at all, or should we reject them from ever entering the claimtrie? Second: if we are normalizing invalid bytes, then is the current method correct (is it possible that invalid bytes could be accidentally normalized into valid bytes)?
If our standard is UTF-8, we should reject anything that is not valid UTF-8. That's my vote.
I tried to test this in a simple fashion this morning with no luck:
1. Remove `~/.lbrycrd/regtest`
2. Compile with regtest normalization fork height at 20.
3. Run the daemon: `./src/lbrycrdd -server -txindex -regtest -walletbroadcast=0`
4. Generate 30 blocks: `./lbrycrd/src/lbrycrd-cli -regtest generate 30`
5. Get this error:
error code: -1
error message:
Database I/O error
I tried to test this in a simple fashion this morning with no luck:
1. Remove `~/.lbrycrd/regtest`
2. Compile with regtest normalization fork height at 20.
3. Run the daemon: `./src/lbrycrdd -server -txindex -regtest -walletbroadcast=0`
4. Generate 30 blocks: `./lbrycrd/src/lbrycrd-cli -regtest generate 30`
5. Get this error:
error code: -1
error message: Database I/O error
@BrannonKing Thanks, give it a go again, this is corrected.
So, two questions. First: should we attempt to normalize invalid bytes at all, or should we reject them from ever entering the claimtrie? Second: if we are normalizing invalid bytes, then is the current method correct (is it possible that invalid bytes could be accidentally normalized into valid bytes)?
@kaykurokawa Good questions. There are a few issues that need addressing:
1) Normalization can fail
2) Lower-casing can fail
[Generically, we call both steps normalization in the code.]
The general approach I was going for is that if either fails, we fall back to our current behavior (which is to just treat the bytes as bytes in the claimtrie). Maybe that was a bad assumption about what we need, or maybe it's not working as intended (although it appears to be, to me).
Glad we're finally doing this kind of review though, because I did expect this process to take some time, and we're getting to the hard parts. Open to suggestions. I'm not sure rejecting all non-UTF-8 inputs is the correct solution, as it places a larger burden on the applications using lbrycrd, which really may not care about this feature.
Either way, I think we can all agree that consistency is the most important thing: the same byte or code-point sequences must be stored consistently in the claimtrie.
I'm getting close to having a solution that doesn't require keeping two tries in memory. As part of that work, I've been narrowing in on two rules:
@bvbfan , excellent analysis. With our work on #44 it appears that we achieved independence from methods taking a name in rpc/claimtrie.cpp. I think that is critical to your suggestion that we don't do string normalization in the CClaimTrie (instead, we do it all in the "cache"). I really like the suggestion! I think that puts a high priority on getting #44 merged and rebasing our normalization on top of that.
I did finally get a green on the Linux compilation again. Some notes on my issues with the reproducible_build.sh:
The advantage of Docker is that you don't have to recompile the dependencies every time. We can stop wasting time on that. I feel strongly like we need to move that direction.
This is a complete re-write of #102 and replaces it since it's much more complete and handles situations that the other does not handle.
This PR contains a lot of changes and I expect a detailed review will take some time to ensure correctness. I also expect the reproducible_build script will need further modification (likely won't work as-is again due to the added ICU dependency).
@kaykurokawa The "claimtriebranching_hardfork_disktest" unit test does not seem to work with this PR, if you have a second to see what's going on there, it would help. It came up after things were working and then I rebased to latest, so it's commented out for now.
EDIT: @kaykurokawa This last comment is no longer true, so can be ignored now.