Add encoding for binary data returned from RPC, make upstream handle it and UTF8

kaykurokawa commented 5 years ago

Three RPC/CLI issues related to the Normalization hard fork PR #159

1) I've had some problems being able to input utf8 strings into the CLI interface .. I'm not sure what the problem is but it does not seem to be parsing the string properly, I made sure my command line supported utf8. I don't think there is any changes we need to make to lbrycrd but I could be wrong..

For RPC, input works fine (as long as your JSON RPC supports it, the popular one does https://github.com/jgarzik/python-bitcoinrpc ). However, I realized that the RPC output actually encodes unicode in this weird way: https://github.com/lbryio/lbrycrd/blob/master/src/univalue/lib/univalue_write.cpp#L15 , so this is actually a bit annoying here as we need to decode this properly, or perhaps its ok to remove this encoding? I presume its there because some command lines may support only ASCII...

2) I think an RPC call that lets the user know what normalization does would be good:

checknormalization(name) -> returns normalized name

This function was added here: https://github.com/lbryio/lbrycrd/pull/235, but will probably need some additional return arguments based on what we decide for https://github.com/lbryio/lbrycrd/issues/234 (i.e, are we returning a normalized string with the casing preserved?)

3) Further I think existing RPC calls need to be modified depending on what we decide here https://github.com/lbryio/lbrycrd/issues/234 , i.e. are we normalizing the input "name" for calls like getvalueforname() and getclaimsforname() or are we expecting that the input is already normalized?

bvbfan commented 5 years ago

Further I think existing RPC calls need to be modified depending on what we decide here #234 , i.e. are we normalizing the input "name" for calls like getvalueforname() and getclaimsforname() or are we expecting that the input is already normalized?

We have relation one to many, normalization name to claims and supports so we expect non-normalized string, then we find node by normalized it, results should be filter by non-normalized string again to show exactly desired claims. So we should make changes to these RPC methods as well, good point. But, we don't need such a method like checknormalization, let's make an example: let's we have 3 claims Dog, DOg, DoG checknormalization DOg => dog getclaimsforname dog => will return nothing, why? because we have 3 claims but none of them is exactly lowercase. The user desire is DOg so calling checknormalization will lie him, instead: getclaimsforname DOg => will return only 1 claim, exactly named DOg

Normalization should be transparent and makes sense in trie not outside so if you have before normalization one claim DOg you will have same result afterwards.

BrannonKing commented 5 years ago

Responses to @kaykurokawa :

I have no problem removing the output encoding. No one is running ascii-only shells. Let's do it.
I think we've fully defined that our "normalization" includes a case change. We have two strings: the normalized form and the original. Let's not add a third state of "normalized and lower cased". I think #235 is sufficient.
Yes, we normalize the input for any query. The user will need to see all claims that compete against each other. And the winning claim is against all names that normalize to the same thing.

Responses to @bvbfan :

In your example, "getclaimsforname dog" should return all three. That's its existing behavior.
"getclaimsforname" dog has to return the winning claim at the node. That's its existing behavior. The node lookup now requires that we normalize the input first.
Yes, I want to keep the same behavior for names not in competition. However, normalizing the trie will cause a few additional competitions that were not pre-existing. That will be okay.

bvbfan commented 5 years ago

@BrannonKing, you are right for current normalization implementation, but i got the impression we will change it. Note test before normalization

BOOST_AUTO_TEST_CASE(claim_rpcs_claims_test)
{
    ClaimTrieChainFixture fixture;
    std::string sName1("DOg");
    std::string sName2("DoG");
    std::string sValue("test");

    rpcfn_type getclaimsforname = tableRPC["getclaimsforname"]->actor;

    UniValue claims;

    CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), sName1, sValue, 1);
    fixture.IncrementBlocks(1);

    CMutableTransaction tx2 = fixture.MakeClaim(fixture.GetCoinbase(), sName2, sValue, 2);
    fixture.IncrementBlocks(1);

    UniValue params1(UniValue::VARR);
    params1.push_back(UniValue(sName1)); // DOg

    claims = getclaimsforname(params1, false)["claims"];
    BOOST_CHECK(claims.size() == 1U);
    BOOST_CHECK(claims[0]["nEffectiveAmount"].get_int() == 1);

    UniValue params2(UniValue::VARR);
    params2.push_back(UniValue(sName2)); // DoG

    claims = getclaimsforname(params2, false)["claims"];
    BOOST_CHECK(claims.size() == 1U);
    BOOST_CHECK(claims[0]["nEffectiveAmount"].get_int() == 2);
}

It returns only one claim - the exactly case sensitive name, after normalization claims size will be 2. Since we can keep non-normalization name we can keep same behavior.

BrannonKing commented 5 years ago

Let's change this issue to just refer to the RPC output encoding. From our discussion yesterday, it sounded like people were in favor of holding off on changing the output encoding until the upstream merge is done. It doesn't have to be done as part of the normalization branch (and there are already many changes there). Part of that requires us to change how the value field is encoded. It will need to come out as hex or base64. Polls on that are ongoing.

kaykurokawa commented 5 years ago

I tried changing https://github.com/lbryio/lbrycrd/blob/master/src/univalue/lib/univalue_write.cpp#L15 to not encode unicode in such format, but this change was not sufficient to print out utf8 characters through the CLI. The below error was returned,

error: couldn't parse reply from server

BrannonKing commented 5 years ago

@kaykurokawa , I did the same change originally and saw the same error. I realized that the univalue changes went all across the codebase. Hence, the only approach I had was to upgrade the whole package. That's what I did in the branch named "normalization_plus_base64". You're welcome to rebase that branch if needed.

BrannonKing commented 5 years ago

Done in master. We now return hexadecimal for the value fields.

lbryio / lbrycrd

Add encoding for binary data returned from RPC, make upstream handle it and UTF8 #236