mastercoin-MSC / mastercore

mastercore info
mastercoin.org
MIT License
24 stars 11 forks source link

Fix reported daemon freezes #268

Closed zathras-crypto closed 9 years ago

zathras-crypto commented 9 years ago

Alrighty guys, first off the important things; Happy New Year! :)

Now down to business haha - I've debugged and got to the bottom of the reports from several integrators (me included) that 0.0.9 regularly freezes.

First off, it does not freeze in the context of a full hang - let's just get that out the way. What is happening is that recent modifications to RPC have removed optimizations, causing certain looping RPC calls to take orders of magnitude longer to complete. For example:

Testing a lookup of all SP#3 balances on 0.0.8.2:
#time ./mastercored8 getallbalancesforid_MP 3 >/dev/null

real    0m0.147s
user    0m0.023s
sys     0m0.004s

Testing a lookup of all SP#3 balances on 0.0.9:
#time ./mastercored9 getallbalancesforid_MP 3 >/dev/null

real    6m48.863s
user    0m0.025s
sys     0m0.022s

As you can see 0.0.9 is several orders of magnitude (ie thousands of times) slower. MP_* folders cleared between tests and same command executed multiple times with best result taken to ensure any caches are used etc. So pretty solid confirmation of what the problem is.

Basically what has happened is that originally certain calls that executed in loops were optimized to do things like lookup the divisibility of a property only once, instead of each iteration of the loop. The change to BalanceToJSON that was merged in recently changes the model to request divisibility of a property every time we format it - so in the case of the example above (looking up all balances for MaidSafe) we lookup divisibility x thousand times (ie for each address) instead of just once.

The tl:dr; of it is this increases the processing load dramatically as we go and fetch the property repeatedly and makes certain RPC calls take minutes to complete instead of milliseconds - since the CPU is pegged at 100% during this time the RPC interface is effectively unresponsive and gives the impression of a frozen daemon for some minutes.

So long story short - we don't have a critical freezing bug as reported, but we have close too it; whilst we don't actually hang, we do get busy enough to become unresponsive.

This PR serves to add a required divisibility flag to BalanceToJSON, giving control back to the calling function to specify divisibility. 0.0.9 retested as per above repeated with this PR applied:

Testing a lookup of all SP#3 balances on 0.0.9:
#time ./mastercored9 getallbalancesforid_MP 3 >/dev/null

real    0m0.269s
user    0m0.015s
sys     0m0.003s

I need to give it a bit more checking/testing but on initial looks like it's sorted.

Thanks Z

dexX7 commented 9 years ago

Happy new year to you, too! :)

This is very interesting and I'll need to do some testing on this. Can you run each test two times right after the other and post your results?


Take a look at the function:

std::string FormatDivisibleMP(int64_t n, bool fSign = false)

First parameter is the number, not the propertyId, so

... FormatDivisibleMP(property, nAvailable)));

should be

... FormatDivisibleMP(nAvailable)));

As for the fix in general, but let's think about two other routes.

1) Pass the CMPSPInfo::Entry struct:

void BalanceToJSON(const std::string& address, uint32_t propertyId, const CMPSPInfo::Entry& property, Object& balanceObjRet)
{
    // confirmed balance minus unconfirmed, spent amounts
    int64_t nAvailable = getUserAvailableMPbalance(address, propertyId);

    int64_t nReserved = 0;
    nReserved += getMPbalance(address, propertyId, ACCEPT_RESERVE);
    nReserved += getMPbalance(address, propertyId, METADEX_RESERVE);
    nReserved += getMPbalance(address, propertyId, SELLOFFER_RESERVE);

    if (property.isDivisible()) {
        balanceObjRet.push_back(Pair("balance", FormatDivisibleMP(nAvailable)));
        balanceObjRet.push_back(Pair("reserved", FormatDivisibleMP(nReserved)));
    } else {
        balanceObjRet.push_back(Pair("balance", FormatIndivisibleMP(nAvailable)));
        balanceObjRet.push_back(Pair("reserved", FormatIndivisibleMP(nReserved)));
    }
}

2) Or fetch the Entry right within that function and return feedback:

bool BalanceToJSON(const std::string& address, uint32_t propertyId, Object& balanceObjRet)
{
    CMPSPInfo::Entry property;
    if (!_my_sps->getSP(propertyId, property)) {
        return false;
    }

    // confirmed balance minus unconfirmed, spent amounts
    int64_t nAvailable = getUserAvailableMPbalance(address, propertyId);

    int64_t nReserved = 0;
    nReserved += getMPbalance(address, propertyId, ACCEPT_RESERVE);
    nReserved += getMPbalance(address, propertyId, METADEX_RESERVE);
    nReserved += getMPbalance(address, propertyId, SELLOFFER_RESERVE);

    if (property.isDivisible()) {
        balanceObjRet.push_back(Pair("balance", FormatDivisibleMP(nAvailable)));
        balanceObjRet.push_back(Pair("reserved", FormatDivisibleMP(nReserved)));
    } else {
        balanceObjRet.push_back(Pair("balance", FormatIndivisibleMP(nAvailable)));
        balanceObjRet.push_back(Pair("reserved", FormatIndivisibleMP(nReserved)));
    }

    return true;
}
zathras-crypto commented 9 years ago

Can you run each test two times right after the other and post your results?

I actually ran the test a number of times mate, results always similar - takes about half the time on subsequent runs due to caching - always sub 1s now with the fix though.

First parameter is the number, not the propertyId, so

Surprised that compiled OK since nAvailable is not a bool - hmm - anyway good spot & fixed - thanks mate :)

1) Pass the CMPSPInfo::Entry struct:

If we do that, we no longer need to supply propertyId

2) Or fetch the Entry right within that function and return feedback

This is where the problem is. Something like getallbalances would call BalanceToJSON n times (n being the number of addresses that hold the token, on order of thousands). If we fetch the property within the BalanceToJSON function we fetch it n times instead of just once (where the apparent 'freezing' issue comes into play) hence why I require us to now pass in a fixed divisibility bool to force the calling function to specify divisibility and in turn thus control how often it's checked.

Thanks Z

dexX7 commented 9 years ago

Surprised that compiled OK since nAvailable is not a bool ...

IIRC any positive number would implicitly be converted to true. Likewise you can do conditional checks of non-bool values, e.g. if (pwallet).

If we do that, we no longer need to supply propertyId

Unfortunally not, because the property id is not part of the struct.

This is where the problem is.

My bad, you're right.

I actually ran the test a number of times mate, results always similar.

Not that I doubt you, but I'm surprised, given how fast LevelDB is in general.

zathras-crypto commented 9 years ago

Not that I doubt you, but I'm surprised, given how fast LevelDB is in general.

Surprised me too, especially when we consider that we're using fill_cache as a read option for levelDB - should mean executing the same query repeatedly should be almost instant as we should be hitting cache. isPropertyDivisible (the function responsible for the extra load) also performs a JSON string conversion to object so I wonder if perhaps that's the bottleneck rather than levelDB itself.

I'll tinker more when i have more time, but at least fixed and merged for now :)

Thanks Z

dexX7 commented 9 years ago

Yeah, would need some profiling or testing to pin down the exact cause.

I was by the way able to confirm the very long delay (~5 min) and the speedup with this patch (less than a second).