kindelia / Kindelia

An efficient, secure cryptocomputer
https://kindelia.org/
603 stars 39 forks source link

refactor(err): remove unwrap from util.rs #260

Closed dan-da closed 1 year ago

dan-da commented 1 year ago

addresses #247. (more to come)

This PR focuses on kindelia_core/src/util.rs.

util.rs

runtime/mod.rs

note: regarding the change to get_time() and get_time_micro(). This change means that if a node's SystemTime is before Jan 1, 1970 00:00:00, then it will be interpreted as 1970-01-01 by a caller using the returned u128. I don't believe this should be a serious (security) issue because the p2p protocol should/must be verifying that blocks and messages between nodes are recent/current (or operating without time at all, eg using causal ordering, lamport timestamps, etc).

It would certainly be preferable to return a Result for these fn, but will require a lot of changes to node.rs and related code. Another option is to keep the panic() behavior in get_time() until the necessary node refactor can occur.

dan-da commented 1 year ago

Another approach would be to use the chrono crate, eg:

chrono::offset::Utc::now().timestamp_micros()

There is no need to unwrap here. However chrono timestamps use i64. This makes sense because time 0 is at the unix epoch and anything before is a negative number.

So now we get to the real heart of the matter: kindelia is representing timestamps as u128. So times before 1970 cannot be simply represented -- unless we do some tricky bitshifts or something.

Options would appear to be:

My preference would be to move to a signed integer as it seems cleanest for representing time losslessly. I believe this would be a change to the block structure and p2p protocol, so it's not something I will do without consensus.

note: i128 and u128 seem like overkill, given this comment for timestamp_micros() which returns an i64.

Note that this does reduce the number of years that can be represented from ~584 Billion to ~584 Thousand. (If this is a problem, please file an issue to let me know what domain needs microsecond precision over millennia, I’m curious.)

So we could save block/network space with i64.

racs4 commented 1 year ago

It would certainly be preferable to return a Result for these fn (get_time and get_time_micro)

In this specific case I think differently than you. I think a panic would be best for the get_time functions. These functions are used to return the current time, a date before 1970 should never be considered, and returning a Result for that would just take the trouble of always having to check elsewhere, to return this same error.

The Rust book says a panic should be returned when:

The bad state is something that is unexpected, as opposed to something that will likely happen occasionally.

Your code after this point needs to rely on not being in this bad state, rather than checking for the problem at every step.

I think this case intersects these two topics.

racs4 commented 1 year ago

i128 and u128 seem like overkill, given this comment for timestamp_micros() which returns an i64.

Yes, this is an internal discussion. I don't think signed numbers should be used, for the reason explained in the message above, that we don't expect to have dates before 1970 (and not even dates before the first day of the net launch), but I agree that 128 seems like overkill.

I think you could open a discussion in an issue on this topic, with these two time problems, regarding typing (size and sign) and what to do with unexpected dates (which I suggested to keep the behavior of returning panic in this PR).

dan-da commented 1 year ago

Our difference in philosophy perhaps come from my experience working on a team with a zero unwrap policy. It is stricter yes, but when everyone adheres to it for all situations, imho the overall code quality becomes higher, because all errors either get handled appropriately or propagated up to a place where they can be logged in a standard way and/or cause the program to exit in a controlled way.

Calling unwrap() as the current code does causes the program to exit suddenly with no chance to log anything and has potential to leave data in an inconsistent state. Perhaps a user temporarily sets their system clock to 1949 for a minute and a week later is surprised to find their k-chan instance hasn't been running and with no clear indicator why. (btw, this also brings up the matter of proper logging/tracing, which I hope we can adopt instead of eprintln).

Here are some articles that discuss reasons not to panic. (one two three)

edit: it can be argued that a strict "no unwrap" policy correctly forces us here to consider that our represention of time as u128 is "wrong" as it cannot represent all possible states on a user's machne. So a correct fix then is to change the data representation.

That said, these are debatable philosophical differences and I am certainly willing to defer to the team's wishes here.

If I understand correctly, you prefer a panic to the change in this PR of rounding times before epoch to 0? (at least until such time as i64/u64/u128 is resolved)

Perhaps a simple way forward would be to keep the two existing get_time() functions but mark them deprecated, and add try_get_time() that returns Result. New code can use those and existing callers can be retrofitted if/when possible.

racs4 commented 1 year ago

I understand what you're talking about. This is my first year working with Rust, so I don't have that experience from past work with this subject like you do. As you say this would be "debatable philosophical differences", which would be better discussed in an issue about it.

Perhaps a simple way forward would be to keep the two existing get_time() functions but mark them deprecated, and add try_get_time() that returns Result. New code can use those and existing callers can be retrofitted if/when possible.

I believe this is the best thing for now so that I can approve and merge this PR.

dan-da commented 1 year ago

No worries. Within the wider rust community, I believe it is unusual to find crates that never unwrap/panic, so I definitely hold a minority view. That said, I consider kindelia to be similar to writing aircraft control software... it needs to be a step above the rest in terms of reliability because failures can have very serious impacts once people have their finances/businesses riding on it. So that's where I'm coming from.

anyway, I will make the suggested change.