lib: adjust small time values in _badrandom_from_time

cferris1000 commented 1 month ago

On some systems, the clock is reset, or is lost, so the value returned by time can be a very small value. In that case, the _badrandom_from_time function returns a large value close to the maximum uint32_t value. This can be a problem when used to create a sequence number and that number overflows the uint32_t maximum value when it is incremented.

In this case, detect when the time value is too small, and add a value to make sure the value returned is not too close to the uint32_t maximum value.

thom311 commented 1 month ago

This can be a problem when used to create a sequence number and that number overflows the uint32_t maximum value when it is incremented.

What is exactyly the problem with the wrap around of the sequence number?

_badrandom_from_time() can give any number. If some numbers are unsuitable, then the caller needs to handle that (e.g. by handling (avoiding?) wrap around).

cferris1000 commented 1 month ago

This happens because the s_seq_next value will be some really large value, and then in nl_socket_use_seq() the sk->s_seq_next++ will overflow on these devices that don't have a clock set to the current time and just start time at 0. In our code base, we compile the code such that it will abort on overflow, so these devices abort quickly.

I did have another version that avoids the overflow by modifying nl_socket_use_seq to check if the sequence number is at the maximum uint32_t value then set it to zero rather than overflow. The downside to the sequence overflow check is that you also need to modify lib/nl.c, the nl_complete_msg() function to call nl_socket_use_seq (or duplicate the logic there). I am probably overthinking this, but I was worried that it would be adding extra instructions or extra function calls to hot paths. I was also not sure if the code can handle the sequence number getting set back to zero.

Should I abandon this patch and do the overflow check patch I mentioned above? Is there another approach that would be better?

Thanks.

thom311 commented 1 month ago

In our code base, we compile the code such that it will abort on overflow, so these devices abort quickly.

That seems a questionable thing to do. It is understood that wrap around of unsigned integers is well defined. libnl relies on that. You can build your own code base with whatever flags work for you, but you are making assumptions (that wrap around of unsigned int would be always a bug), that libnl does not share.

Btw, your patch doesn't avoid the wrap around either. It just makes it only happen after a large number of messages, it still can happen. That's at least ugly, even if you don't ever expect to send 3 billion messages before the wrap around happens.

I was also not sure if the code can handle the sequence number getting set back to zero.

I think zero should be fine. Regardless, if zero were to be avoided, then the logic for that should be at s_seq_next and not in _badrandom_from_time().

I did have another version that avoids the overflow by modifying nl_socket_use_seq to check if the sequence number is at the maximum uint32_t value then set it to zero rather than overflow.

That seems better. I would take such a patch. Or maybe re-evaluate, whether rejecting wrap around of unsigned integers should really be treated as a bug.

The downside to the sequence overflow check is that you also need to modify lib/nl.c, the nl_complete_msg() function to call nl_socket_use_seq (or duplicate the logic there).

As you say, that is no problem. Don't let lib/nl.c directly access s_seq_next but let it call nl_socket_use_seq().

cferris1000 commented 1 month ago

Create a pull request of the sequence overflow protection here:

https://github.com/thom311/libnl/pull/395

thom311 commented 1 month ago

closing for 395

thom311 / libnl

lib: adjust small time values in _badrandom_from_time #392