Faster iter - Githubissues

TyPR124 commented 5 years ago

This includes the changes in #12

As of now this is ~~not complete~~ pending comments/review but otherwise complete, however I wanted to make sure the changes so far make sense.

I added a count_u64 (for Ipv4AddrRange) and count_u128 (for Ipv6AddrRange) method which I'm using internally. count_u64 is intended to never overflow or panic since IPv4 is 32-bits wide. On the other hand, count_u128 may overflow or panic since u128::MAX - u128::MIN + 1 overflows u128. For this reason, I also added a method on Ipv6AddrRange named can_count_u128 to indicate if count_u128 would overflow.

In addition to what I have now, FusedIterator can be implemented for all three range types, as well as DoubleEndedIterator.

I thought about implementing ExactSizeIterator however this doesn't seem to be viable since the size may overflow usize (even for Ipv4AddrRange, size may overflow on 32-bit systems). If there is some kind of precedent in the language that indicates it's actually okay to implement ExactSizeIterator, then I can do so.

TyPR124 commented 5 years ago

Now includes DoubleEndedIterator and FusedIterator impls, and tests for DoubleEnded (not sure if Fused really can be tested - only requirement is that once None is returned once, the iterator will never again return Some)

I ended up making count_u64 and count_u128 both private. To me this is fine since count is the unified API across all three (and what would IpAddrRange use?), and these two are really just supporting methods.

I think this is ready to go at this point, pending any comments or feedback.

TyPR124 commented 5 years ago

Just discovered that Step has replace_one and sub_one, so I thought it made sense to implement those for IpStep and use those where appropriate.

krisprice commented 5 years ago

Hi Tyler, sorry I haven't had a chance to look at this yet. I'll try to get to it soon. I just merged in your two fixes and published that release.

krisprice commented 5 years ago

Hi Tyler, taking a look at this now. These all look great, thanks. Give me your thoughts:

IIRC count() consumes an iterator, does/should your change do the same?
Regarding the overflow of count(): how does Rust handle this, e.g. for u128? ISTR it's unspecified if greater than usize or something to that effect. Should we just copy that behavior and document it the same way? That way we'll be consistent if ever we can directly do ranges of IpAddr types. Also I struggle to think what people might usefully do with IPv6 ranges larger than usize (but I'm pretty unimaginative of late :-)).
FusedIterator: great that makes sense, does this also make sense for IpSubnets? If so would you mind? :-)
DoubleEndedIterator, replace_one() and sub_one(): all great, thanks.

TyPR124 commented 5 years ago

count, as well as min, max, and last all take ownership of the iterator, then consume and drop each element as well as the iterator itself. Since we take ownership (and don't return it), it's not necessary to update the iterator state in these methods. Tangentially, IpAddrRange and related types are all Copy, so the user retains their original copy regardless of what we do with our copy. We only need to update the state of the iterator if we have &mut self, which happens in next, nth, and their _back counterparts.
count will wrap (release) or panic (debug) if it is incremented beyond usize::MAX. Since the default implementation of count will just count up by one, overflowing will always cause one of these two outcomes. usize, AFAIK, can be up to 64 bits, but even if usize can be 128 bits, I believe my implementation will work as expected. If usize can be more than 128 bits, then my implementation is not sufficient (specifically in the most extreme case of 0 through u128::MAX), but I don't believe there is any current system with even just 128 bit usize.

So basically, if count is greater than usize::MAX, then we either wrap or panic, which is what my implementation does. That said, there is a bit of an inconsistency in that we rely on debug_assert to panic in debug, however debug assertions can be disabled in debug, or enabled in release. I hadn't fully considered that, so I might look at that a bit more. I also hadn't considered if usize is less than 32 bits, which I think causes inconsistency with my v4 count.

One thing I had considered was panicing unconditionally (debug and release) when count overflows. To be honest that's easier to implement, but I'm just not sure if it is the right thing to do. Personally, I'd prefer it always panic (since I'm either doing something wrong by having such a big range, or if I intend to have that large a range, the incorrect count would probably cause problems later on anyway), but that would be inconsistent with the default behavior. Any thoughts?

I struggle to think what people might usefully do with IPv6 ranges larger than usize

One might want a range representing a /48 assigned to them. Obviously trying to collect() such a range wouldn't work, but having a representation of the range can still be useful. Tangentially, having min and max now allows me to get the start and end of the range (without iterating over it), so I can treat it as a range object more easily (in a similar sense to how some firewalls have the concept of an IP range in case you need something more granular than a subnet to filter with).

With the above example in mind (a /48), one might also want to count the number of IPs in that range, which wouldn't work now due to the usize limitation. How would you feel about having a public count_u128 (or similarly named) method, since it would work for both v4 and v6 (except the one edge case which would still panic/wrap) and therefore would be consistent across the API?

I'll look at IpSubnets. I'm not familiar with this crate and have only looked at what I'm specifically using, so if there's anything else feel free to point it out.

krisprice commented 5 years ago

Hi Tyler, thanks for explaining that it makes sense to me now. I've sent you a collaborator invite. Would you mind logging the changes in the RELEASES file and incrementing the minor version in the cargo.toml and also the top line of lib.rs?

TyPR124 commented 5 years ago

Would you mind logging the changes in the RELEASES file and incrementing the minor version

I will do that sometime this weekend. I'm actually unsure what being a collaborator means. Should I make these changes through this PR, another PR, or some other method entirely?

TyPR124 commented 5 years ago

I changed count so that it will always be consistent with Rust's default count implementation as far as wrapping or panicing, regardless of debug_assertions.

Also added FusedIterator for IpSubnets and related.

If that all looks okay I'll merge this to master, then bump the version and add to RELEASES.

krisprice commented 5 years ago

:smiley: :thumbsup: Go ahead and merge and create the new 2.1.0 release.

krisprice / ipnet

Faster iter #13