Closed vdukhovni closed 4 years ago
I would like to use the Strict
and StrictData
pragmas instead of bang patterns.
I would like to use the
Strict
andStrictData
pragmas instead of bang patterns.
I am less familiar with their uses, especially in the context of tuples. But ultimately I don't really mind how this is done, so long as we can ensure that IP6 (a,b,c,d) is strict in a, b, c and d. As for IPv4, being a newtype directly around the underlying Word32
it is already strict, since it is just a Word32 at runtime.
I wrote a fast Data.ByteString.Builder for IPv6 that even outperforms (by a factor of 10) the C-library inet_ntop() on FreeBSD and Linux, despite doing memory allocation, while the C code runs with fixed buffers, and it benefits from strict(er) IPv6 constructors. It always direct access to the Word32
form of IPv4 without conversion to network byte order.
So the new stricter accessors are to help with faster serialization, producing fewer intermediate lazy thunks and less GC work.
It is ~16 times faster than using show
(or 11 times faster than a more optimized implementation of show
).
3985e6066e69d98f21d33e2427cd68c9e62cbc44 should bring strictness. Please check it out.
3985e60 should bring strictness. Please check it out.
I am not sure that the entire library should be strict, for example, the ShowS
instances probably need to be lazy. I also don't think that StrictData
covers tuple elements, which is the main thing I wanted to see strict. I'll take a closer look, but I expect this may not turn out to be either sufficiently fine-grained, or sufficiently comprehensive where needed.
3985e60 should bring strictness. Please check it out.
I am not sure that the entire library should be strict, for example, the
ShowS
instances probably need to be lazy. I also don't think thatStrictData
covers tuple elements, which is the main thing I wanted to see strict. I'll take a closer look, but I expect this may not turn out to be either sufficiently fine-grained, or sufficiently comprehensive where needed.
OK, looking more closely, it does look like much of the module could be strict, almost all the structures and functions work with small objects with no obvious opportunities for lazy evaluation. But it would take a bit of care to make sure that strictness will never be slower if any of the code.
That said, for IPv6, strictness in the tuple does not ensure strictness in its elements, so we'd still some BangPatterns, because the type is not a "data" with four fields that could be made strict, but is rather a simple type alias for a four-tuple, and StrictData does not as far as I can tell make 4-tuples strict.
So I'd probably still want to add some BangPatterns...
Now I think that most part of this PR should be merged. But I would like to discuss something before merging.
All green now from 7.8 through 8.8.2. All that remains is a decision on whether the entire library is Strict by default (your commit), or just my selective tweaks. Perhaps you're right that Strict by default is best, I am less sure, but don't see any obvious objections. Your call.
FWIW, as you know, Strict
isn't always better, my IPv6 Builder code does 100million IPv6 -> Builder conversions (streaming them all to stdout) in 6.144 seconds with laziness on by default and selected strictness annotations, ... but takes 7.462 seconds to do the same thing when strict by default.
But that is perhaps because ByteString Builders rely on laziness for much of their efficiency, the functions in "iproute" other than the Show
instances, may work as well or slightly better strict
I don't have a good feel for which is likely better as yet, just know that StrictData
is typically a win, and so wanted to make the IPv6 4-tuple more strict, while at the same using somewhat more direct expressions for the accessors that generate fewer intermediate lists, so less GC and also less CPU.
But that is perhaps because ByteString Builders rely on laziness for much of their efficiency, the functions in "iproute" other than the Show instances, may work as well or slightly better strict
If we need lazyness in Strict
or StrictData
, we can put ~
(like !
).
@vdukhovni Do you want to keep the four tupple or don't mind if I will change it to data IPv6Addr
with StrictData
?
I wrote some tips for Strict
in Japanese. I hope that google translation will works well for you:
https://kazu-yamamoto.hatenablog.jp/entry/20151117/1447726679
Also, my position for lazyness is here: https://kazu-yamamoto.hatenablog.jp/entry/2019/02/15/115630
@vdukhovni Do you want to keep the four tupple or don't mind if I will change it to
data IPv6Addr
withStrictData
?
Keep the 4-tuple.
I wrote some tips for
Strict
in Japanese. I hope that google translation will works well for you: https://kazu-yamamoto.hatenablog.jp/entry/20151117/1447726679Also, my position for lazyness is here: https://kazu-yamamoto.hatenablog.jp/entry/2019/02/15/115630
I've come to increasingly appreciate laziness as time goes by. It can in fact yield significant performance advantages, but yes it takes some experience to understand when to disable it.
I do like StrictData
, that's a good default. As for Strict
I am more hesitant to use it...
Anyway, my part of the PR is done, and it is now green from 7.8 through 8.8.2! So the ball is in your court...
What is the next step with this PR? Did I neglect to do something?
No. I'm just busy for preparing 17th QUIC interop. I will come back this issue in the next week.
@vdukhovni Sorry for the delay. I have merged this PR. Do you want to get a new version released?
@vdukhovni Sorry for the delay. I have merged this PR. Do you want to get a new version released?
Thanks! As for a new version, I'm not in a hurry. You could wait for any other worthwhile changes to show up, or just release after a while at a time that's convenient for you.
FWIW, I have a performance improved Show
instance for IPv6 that runs 1.55 times faster, but I don't think that justifies the increased code complexity. The real payoff with that algorithm is that it supports a ByteString Builder
for IPv6 and IPv4, and the ByteString builder is 13.5 times faster than using Show
.
If you wanted the Builder contributed to this module, I could do that...
iproute
does not depends on bytestring
. However, since bytestring
is installed everywhere, it's OK for me to add bytestring
as dependency. Let's release a new version after we merge your builder.
So, I take it that you are interested in providing a Builder for IPv6/IPv4 addresses? If so, I'll submit a PR.
Just a "warning", in order to make the code fast, I'm using some low-level GHC-specific primitives, in particular, some of the computations are using primitive unlifted Word32#
and Int#
values, and associated primitive bit operators. The primops are pretty much confined to just the code that deals with computing the best gap...
It presently outperforms the C-library inet_ntoa() on Linux and FreeBSD by a factor of 10.
So, I take it that you are interested in providing a Builder for IPv6/IPv4 addresses?
Yup!
These are basically structure getters/setters and should be strict in their input and produce fully-evaluated output.
Also new {to,from}{IPv4,IPv6}w functions provide efficient inlined non-copying getters/setters for the (current) internal representations of the addresses, speeding up bulk processing. They're similar to the {to,from}HostAddress{,6} functions, but don't do byte order conversions and are strict.
Less terse documentation and a fix for a cut/paste error.
NOTE: The "to" functions now truncate each element to the expected number of bits. Previously, they'd just "+" un-truncated values leading to more surprising results when the input lists have elements with excess bits set. I don't expect this to be a compatibility issue, but given malformed input the behaviour is now different.