Open tcbrindle opened 11 months ago
I'm currently working on the productsseq = [a, b, c, d]
. However, the majority of the code in products.hpp
is the same as that in cartesian_product.hpp
.
For example, the code in for_each_while_impl
is the same for both except that sizeof...(Bases)
is replaced with RepeatCount
.
How might be a good way to reduce code duplication here?
I was thinking of extracting common functionality to a shared base class (similar to how cartesian_product_traits_base
is used for both cartesian_product
and cartesian_product_with
).
Hi @isaacy2012,
I'm currently working on the products feature and I have a rough implementation that produces the correct output for the example
seq = [a, b, c, d]
. However, the majority of the code inproducts.hpp
is the same as that incartesian_product.hpp
.
Thanks a lot for working on this.
For example, the code in
for_each_while_impl
is the same for both except thatsizeof...(Bases)
is replaced withRepeatCount
.How might be a good way to reduce code duplication here?
I was thinking of extracting common functionality to a shared base class (similar to how
cartesian_product_traits_base
is used for bothcartesian_product
andcartesian_product_with
).
Yes, I think moving as much as possible up to a shared base class sounds like a good approach :)
cartesian_product_adaptor
is one of the most complex adaptors we have in the library so you've chosen a tricky place to start! But if you have a working implementation already then it sounds like you're on the right track. If I can help, or if you have any questions about how Flux works then please let me know. Good luck!
Tristan
Hi Tristan,
Thanks for the feedback, I've moved the majority of the code shared between cartesian_product
, cartesian_product_with
, and the new cartesian_power
(I feel it differentiates better with cartesian_product
than simply products
while keeping a standard prefix, though it does stray from the python.itertools naming β would be interested to hear your thoughts on the naming).
The code sharing between cartesian_product
and cartesian_power
is mostly enabled by a new get<I>()
function in cartesian_traits_base
that becomes std::get<I>(self.bases_)
for cartesian_product
and self.base_
for cartesian_power
, done through CRTP. This is the main change that facilitates effectively "repeating" the sequences.
However, because cartesian_product_with
is unique in that it does not expose move_at
, read_at_unchecked
, etc..., those functions can't be publicly visible in the cartesian_traits_base
struct. I've tried two approaches:
cartesian_traits_base
and use using
statements in cartesian_product_traits_base
and cartesian_power_traits_base
to expose them publicly See https://github.com/tcbrindle/flux/compare/main...isaacy2012:flux:feat/products
cartesian_default_read_traits_base
to expose those functions by having cartesian_product_traits_base
and cartesian_power_traits_base
inherit from it.See https://github.com/tcbrindle/flux/compare/main...isaacy2012:flux:feat/products-default-read-traits
I don't like 1. because of the repeated using
statements, but 2. is a bit confusing because of the inheritance chain:
For cartesian_product sequence traits
---
sequence_traits<cartesian_product_adaptor<...>> :
cartesian_product_without_traits_base<...> :
cartesian_default_read_traits_base<cartesian_product_traits_base<...>> : // inherits from the type parameter
cartesian_product_traits_base<...> :
cartesian_traits_base<...>
For cartesian_product_with sequence traits
---
cartesian_product_with_adaptor::flux_sequence_traits :
cartesian_product_with_traits_base<...> :
cartesian_product_traits_base<...>
cartesian_traits_base<...>
For cartesian_power sequence traits
---
sequence_traits<cartesian_power_adaptor<...>> :
cartesian_power_traits_base<...> :
cartesian_default_read_traits_base<cartesian_traits_base<...>> : // inherits from the type parameter
cartesian_traits_base<...>
What are your thoughts on those approaches, or what might be a better way of expressing the orthogonal functionality of product vs power
and _with vs [without]
(I imagine that power_with
is a natural next step/feature).
Many thanks, Isaac
Hi Tristan,
Thanks for the feedback, I've moved the majority of the code shared between
cartesian_product
,cartesian_product_with
, and the newcartesian_power
(I feel it differentiates better withcartesian_product
than simplyproducts
while keeping a standard prefix, though it does stray from the python.itertools naming β would be interested to hear your thoughts on the naming).
We already have flux::product
which multiplies together all the elements of a sequence, so I think having a function called products
that does something completely different is probably not a great idea! :). I quite like cartesian_power
as an alternative. I'll do some research into what other languages/libraries call it to see if there's a commonly used name, but we can use cartestian_power
as a placeholder for now.
The code sharing between
cartesian_product
andcartesian_power
is mostly enabled by a newget<I>()
function incartesian_traits_base
that becomesstd::get<I>(self.bases_)
forcartesian_product
andself.base_
forcartesian_power
, done through CRTP. This is the main change that facilitates effectively "repeating" the sequences.
This seems like a nice way of doing it π
However, because
cartesian_product_with
is unique in that it does not exposemove_at
,read_at_unchecked
, etc..., those functions can't be publicly visible in thecartesian_traits_base
struct. I've tried two approaches:
- Make them protected in
cartesian_traits_base
and useusing
statements incartesian_product_traits_base
andcartesian_power_traits_base
to expose them publiclySee main...isaacy2012:flux:feat/products
- Use another trait base
cartesian_default_read_traits_base
to expose those functions by havingcartesian_product_traits_base
andcartesian_power_traits_base
inherit from it.See main...isaacy2012:flux:feat/products-default-read-traits
I don't like 1. because of the repeated
using
statements, but 2. is a bit confusing because of the inheritance chain:For cartesian_product sequence traits --- sequence_traits<cartesian_product_adaptor<...>> : cartesian_product_without_traits_base<...> : cartesian_default_read_traits_base<cartesian_product_traits_base<...>> : // inherits from the type parameter cartesian_product_traits_base<...> : cartesian_traits_base<...>
For cartesian_product_with sequence traits --- cartesian_product_with_adaptor::flux_sequence_traits : cartesian_product_with_traits_base<...> : cartesian_product_traits_base<...> cartesian_traits_base<...>
For cartesian_power sequence traits --- sequence_traits<cartesian_power_adaptor<...>> : cartesian_power_traits_base<...> : cartesian_default_read_traits_base<cartesian_traits_base<...>> : // inherits from the type parameter cartesian_traits_base<...>
What are your thoughts on those approaches, or what might be a better way of expressing the orthogonal functionality of
product vs power
and_with vs [without]
(I imagine thatpower_with
is a natural next step/feature).
(FYI, I've been intending to rename cartesian_product_with
to cartesian_product_map
(to better describe what it does) and it's even listed that way in the documentation, but I hadn't got round to actually changing it in the code yet!)
But yeah, I can see the difficulty: we have four adaptor classes that do fairly similar things and we'd like to make sure we reuse code sensibly between the implementations.
Of the two approaches you show, I think I'd lean towards the second one -- like you, I'm not too keen on the using directives. But I wonder if we could simplify the inheritance hierarchy a little by getting rid of the cartesian_product_traits_base
and cartesian_power_traits_base
classes? So we'd have:
sequence_traits<cartesian_product_adaptor<...>>
: cartesian_read_traits<...> // "mixin" to add read_at(), read_at_unchecked() returning a tuple
: cartesian_traits_base<...> // provides inc(), is_last() etc etc
sequence_traits<cartesian_product_map_adaptor<...>>
: cartesian_map_traits<...> // "mixin" to add read_at() which invokes a function
cartesian_traits_base<...> // provides inc(), is_last() etc
sequence_traits<cartesian_power_adaptor<...>>
: cartesian_read_traits<...>
: cartesian_traits_base<...>
sequence_traits<cartesian_power_map_adaptor<...>>
: cartesian_map_traits<...>
: cartesian_traits_base<...>
This might involve a small amount of code duplication (e.g the size()
impl), but I think this is justifiable if it simplifies things?
But I have to stress that I haven't actually tried it so I might be completely wrong...!
Tristan
Something else that has just occurred to me: for the "power" versions, the cursor type could be std::array<cursor_t<Base>, N>
rather than tuple<cursor_t<Base>, cursor_t<Base>, ...>
... I don't know if this would make things easier or more difficult though!
~Hi @isaacy2012, after saying I hadn't tried it I thought I'd better have a go! This is a very rough prototype of what I had in mind: https://flux.godbolt.org/z/54Wrs3Wha~
~As you can see, there's a bit of duplication in the top-level flux_sequence_traits
classes, but apart from that I think this is quite nice.~
~What do you think?~
EDIT: Never mind, I came up with a better way of doing it -- see below
Hi again @isaacy2012, sorry for bombarding you with messages!
I've been thinking about this some more and I think I have a cleaner approach.
Rather than using different base classes to encode the different policies -- product/power and map/tuple -- we can instead use just a single traits class that contains everything, and uses a pair of enums to select which code paths to use. The various adaptor classes (four of them) then specify which policies they want for their sequence_traits. I've coded up a proof of concept here: https://flux.godbolt.org/z/8jexzcreb
Hi @tcbrindle
Something else that has just occurred to me: for the "power" versions, the cursor type could be
std::array<cursor_t<Base>, N>
rather thantuple<cursor_t<Base>, cursor_t<Base>, ...>
... I don't know if this would make things easier or more difficult though!
Yep, I think that's much better than the repeated tuple, though I've kept the repeated tuple type for the value_type
so that the type is the same as it is for product
β should that be changed to std::array
too?
Regarding the switching between product/power and map/tuple, I think I actually preferred your mixin idea rather than the if constexpr
in the base trait, since it delegates specifics to the derived trait/struct rather than the base trait having an awareness of all the various code paths. Also, since product_map
doesn't expose the move_at
, move_at_unchecked
etc. functions, would we use requires
/enable_if
to conditionally enable those functions based on the read_kind
?
Thanks!
Hi @tcbrindle
Something else that has just occurred to me: for the "power" versions, the cursor type could be
std::array<cursor_t<Base>, N>
rather thantuple<cursor_t<Base>, cursor_t<Base>, ...>
... I don't know if this would make things easier or more difficult though!Yep, I think that's much better than the repeated tuple, though I've kept the repeated tuple type for the
value_type
so that the type is the same as it is forproduct
β should that be changed tostd::array
too?
No, the element type (the thing returned by read_at
) is a tuple, so the value type has to be a tuple as well, because there needs to be a conversion relationship between them.
Regarding the switching between product/power and map/tuple, I think I actually preferred your mixin idea rather than the
if constexpr
in the base trait, since it delegates specifics to the derived trait/struct rather than the base trait having an awareness of all the various code paths.
Yeah, from a pure design perspective bundling everything into one big class that contains every code path is certainly not the recommended way! But practically speaking I think it would probably end up being easier to understand and maintain in this case than using several different base classes with CRTP. I could be wrong though :)
Also, since
product_map
doesn't expose themove_at
,move_at_unchecked
etc. functions, would we userequires
/enable_if
to conditionally enable those functions based on theread_kind
?
Yeah, the idea was to use a requires clause, e.g.
template <typename Self>
static constexpr auto move_at(Self& self, cursor_t<Self> const& cur) -> decltype(auto)
requires (ReadKind = read_kind::tuple)
{
return do_read(flux::move_at, self, cur);
}
We could actually do this for all the functions -- e.g. provide two versions of first()
with different CartKind
constraints. I only used if constexpr
in the prototype example because it's slightly less typing, and generally compiles faster than constrained overloads.
Hi @tcbrindle
Yeah, from a pure design perspective bundling everything into one big class that contains every code path is certainly not the recommended way! But practically speaking I think it would probably end up being easier to understand and maintain in this case than using several different base classes with CRTP. I could be wrong though :)
I see, thanks for the insight, yeah I think it is definitely easier to understand.
Yeah, the idea was to use a requires clause, e.g.
template <typename Self> static constexpr auto move_at(Self& self, cursor_t<Self> const& cur) -> decltype(auto) requires (ReadKind = read_kind::tuple) { return do_read(flux::move_at, self, cur); }
We could actually do this for all the functions -- e.g. provide two versions of
first()
with differentCartKind
constraints. I only usedif constexpr
in the prototype example because it's slightly less typing, and generally compiles faster than constrained overloads.
Great, I've done this, the only snag was that neither if constexpr
nor the requires
worked for defining the using cursor_type =
and value_type
, so I've used a struct that is partially specialised on the cartesian_kind
to provide those types. I've also added cartesian_power_with
(I assume the renaming to map
will be done separately?)
https://github.com/tcbrindle/flux/compare/main...isaacy2012:flux:feat/cartesian-product-and-with
I was also wondering for cartesian_power_with
, is there a better way to define regular_invocable_repeated
? I needed to define that concept to restrict the mapping function to the form fn(element_t<Seq>, /* repeated PowN times */)
I probably won't get around to implementing permutations
and the others yet, should I make a pull request?
Hi @tcbrindle
Yeah, from a pure design perspective bundling everything into one big class that contains every code path is certainly not the recommended way! But practically speaking I think it would probably end up being easier to understand and maintain in this case than using several different base classes with CRTP. I could be wrong though :)
I see, thanks for the insight, yeah I think it is definitely easier to understand.
Yeah, the idea was to use a requires clause, e.g.
template <typename Self> static constexpr auto move_at(Self& self, cursor_t<Self> const& cur) -> decltype(auto) requires (ReadKind = read_kind::tuple) { return do_read(flux::move_at, self, cur); }
We could actually do this for all the functions -- e.g. provide two versions of
first()
with differentCartKind
constraints. I only usedif constexpr
in the prototype example because it's slightly less typing, and generally compiles faster than constrained overloads.Great, I've done this, the only snag was that neither
if constexpr
nor therequires
worked for defining theusing cursor_type =
andvalue_type
, so I've used a struct that is partially specialised on thecartesian_kind
to provide those types.
So I think that actually we can get away with not defining a cursor_type
alias, and just use cursor_t<Self>
(rather than cursor_type<Self>
, sorry for my bad naming!) in the function definitions.
value_type
will be a bit trickier, because we'll need to have different definitions for the tuple
and map
versions... or in fact we'd rather the value_type
wasn't defined at all for the map
versions so we fall back to the default definition.
I've also added
cartesian_power_with
(I assume the renaming tomap
will be done separately?)
I'm quite happy for you to do the rename of cartesian_product_with
while we've "got the hood up"
main...isaacy2012:flux:feat/cartesian-product-and-with
I was also wondering for
cartesian_power_with
, is there a better way to defineregular_invocable_repeated
? I needed to define that concept to restrict the mapping function to the formfn(element_t<Seq>, /* repeated PowN times */)
We actually already use this concept in adjacent_map_adaptor
:
You could rename this to repeated_invocable
and re-use it in both places.
I probably won't get around to implementing
permutations
and the others yet, should I make a pull request?
Yeah, let's not worry about permutations
or combinations
yet, this is a really impressive bit of work as it is! I'm still not entirely sure about the name, but if you'd like to submit a PR then that would be great :)
Thanks!
So I think that actually we can get away with not defining a
cursor_type
alias, and just usecursor_t<Self>
(rather thancursor_type<Self>
, sorry for my bad naming!) in the function definitions.
Yep, I've got this to work :)
value_type
will be a bit trickier, because we'll need to have different definitions for thetuple
andmap
versions... or in fact we'd rather thevalue_type
wasn't defined at all for themap
versions so we fall back to the default definition.
I've kept value_type
but now restricted it so it's only explicitly defined when the ReadKind
is read_kind::tuple
. Unfortunately, I couldn't find a good way to conditionally expose the value_type
, so I've settled for this:
template <std::size_t Arity, cartesian_kind CartesianKind, read_kind ReadKind, typename... Bases>
struct cartesian_traits_base : cartesian_traits_base_impl<Arity, CartesianKind, ReadKind, Bases...> {
using impl = cartesian_traits_base_impl<Arity, CartesianKind, ReadKind, Bases...>;
};
template <std::size_t Arity, cartesian_kind CartesianKind, typename... Bases>
struct cartesian_traits_base<Arity, CartesianKind, read_kind::tuple, Bases...> : cartesian_traits_base_impl<Arity, CartesianKind, read_kind::tuple, Bases...> {
using impl = cartesian_traits_base_impl<Arity, CartesianKind, read_kind::tuple, Bases...>;
using value_type = typename impl::types::value_type;
};
Which means that the friend
declaration in the adaptor looks like this now:
friend flux_sequence_traits::impl;
It's a small change but I don't like having the impl
leak out into its use in the adaptor. Can you think of a better way to conditionally expose the value_type
?
You could rename this to
repeated_invocable
and re-use it in both places.
I moved it to flux/core/concepts.hpp
, which I've just realised is probably the wrong place, but I'm not quite sure where to put it since it's now used by both adjacent
and cartesian_power_with
. Possibly in a new file flux/op/detail/concepts.hpp
?
You could rename this to
repeated_invocable
and re-use it in both places.I moved it to
flux/core/concepts.hpp
, which I've just realised is probably the wrong place, but I'm not quite sure where to put it since it's now used by bothadjacent
andcartesian_power_with
. Possibly in a new fileflux/op/detail/concepts.hpp
?
There's flux/op/requirements.hpp
for concepts that are useful to re-use between adaptors but aren't "core", if you see what I mean
I'm interested in taking a stab at implementing a permutation operation, but before I seriously get working on it I did a bit of research and realized there are a few different ways of approaching the problem. Pythons itertools.permutations
operation seems to do the permutation in the same order every time regardless of whether or not the input range is sorted. C++s std::next_permutation
always does the next lexicographical permutation of an input range, which means the sequence of permutations produced by std::next_permutation
is dependent on the ordering of the input range.
C++ example on compiler explorer
Python example on compiler explorer
+------------+------------+
| Perm[1,2,3]: |
+------------+------------+
| C++ | Python |
+------------+------------+
| (1,2,3) | (1,2,3) |
| (1,3,2) | (1,3,2) |
| (2,1,3) | (2,1,3) |
| (2,3,1) | (2,3,1) |
| (3,1,2) | (3,1,2) |
| (3,2,1) | (3,2,1) |
+------------+------------+
+------------+------------+
| Perm[3,2,1]: |
+------------+------------+
| C++ | Python |
+------------+------------+
| (3,2,1) | (3,2,1) |
| (1,2,3) | (3,1,2) |
| (1,3,2) | (2,3,1) |
| (2,1,3) | (2,1,3) |
| (2,3,1) | (1,3,2) |
| (3,1,2) | (1,2,3) |
+------------+------------+
The links above have a simple example of producing all the permutations of two lists, [1, 2, 3]
, and [3, 2, 1]
and the table is shown in the code block. The C++ and Python solutions output the same order of permutations for [1, 2, 3]
, but they output a different ordering of permutations for [3, 2, 1]
. Both options are valid, but to me the C++ method makes more sense. I would imagine that in flux if I use the permutation operation I would get the same output that calling std::next_permutation
S! / (S - N)!
times would produce. What do other people think would be best?
The Rust itertools.permutations()
seems to produce the same output as Pythons itertools.permutation
, so maybe the Python/Rust method is actually a better way to go as opposed to the C++ method.
Hi @nwad123, great to hear you're interested in taking this on!
Regarding the two difference approaches, the STL's next_permutation
requires that the input sequence is bidirectional, and that its elements are comparable. The Python version (IIUC) doesn't require either of these constraints, so I think the Python approach is the better way to go for Flux.
We could in principle add the STL ordering as something like ordered_permutations
later if it turns out to be desirable.
We currently have
cartestian_product(seq1, seq2, ...)
which takes a pack of N sequences and yields N-tuples of all the possible combinations of elements from those sequences. This is equivalent to a set of N nestedfor
loops, and to one form of the Python itertoolsproduct
operation.It's pretty common however to want to use the same sequence repeatedly with
cartesian_product
. Itertools provides a second form ofproduct
for this, using arepeat
argument. It would be good if we provided an equivalent too.We'd need it to have a different name, and we'd probably want to take
N
as a template parameter to avoid having to allocate. So we could have something likeproducts<3>(seq)
, which would be equivalent tocartestian_product(seq, seq, seq)
except that we wouldn't have to store multiple references to/copies of the sequence. The size of the resulting sequence would besize(seq)^N
.Itertools also provides the
permutations
operation, which would also be nice to have. This yields (in lexicographical order) all the length-N permutations of the elements of the input sequence. This is a strict subset of the elements ofproducts<N>
, skipping those tuples which contain repeated elements from the underlying sequence. IfS
is the size of the original sequence, the size ofpermutations<N>
isS!/(S - N)!
.Finally, Itertools has two slightly different
combinations
operations. The first gives a strict subset ofpermutations
, skipping tuples in which the elements are not in "sorted" order (according to their position in the initial sequence), and has sizeS!/N!(S-N)!
The second, the slightly oddly named (to me)
combinations.with_replacement
. This does the same ascombinations
except that it additionally allows individual elements to be repeated -- meaning the resulting elements are a subset ofproduct<N>
, but notpermutations<N>
. The number of elements is(S+N-1)!/N!(S-1)!
.To make this a bit more concrete, imagine an input sequence
seq = [a, b, c, d]
. Then the various operations give:products<3>(seq)
: (4^3 = 64 elements)permutations<3>(seq)
: (4!/(4-3!) = 24 elements)combinations<3>(seq)
: (4!/3!*(4-3)! = 24/6 = 4 elements)combinations_with_replacement<3>(seq)
: ((4+3-1)!/3!(4-1)! = 6!/3!3! = 720/36 = 20 elements)