tcbrindle / flux

A C++20 library for sequence-orientated programming
https://tristanbrindle.com/flux/
Boost Software License 1.0
429 stars 28 forks source link

Add filter_map adaptor #191

Open Guekka opened 1 week ago

Guekka commented 1 week ago

Rust has it

It avoids code like this

        .map([&files_in_directory](const btu::Path &path) -> std::optional<btu::Path> {
            if (const auto it = files_in_directory.find(path); it != files_in_directory.end())
                return it->second;
            return {};
        })
        .filter([](const auto &path) { return path.has_value(); })
        .map([](const auto &path) { return path.value(); })

Please correct me if I missed an equivalent

tcbrindle commented 1 week ago

So this would be an adaptor that takes a function of the form (T) -> optional<U>, and then yields those elements for which the optional is engaged, right?

I definitely think that that would be a nice thing to have.

We could re-use the optional_like concept for testing the return type of the mapping function:

template <typename O>
concept optional_like =
    std::default_initializable<O> &&
    std::movable<O> &&
    requires (O& o) {
        { static_cast<bool>(o) };
        { *o } -> flux::detail::can_reference;
    };

This would allow not just flux::optional and std::optional to be used, but also (for better or worse) raw pointers and smart pointers.

We could also think about adding a convenience function which does filter_map(std::identity) for the case where the sequence elements are already optional-like -- something like filter_deref?

Guekka commented 1 week ago

So this would be an adaptor that takes a function of the form (T) -> optional, and then yields those elements for which the optional is engaged, right?

Indeed. Sorry for not describing it, I assumed it was known to you.

I've given some thought to your idea about optional_like. I don't like the idea of dereferencing pointers, as it could lead to dangerous behavior with smart pointers. Imagine this case (godbolt)

#include <vector>
#include <memory>
#include <iostream>

#include "https://raw.githubusercontent.com/tcbrindle/flux/main/single_include/flux.hpp"

// unique_ptr has compile error in map() and no time to investigate
// doesn't map() pass arguments as rvalue?
template <class T>
using Ptr = std::shared_ptr<T>;

int main() {
    std::vector<Ptr<int>> vec(100);

    flux::from(std::move(vec))
        .map([] (Ptr<int> ptr) { return *ptr; }) // oops
        // memory is freed...
        .for_each([](auto val) { std::cout << val; });
}

So I think we cannot reuse this concept. Maybe we could first look for .release() before dereferencing?

tcbrindle commented 1 week ago

So this would be an adaptor that takes a function of the form (T) -> optional, and then yields those elements for which the optional is engaged, right?

Indeed. Sorry for not describing it, I assumed it was known to you.

Just making sure we're on the same page

I've given some thought to your idea about optional_like. I don't like the idea of dereferencing pointers, as it could lead to dangerous behavior with smart pointers. Imagine this case (godbolt)

#include <vector>
#include <memory>
#include <iostream>

#include "https://raw.githubusercontent.com/tcbrindle/flux/main/single_include/flux.hpp"

// unique_ptr has compile error in map() and no time to investigate
// doesn't map() pass arguments as rvalue?
template <class T>
using Ptr = std::shared_ptr<T>;

int main() {
    std::vector<Ptr<int>> vec(100);

    flux::from(std::move(vec))
        .map([] (Ptr<int> ptr) { return *ptr; }) // oops
        // memory is freed...
        .for_each([](auto val) { std::cout << val; });
}

The problem here is not that memory is freed too early: the vector is still alive for the whole duration of the for_each call. The problem is that std::vector<Ptr<int>> vec(100) creates 100 nullptrs, which map then dereferences without doing any checking. In other words, this example isn't doing the "filter" part of filter_deref :)

What I had in mind was more like this example, which seems to work okay.

tcbrindle commented 1 week ago

(Also, the reason unique_ptr didn't work in your example is because the lambda passed to map was taking its argument by value, which was trying to copy a unique_ptr.

This is a point of difference between Flux and Rust iterators. In Rust, vec.iter() gives you an iterator over &T, whereas vec.into_iter() gives you an iterator over T. In Flux, flux::ref(vec) gives you a sequence of T const& as you would expect; but flux::from(std::move(vec)) gives you a sequence of T&, not plain T, because we allow multipass iteration so we don't want to "use up" vector elements.)

Guekka commented 1 week ago

The problem is that std::vector<Ptr<int> vec(100) creates 100 nullptrs

Oops haha. I expected it to crash and did not look further. Your second message explains my lack of understanding, I assumed the elements would be moved.

I haven't read the docs thoroughly, but I did not see this point at a first glance. Maybe it would be beneficial to add it somewhere in evidence? That is quite important in my opinion.

So, this is not the right place to ask that, but is there no way to consume the elements?


Anyway, with these info, I agree with you. We can reuse optional_like safely Should I submit a PR?

tcbrindle commented 6 days ago

The problem is that std::vector<Ptr<int> vec(100) creates 100 nullptrs

Oops haha. I expected it to crash and did not look further. Your second message explains my lack of understanding, I assumed the elements would be moved.

I haven't read the docs thoroughly, but I did not see this point at a first glance. Maybe it would be beneficial to add it somewhere in evidence? That is quite important in my opinion.

I don't think it's mentioned anywhere in the documentation at the moment. I was planning to write a "Flux for Rust users" document (along with "Flux for C++ Ranges users" and maybe others) but it's one of many things on the TODO list

So, this is not the right place to ask that, but is there no way to consume the elements?

There is no library-provided way to do this at the moment, although something like

auto as_rvalue = []<flux::adaptable_sequence Seq>(Seq&& seq)
{
    return flux::map(FLUX_FWD(seq), [](auto&& elem) -> flux::rvalue_element_t<Seq> {
        return static_cast<flux::rvalue_element_t<Seq>>(FLUX_FWD(elem));
    });
};

should work in 99% of cases. A proper solution would be an adaptor which replaces each call to read_at with a call to move_at. The reason we don't have that (yet) is because I need to carefully think through the safety implications -- specifically, should such an adaptor be single-pass only like in Rust (because we can only inspect each element once), or should we pass through the capabilities of the underlying sequence, like the map version above? I know @brevzin feels strongly that the latter option is the correct one, but I'd like to be sure.

Anyway, that's a bit off topic for this issue!

Anyway, with these info, I agree with you. We can reuse optional_like safely Should I submit a PR?

Please do, although I won't have a lot of time to work on Flux for the next month so apologies in advance if it takes a little while to review.