mraggi / discreture

A modern C++ library for efficiently and easily iterating through common combinatorial objects, such as combinations, permutations, partitions and more.
MIT License
67 stars 7 forks source link

combination in parallel with std for_each #12

Open remz1337 opened 3 years ago

remz1337 commented 3 years ago

Hello,

I have to iterate over 5 billions+ combinations so I would like to do it in parallel using the STL for_each loop with parallel execution, but I get an error saying Parallel algorithms require forward iterators or stronger

auto comb = discreture::combinations(_population, 2);

std::for_each(std::execution::par, comb.begin(), comb.end(), [&](auto i) {
    DoStuff();
});

Anyway I could make it work with STL for_each?

Thanks

mraggi commented 3 years ago

Interesting. On my machine, using clang++ 11.0.1 and boost 1.75, it just works. The iterators are supposed to be random access. Which compiler and, more importantly, which version of boost are you using?

remz1337 commented 3 years ago

I'm on Windows. Using boost 1.75 and the default clang that comes with Visual Studio 2019 (v12 I believe).

I also just tried your parallel example using threads. It compiles and run, but still single threaded... I can open up a different issue for that if you want.

mraggi commented 3 years ago

Huh. I don't understand. My iterators inherit from boost::iterator_facade<*, *, boost::random_access_traversal_tag> which is supposed to be random access, and therefore forward iterator.

And I don't see why the example runs with only one thread. Can you post the output of the parallel benchmarks? You'll have to compile with -DBUILD_BENCHMARKS=true and then run the program called benchmark_parallel, and compare the speed against benchmark_discreture.

On my machine, parallel_benchmark gives a big boost, allegedly because it's using more threads.

remz1337 commented 3 years ago

sorry about that, it does run in parallel. I thought it didn't because with a large dataset, the function to divide work (which is single threaded) ran for multiple hours.

so back to the original issue of using parallel for_each.

EDIT: to clarify, I was having issue with the divide_work_in_equal_parts function. It is indeed a single threaded function that is super slow when I run it in Debug mode (MSVC 2019). Switched to Release and now the divide_work function runs in a few seconds (vs multiple hours). It is a bit annoying to try to debug in Release mode, but I can live with that.

remz1337 commented 3 years ago
#include <iostream>
#include <execution>
#include <vector>
#include <discreture.hpp>

int main(int argc, char** argv){
    std::vector<int> v2{ 1,2,3,4,5 };
    auto comb = discreture::combinations(v2, 2);
    //for (auto&& comb : discreture::combinations(v2, 2)) {
    std::for_each(std::execution::par, comb.begin(), comb.end(), [&](auto i) {
        std::cout << i[0] << ", " << i[1] << '\n';
        });
    std::cout << "done" << std::endl;
    return 0;
}

this throws the error in my environment. Can you test on your side?

mraggi commented 3 years ago

Hi. Yes, it works on my side. Both with gcc and with clang.

Can you try this "minimal" example with no discreture dependency on your machine? That way we can see if it's a discreture problem or a boost/msvc problem.

#include <iostream>
#include <execution>
#include <vector>
#include <boost/iterator/iterator_facade.hpp>

class n_iterator
    : public boost::iterator_facade<n_iterator, const int&, boost::random_access_traversal_tag>
{
public:
    explicit n_iterator(int t = 0) : value_(t) {}

private:
    void increment() { ++value_; }

    void decrement() { --value_; }

    const int& dereference() const { return value_; }

    void advance(int n) { value_ += n; }

    bool equal(const n_iterator& other) const
    {
        return value_ == other.value_;
    }

    int distance_to(const n_iterator& other) const
    {
        return static_cast<int>(other.value_) - value_;
    }

private:
    int value_{0};

    friend class boost::iterator_core_access;
}; // end class iterator

int main()
{
    std::for_each(std::execution::par, n_iterator(0), n_iterator(10), [&](auto i) {
        std::cout << i << '\n';
        });
    std::cout << "done" << std::endl;
    return 0;
}
remz1337 commented 3 years ago

MSVC was complaining about missing default constructor so I added this line :

public:
    n_iterator(){}
    explicit n_iterator(int t = 0) : value_(t) {}

And it's working as expected (I see 0...9 in the output).

I think I might be able to work around this by following your parallel example (splitting the work in X threads with the divide_work function).

For reference, I had the exact same issue with the cppitertools library, and they were aware of it but didn't want to fix so I looked somewhere else. That's when I found this library.