martinmoene / ring-span-lite

ring-span lite - A C++yy-like ring_span type for C++98, C++11 and later in a single-file header-only library
Boost Software License 1.0
148 stars 12 forks source link

speed of copying with iterators #29

Open user706 opened 1 year ago

user706 commented 1 year ago

Is this not "slow(ish)", when using the standard copying mechanisms:

std::copy(rs.cbegin(), rs.cend(), out_it)

??

It'll call this unnecessary code on each iterator-dereference: https://github.com/martinmoene/ring-span-lite/blob/b8135cd85499a2d5482e048eb8fb6b354350d427/include/nonstd/ring_span.hpp#L968 which calls the slowish https://github.com/martinmoene/ring-span-lite/blob/b8135cd85499a2d5482e048eb8fb6b354350d427/include/nonstd/ring_span.hpp#L836 on each dereference!!!

Should ring and ring_span not have some facilities to speed up copying?

ring_span::copy(InputIt first, InputIt last, OutputIt d_first);              // copy from ring_span to something else
ring_span::copy(InputIt first, InputIt last, ring_span &, OutputIt d_first); // copy from ring_span to ring_span
// etc.
user706 commented 1 year ago

I simply mean: split the copying into max. 2 fast, simple linear fragments:

If  idx_begin <= idx_end
   [ 0,  1, 2, idx_begin, 4, 5, 6, 7, 8, idx_end, ... ]

To copy from cbegin to cend:
- copy from idx_begin to idx_end

( -> has 1 fragment)

If  idx_end < idx_begin
   [ 0,  1, 2, idx_end,                              idx_begin, 98, 99, 100]   idx_lin_end

To copy from cbegin to cend:
- copy from idx_begin to idx_lin_end (excl)
- copy from 0 to idx_end

( -> has 2 fragment2)

martinmoene commented 1 year ago

Thanks for bringing this to my attention. Will look into it, likely in the weekend.

user706 commented 1 year ago

By the way... I'm a bit intregued by the iterators.

At first I thought there is a bug here: https://github.com/martinmoene/ring-span-lite/blob/b8135cd85499a2d5482e048eb8fb6b354350d427/include/nonstd/ring_span.hpp#L982 ?

Basically: What if you increment beyond the unsigned max and get wrap around?

But that can never happen, right? (Because the iterator protocol says: iter somewhere in the range, or one beyond the range.)

And also interesting: a begin() iterator that you store to a variable

auto it = rs.begin();

"moves along" with push_backs to rs.

user706 commented 1 year ago

And here's some "fast" code for you to experiment with:

    template<typename OutputIt>
    OutputIt copy(const const_iterator &beg, const const_iterator &end, OutputIt d_first ) {
        assert(this == beg.m_rs);
        assert(this == end.m_rs);

        if (beg == end) { // necessary, since otherwise below, would result in b == e, which is not distinguishable from a completely full ring
            return d_first;
        }

        auto       b    = m_data + normalize_(m_front_idx + beg.m_idx);
        auto const e    = m_data + normalize_(m_front_idx + end.m_idx);

        if (e <= b) {
            auto const edge = m_data + m_capacity;
            d_first = std::copy(b, edge, d_first);
            b = m_data;
        }
        return std::copy(b, e, d_first);
    }

only 2 calls to normalize_

user706 commented 1 year ago

And this, when copying from ring_span into ring_span:

public: 
    template<typename TOther>
    typename ring_span<TOther>::iterator copy(const const_iterator &beg, const const_iterator &end, ring_span<TOther> &other, typename ring_span<TOther>::iterator d_first ) {
        assert(this == beg.m_rs);
        assert(this == end.m_rs);
        assert(&other == d_first.m_rs);
        assert(end - beg <= other.size());

        if (beg == end) {  // necessary, since otherwise below, would result in b == e, which is not distinguishable from a completely full ring
            return d_first;
        }

        auto       b    = m_data + normalize_(m_front_idx + beg.m_idx);
        auto const e    = m_data + normalize_(m_front_idx + end.m_idx);

        auto       b_o    = other.m_data + normalize_(other.m_front_idx + d_first.m_idx);

        if (e <= b) {
            auto const edge = m_data + m_capacity;
            copy_to_lim(b, edge, b_o, other);
            b = m_data;
        }
        copy_to_lim(b, e, b_o, other);

        return std::next(d_first, end - beg);
    }

private:
    template<typename TOther>
    void copy_to_lim(pointer b, const pointer lim, pointer &b_o, ring_span<TOther> &other) {
        auto const numTillLim    = lim - b;
        auto const numTillEdge_o = other.m_data + other.m_capacity - b_o;
        if (numTillLim > numTillEdge_o) {
            std::copy(b, b + numTillEdge_o, b_o);
            b += numTillEdge_o;
            b_o = other.m_data;
        }
        b_o = std::copy(b, lim, b_o);
    }

I just hope there's no bug, in there. haha

Quuxplusone commented 1 year ago

This sounds to me like a case of "piecewise contiguous iterators," which also comes up with std::deque. There's no Standard-blessed way to mark an iterator as "piecewise contiguous" (and no API for what to do next, even if it is). But I know @philnik777 has recently done some optimizations in libc++ for deque iterators; it might be good to take a look at those and see whether it's possible to adapt the shape of them into something ring_span could also either (1) duplicate, or (2) hook into (in a non-standard way, to make the optimization available to libc++'s std::copy).

FWIW, simply adding a ring_span::copy member function feels unsatisfactory to me. Is the problem that std::copy itself is slow? or is the problem that one of ring_span's "special skills" is that you can blat out all its data into a contiguous buffer really quickly, and yet we don't expose that skill in the public API? ring_span::copy solves the latter but not the former.

Consider whether the right API might be (extremely bikesheddable names):

if (!rs.is_contiguous()) {
    rs.rotate_until_contiguous();
}
std::copy(&*rs.begin(), &*rs.end(), out_it);

and whether "rotate" is even possible, if you have a custom popper that you don't know what it does. I haven't thought about this at all. Or alternatively, probably better:

std::copy_n(rs.firsthalfdata(), rs.firsthalfsize(), out_it);
std::copy_n(rs.secondhalfdata(), rs.secondhalfsize(), out_it);
philnik777 commented 1 year ago

FWIW I can't recommend playing around in implementation-details, but this indeed looks like it could be made a segmented iterator. If you really want to, you can look at __iterator/segmented_iterator.h and specialize __segmented_iterator_traits for your iterator.

user706 commented 1 year ago

Thanks for those pointers. Looks like I'll roll my own custom ring, to be honest.

Reasons:

a) I won't invest in segmented_iterator as I'm not on libc++, but on libstdc++.

b) Also, I think that the current ring here, does not fullfill all my criteria.

I want indices or iterators into the ring, that remain fixed... that don't "magically" shift with push_back.

Related: Personally I think that I don't agree with the design decision made here, of having iterators do magic on deferences (the obtained value is shifted along). For me *it should always point at the same element in the underlying array. That's safer, gives the opportinity to avoid locking, etc.

Look what happens in this code: https://github.com/martinmoene/ring-span-lite/blob/b8135cd85499a2d5482e048eb8fb6b354350d427/include/nonstd/ring_span.hpp#L966

You call at_ which calls m_data[ normalize_(m_front_idx + idx) on the dereference.

I would change that, to have normalize_ in the call-path of iterator-increments etc., and not in the dereference.

Why do I say all this? Well... Here's my usecase:

Image a relativly large ring (many items). I want a writer, and a delayed reader, who can at his leisure read blocks (copy them), (also hopefully without needing all that much locking).

Currently there's no proper index or iterator mechanism, to do this; as currently push_backs, would cause dereferences to shift along ("behind my back" -- I'm joking ... this is all design decision and usecase stuff.)

Yes, there is a potential problem with my approach, namely that too slow reads, will cause the writer to overwrite ranges that the slow reader has not yet read. That's particular to my usecase and I need to work with code, to come up with a good solution.

so far with ...my thoughts...

martinmoene commented 1 year ago

Thanks for providing insight into this background :)