lion03 / thrust

Automatically exported from code.google.com/p/thrust
Apache License 2.0
0 stars 0 forks source link

Improve performance of set ops #289

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
simpler code should be faster:

In thrust\thrust\detail\device\generic\scalar\binary_search.inl

template<typename RandomAccessIterator, typename T, typename BinaryPredicate>
__host__ __device__
  pair<RandomAccessIterator,RandomAccessIterator>
    equal_range(RandomAccessIterator first, RandomAccessIterator last,
                const T &val,
                BinaryPredicate comp)
{
  RandomAccessIterator lb = scalar::lower_bound(first, last, val, comp);
  return thrust::make_pair(lb, scalar::upper_bound(lb, last, val, comp));
}

reported by David Tarjan

Original issue reported on code.google.com by jaredhoberock on 13 Jan 2011 at 11:29

GoogleCodeExporter commented 8 years ago
2nd improvement:

In thrust\detail\device\cuda\block\set_intersection.inl:

RandomAccessIterator4 set_intersection(RandomAccessIterator1 first1,
                                         RandomAccessIterator1 last1,
                                         RandomAccessIterator2 first2,
                                         RandomAccessIterator2 last2,
                                         RandomAccessIterator3 temporary,
                                         RandomAccessIterator4 result,
                                         StrictWeakOrdering comp):

Replace:

    // count the number of previous occurrances of x the first range
    difference1 rank = x - thrust::detail::device::generic::scalar::lower_bound(first1,x,dereference(x),comp);

    // count the number of equivalent elements of x in the second range
    thrust::pair<RandomAccessIterator2,RandomAccessIterator2> matches = 
      thrust::detail::device::generic::scalar::equal_range(first2,last2,dereference(x),comp);

    difference2 num_matches = matches.second - matches.first;

    // the element needs output if its rank is less than the number of matches
    needs_output = rank < num_matches;

with:

        // count the number of previous occurrances of x the first range
    difference1 rank = x - thrust::detail::device::generic::scalar::lower_bound(first1,x,dereference(x),comp);

    // count the number of equivalent elements of x in the second range 
    // unpack equal_range because we need less information than it gives us
    RandomAccessIterator1 lb =  thrust::detail::device::generic::scalar::lower_bound(first2,last2,dereference(x),comp);
    // only need to know if (lb - ub) is larger than rank, not exactly how many elements of dereference(x) there are. This should often - 
    // if dereference(x) is not repeated and thus rank == 1 - gives us a much smaller search range      
    RandomAccessIterator1 ub =  thrust::detail::device::generic::scalar::upper_bound(lb,min(lb+rank+1,last2),dereference(x),comp);

    difference2 num_matches = ub - lb;

    // the element needs output if its rank is less than the number of matches
    needs_output = rank < num_matches;

saves you a binary search in the common case.

dt

Original comment by jaredhoberock on 15 Jan 2011 at 12:39

GoogleCodeExporter commented 8 years ago
This issue was closed by revision 0b8588e91d.

Original comment by jaredhoberock on 8 Feb 2011 at 5:04