Several cusp::blas calls cause unexpected results by converting the output to ScalarType

GoogleCodeExporter commented 9 years ago

A simple calls like blas::axpy(x,y,1) will not do y = x*1 as expected but will 
instead do y = int(x*1).

The same problem exists in axpby, axpbypcz, and used to exist in scal (issue 
#65).

Instead of passing the value_type of the array to the functor, as I did before 
for scal, I think it might be better to simply not have ScalarType as a 
template parameter and use typename Array::value_type instead.

Original issue reported on code.google.com by filipe.c...@gmail.com on 18 May 2011 at 6:35

Attachments:

blas.patch

GoogleCodeExporter commented 9 years ago

Thanks for the report Filipe.

Using Array::value_type is better than the status quo, but it still isn't 
perfect. Consider the case where Array1::value_type is float in blas::axpy, but 
Array2::value_type is complex<float>.

I can think of two possible solutions
1) Create a structure upcast<T1,T2> (generalizes to upcast<T1,T2,T3> etc.) 
where upcast<T1,T2>::type handles the precedence rules for all the numerical 
types.  For example upcast<float,double>::type is double and upcast<float, 
complex<float>>::type is complex<float>.  In this case we could use 
upcast<Array1::value_type, Array2::value_type>::type in place of ScalarType 
*OR* retain ScalarType and use
upcast<Array1::value_type, Array2::value_type, ScalarType>::type as the return 
value of the functor.

Alternatively, we could remove the need to deduce the result of the operation 
entirely.  Instead, if we implemented all blas functions using thrust::for_each 
(instead of thrust::transform) and zipped up the arguments with a zip_iterator, 
then we could just write the expression to compute and assign it to the output. 
 This is almost how AXPBYPCZ is implemented now.  

template <typename ScalarType>
    struct AXPY
    {
        ScalarType alpha;

        AXPY(ScalarType _alpha) : alpha(_alpha) {}

        template <typename Tuple>
        __host__ __device__
            T operator()(Tuple& t)
            {
                thrust::get<1>(t) = alpha * thrust::get<0>(t) + thrust::get<1>(t);
            }
    };

I think the second approach is better and easier, so it's probably the way to 
go.

[1] http://code.google.com/p/cusp-library/source/browse/cusp/detail/blas.inl#166

Original comment by wnbell on 18 May 2011 at 4:54

GoogleCodeExporter commented 9 years ago

Here's a patch implementing the second alternative.
Unfortunately it doesn't apply cleanly to the head because of your recent 
changes, but if you're ok with it I can commit it.

Original comment by filipe.c...@gmail.com on 19 May 2011 at 12:35

Attachments:

blas2.patch

GoogleCodeExporter commented 9 years ago

Looks good to me!  Thanks Filipe!

Original comment by wnbell on 19 May 2011 at 1:14

Added labels: Milestone-Release0.2

GoogleCodeExporter commented 9 years ago

This issue was closed by revision 689dd92dd66e.

Original comment by filipe.c...@gmail.com on 19 May 2011 at 2:53

Changed state: Fixed

vkirangoud / cusp-library

Several cusp::blas calls cause unexpected results by converting the output to ScalarType #71