On my Mac OS X machine it is not 110 MB but 36 MB.
Here's a breakdown of how much size (in bytes) each module takes on Mac OS X:
scan        8,396,256
compact       175,028
segscan    25,003,640
spmvmult      151,168
radixsort   3,735,368
rand          155,888
Original comment by shu...@gmail.com
on 26 Jun 2009 at 1:02
Thanks. How did you figure that out?
Original comment by harr...@gmail.com
on 26 Jun 2009 at 2:07
I added the .cu files to the Makefile in cudpp one at a time. That gave me a
cumulative sum, so I subtracted consecutive sizes of the library to get the
per-module figures.
Original comment by shu...@gmail.com
on 26 Jun 2009 at 2:11
Original comment by harr...@gmail.com
on 29 Jun 2009 at 7:51
Just a comment to help users who run into this problem. If you need to reduce
the CUDPP library binary size, you can comment out generation of the template
kernels you don't need. In each of the *_app.cu files there is a Dispatch
function for the corresponding algorithm. To optimize performance we have to
use a large switch/if-else to dispatch at run time to the appropriate
compile-time optimized template kernel function. To reduce compiled object
size, and also compile time, you can simply comment out the switch options
that you don't need.

For example, if you don't need segmented scan, comment out everything inside
cudppSegmentedScanDispatch():
http://code.google.com/p/cudpp/source/browse/tags/1.1/cudpp/src/app/segmented_scan_app.cu#386

Then, if you only need forward exclusive integer +-scans, comment out
everything but the lines that invoke that type of scan:
http://code.google.com/p/cudpp/source/browse/tags/1.1/cudpp/src/app/scan_app.cu#446

Your compile time and file size will be greatly reduced.

Perhaps the solution to this problem is to make compilation configurable in
some way.
Original comment by harr...@gmail.com
on 10 Dec 2009 at 10:40
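As a rough, self-contained sketch of the pattern described above (the names
scanKernel, scanDispatch and Op are hypothetical, not the actual CUDPP
source): the dispatcher looks something like this, and every case that
remains in the switch forces nvcc to emit another template instantiation, so
pruning cases shrinks the object file.

// Illustrative sketch only -- hypothetical names, not the CUDPP source.
// Shows the general pattern: a run-time switch selecting among
// compile-time specialized template kernels; commenting out unused
// branches removes their instantiations from the compiled object.
#include <cstdio>
#include <cuda_runtime.h>

enum class Op { Add, Max };

// One instantiation of this kernel is generated per (type, operator)
// combination that the dispatcher references.
template <typename T, Op OP>
__global__ void scanKernel(T* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Placeholder body; a real scan does much more work here.
        data[i] = (OP == Op::Add) ? data[i] + T(1) : data[i];
    }
}

// Run-time dispatch: each surviving branch adds another compiled kernel,
// so commenting out unneeded branches reduces binary size and compile time.
template <typename T>
void scanDispatch(T* d_data, int n, Op op)
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    switch (op) {
    case Op::Add:
        scanKernel<T, Op::Add><<<blocks, threads>>>(d_data, n);
        break;
    case Op::Max:  // comment out this whole case if Max is never needed
        scanKernel<T, Op::Max><<<blocks, threads>>>(d_data, n);
        break;
    }
}

int main()
{
    const int n = 1024;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));
    scanDispatch(d_data, n, Op::Add);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    printf("done\n");
    return 0;
}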
Original comment by harr...@gmail.com
on 6 Jul 2011 at 2:36
Ha - my library is 280 MB! (64-bit Linux)
Original comment by rak...@gmail.com
on 1 Oct 2011 at 8:42
Yes, as we add support for more datatypes, it naturally multiplies the binary
size. The only solutions are separate compilation and linkage and/or runtime
code generation. The former is not supported by CUDA yet (will be in the
future), and the latter is not easy in CUDA yet...
Original comment by harr...@gmail.com
on 1 Oct 2011 at 10:25
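As a rough illustration of that multiplication (again with hypothetical
names, not CUDPP's real kernels): every (type x operator) combination the
code references becomes a separate compiled copy of the kernel in the object
file, so each added datatype scales the device code by the number of
variants.

// Hypothetical sketch: 4 types x 2 operators = 8 compiled copies of the
// same kernel logic; a 5th type would push this to 10, and so on.
#include <cuda_runtime.h>

template <typename T, int OP>
__global__ void scanKernel(T* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += T(OP);   // placeholder body
}

// Explicit instantiations stand in for the branches a dispatcher would
// reference; each one adds device code to the binary.
template __global__ void scanKernel<int,          0>(int*,          int);
template __global__ void scanKernel<int,          1>(int*,          int);
template __global__ void scanKernel<unsigned int, 0>(unsigned int*, int);
template __global__ void scanKernel<unsigned int, 1>(unsigned int*, int);
template __global__ void scanKernel<float,        0>(float*,        int);
template __global__ void scanKernel<float,        1>(float*,        int);
template __global__ void scanKernel<double,       0>(double*,       int);
template __global__ void scanKernel<double,       1>(double*,       int);

int main()
{
    // The instantiations above are compiled in whether or not they are
    // ever launched; launching one just makes the example complete.
    int* d = nullptr;
    cudaMalloc(&d, 256 * sizeof(int));
    scanKernel<int, 0><<<1, 256>>>(d, 256);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}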
If I only need to use cudppCompact, I assume I also need to keep
cudppScanDispatch the same (i.e. not commented out)? Because if I comment out
the *Dispatch() functions in everything but cudppCompact, the library does
not work (it compiles, but just doesn't work). Would I also need to keep
reduceDispatch?
Original comment by rak...@gmail.com
on 12 Oct 2011 at 5:03
Not exactly. Compact only needs a specific type of scan -- I believe it does a
forward exclusive sum scan of unsigned integers. So if you comment out the
lines inside cudppScanDispatch (and the functions it calls) for everything but
forward, exclusive, operator+, and CUDPP_UINT, then it should work and be a
much smaller library.
Original comment by harr...@gmail.com
on 12 Oct 2011 at 5:34
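A minimal, hypothetical sketch (not the real cudppScanDispatch) of what such
a pruned dispatcher might look like, keeping only the forward, exclusive,
operator+, unsigned-int path:

// Hypothetical names; only one branch survives, so nvcc emits a single
// kernel instantiation instead of the full matrix of variants.
#include <cstdio>
#include <cuda_runtime.h>

template <typename T, bool BACKWARD, bool EXCLUSIVE>
__global__ void scanKernel(T* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += T(1);   // placeholder body
}

void scanDispatch(unsigned int* d_data, int n, bool backward, bool exclusive)
{
    int threads = 256, blocks = (n + threads - 1) / threads;

    // Only the combination compact actually needs is kept; the other
    // (backward / inclusive / other-type) branches are commented out.
    if (!backward && exclusive) {
        scanKernel<unsigned int, false, true><<<blocks, threads>>>(d_data, n);
    }
    // else if (backward && exclusive)   { ... }   // pruned
    // else if (!backward && !exclusive) { ... }   // pruned
}

int main()
{
    const int n = 1024;
    unsigned int* d = nullptr;
    cudaMalloc(&d, n * sizeof(unsigned int));
    cudaMemset(d, 0, n * sizeof(unsigned int));
    scanDispatch(d, n, /*backward=*/false, /*exclusive=*/true);
    cudaDeviceSynchronize();
    cudaFree(d);
    printf("done\n");
    return 0;
}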
Original issue reported on code.google.com by
harr...@gmail.com
on 25 Jun 2009 at 12:15