Open GoogleCodeExporter opened 9 years ago
PFAC_matchFromHostReduce() needs to free its working space:
d_input_string, d_matched_result and d_pos.
[code]
// Allocate the device working space
cudaMalloc((void **) &d_input_string, n_hat*sizeof(int) );
cudaMalloc((void **) &d_matched_result, input_size*sizeof(int) );
cudaMalloc((void **) &d_pos, input_size*sizeof(int) );
// Copy the input string from host to device
cudaMemcpy(d_input_string, h_input_string, input_size, cudaMemcpyHostToDevice);
// ... same as PFAC_matchFromDeviceReduce() ...
// Copy the match positions and results back to the host
cudaMemcpy(h_pos, d_pos, (*h_num_matched)*sizeof(int), cudaMemcpyDeviceToHost);
cudaMemcpy(h_match_result, d_match_result_zip, (*h_num_matched)*sizeof(int), cudaMemcpyDeviceToHost);
// Free the device working space before returning
cudaFree(d_input_string);
cudaFree(d_matched_result);
cudaFree(d_pos);
[/code]
In my tests, cudaFree() takes 12 ms for a 100 MB input string and 24 ms for a
200 MB input string.
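For anyone who wants to reproduce this kind of measurement, here is a minimal sketch. It is not the code used for the numbers above. The helper name time_cuda_free_ms() and the idea of timing with a host clock are my assumptions; cudaFree() is a blocking host call, so a host timer around it should be reasonable.

```cuda
#include <chrono>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not part of PFAC): returns the wall-clock time of a
// single cudaFree() call in milliseconds, or -1.0 if any CUDA call fails.
static double time_cuda_free_ms(size_t bytes)
{
    void *d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return -1.0;

    // Touch the buffer and wait for the device, so that pending work
    // is not counted as part of the free.
    if (cudaMemset(d_buf, 0, bytes) != cudaSuccess ||
        cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(d_buf);
        return -1.0;
    }

    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(d_buf);
    auto t1 = std::chrono::steady_clock::now();

    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling time_cuda_free_ms(100u * 1024u * 1024u) would time the free of one 100 MB buffer. The result depends on the GPU, the driver and the allocation size, so it will not necessarily match the figures quoted here.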
If you are not a beginner, I suggest calling PFAC_matchFromDeviceReduce() directly and managing the device buffers yourself, so they are not allocated and freed on every call.
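A rough sketch of what managing the buffers yourself could look like. The DeviceWorkspace struct and its two helpers are hypothetical names, not part of PFAC, and padding the input buffer to a multiple of sizeof(int) is only my guess at what n_hat*sizeof(int) is for. The buffers grow only when a larger input arrives and are freed once at the end.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper type (not part of PFAC): device buffers kept alive
// across several matching calls.
struct DeviceWorkspace {
    char  *d_input_string   = nullptr;
    int   *d_matched_result = nullptr;
    int   *d_pos            = nullptr;
    size_t capacity         = 0;   // largest input size (bytes) that fits
};

// Free all buffers and reset the struct. cudaFree(nullptr) is a no-op.
static void workspace_release(DeviceWorkspace *ws)
{
    cudaFree(ws->d_input_string);
    cudaFree(ws->d_matched_result);
    cudaFree(ws->d_pos);
    *ws = DeviceWorkspace();
}

// Make sure the buffers can hold input_size bytes; reallocate only on growth.
static cudaError_t workspace_reserve(DeviceWorkspace *ws, size_t input_size)
{
    if (input_size <= ws->capacity) return cudaSuccess;  // reuse as-is
    workspace_release(ws);

    // Assumption: round the input buffer up to a whole number of ints.
    size_t padded = ((input_size + sizeof(int) - 1) / sizeof(int)) * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&ws->d_input_string, padded);
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&ws->d_matched_result, input_size * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&ws->d_pos, input_size * sizeof(int));
    if (err != cudaSuccess) {
        workspace_release(ws);   // do not leak a partial allocation
        return err;
    }
    ws->capacity = input_size;
    return cudaSuccess;
}
```

After workspace_reserve() succeeds you would copy the input with cudaMemcpy() and pass the three pointers to PFAC_matchFromDeviceReduce(). Check PFAC.h for the exact argument list, since it is not shown in this thread. Call workspace_release() once when you are done with all inputs.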
Original comment by LungShen...@gmail.com
on 29 Apr 2011 at 2:18
Original issue reported on code.google.com by
hja...@ymail.com
on 29 Apr 2011 at 1:49