tonyyxliu / CUHKSZ-CSC4005

Project Materials for CUHK(SZ) Course CSC4005: Parallel Programming
MIT License

Why is the SIMD output for Part B dimmer? #38

Closed MARYscu closed 1 year ago

MARYscu commented 1 year ago

Is there any possible reason why the SIMD output for Part B is much dimmer than the output of the other methods? I would appreciate any help!

MeloYang05 commented 1 year ago

What is your current performance improvement with SIMD?

MARYscu commented 1 year ago

I ran the sequential version on the cluster, but the execution time was 33811 milliseconds. Is there any reason why it far exceeds the baseline? I would appreciate any help!

MeloYang05 commented 1 year ago

> I ran the sequential version on the cluster, but the execution time was 33811 milliseconds. Is there any reason why it far exceeds the baseline? I would appreciate any help!

The performance of the provided sequential version is very poor, so you need to improve its code to meet the requirements.

MARYscu commented 1 year ago

Oh, I see. So we need to improve not only the SIMD version and the others, but also the sequential code?

MARYscu commented 1 year ago

As for the SIMD problem: the execution time is close to the baseline, but the output is much dimmer than the output of the other methods. What might cause that? I would appreciate your help!

MeloYang05 commented 1 year ago

> Oh, I see. So we need to improve not only the SIMD version and the others, but also the sequential code?

Yes, you need to improve your sequential code.

MeloYang05 commented 1 year ago

> As for the SIMD problem: the execution time is close to the baseline, but the output is much dimmer than the output of the other methods. What might cause that? I would appreciate your help!

I think there are two potential reasons.

  1. Some computations are being skipped, so the image looks dimmer and the code runs much faster.

  2. You perform all the necessary computations, but the results are not stored from the SIMD registers to memory correctly. Remember that each pixel value is only 8 bits (unsigned char); converting float32 to uint8 and storing the result in memory is costly and tricky.
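
To make the second point concrete, here is a minimal sketch of one way to convert float results to 8-bit pixel values with rounding and saturation. It is not the project's reference code: the helper name `store_floats_as_u8` is hypothetical, and it uses 128-bit SSE2 intrinsics, on the assumption that an x86-64 build is acceptable, rather than whatever register width the assignment expects. If you use 256-bit AVX2 instead, note that `_mm256_packs_epi32` and `_mm256_packus_epi16` pack within each 128-bit lane separately, so the bytes come out interleaved and need a permute before the store.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cmath>        // std::lrintf
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the course materials): converts n floats
// to uint8, rounding to nearest and saturating to the range [0, 255].
void store_floats_as_u8(const float* src, std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
    // Vector part: 16 floats in, 16 bytes out per iteration.
    for (; i + 16 <= n; i += 16) {
        // float32 -> int32, rounded according to the current rounding mode.
        __m128i a = _mm_cvtps_epi32(_mm_loadu_ps(src + i));
        __m128i b = _mm_cvtps_epi32(_mm_loadu_ps(src + i + 4));
        __m128i c = _mm_cvtps_epi32(_mm_loadu_ps(src + i + 8));
        __m128i d = _mm_cvtps_epi32(_mm_loadu_ps(src + i + 12));
        // int32 -> int16 with signed saturation.
        __m128i ab = _mm_packs_epi32(a, b);
        __m128i cd = _mm_packs_epi32(c, d);
        // int16 -> uint8 with unsigned saturation (negatives become 0).
        __m128i bytes = _mm_packus_epi16(ab, cd);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), bytes);
    }
    // Scalar tail: forgetting these leftover elements is one way to
    // "lose" computations when n is not a multiple of the vector width.
    for (; i < n; ++i) {
        float v = src[i];
        v = v < 0.0f ? 0.0f : (v > 255.0f ? 255.0f : v);
        dst[i] = static_cast<std::uint8_t>(std::lrintf(v));
    }
}
```

A quick check is to run both the sequential and the SIMD versions on the same small buffer and compare the output bytes one by one; a mismatch that appears only at certain positions usually points to the pack or store step rather than to the arithmetic.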