owensgroup / RXMesh

GPU-accelerated triangle mesh processing
BSD 2-Clause "Simplified" License

Why can't I launch multiple threads for bilateral filtering? #1

Closed nashdingsheng closed 2 years ago

nashdingsheng commented 2 years ago

Dear author, I recently ran the code of this project, but after many attempts I could not get the Filtering project to run the OpenMesh bilateral mesh denoising on multiple threads. Specifically, this concerns the function filtering_openmesh, which is called in the filtering.cu file. The body of filtering_openmesh clearly uses OpenMP to run multithreaded: #pragma omp parallel for schedule(static) num_threads(num_omp_threads) reduction(max : max_neighbour_size).
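For readers unfamiliar with this pragma, here is a minimal sketch of the same pattern: a statically scheduled parallel loop with a max reduction. The function name max_neighbour_count and the nested-vector input are hypothetical stand-ins, not the actual RXMesh or OpenMesh code; only the pragma clauses are taken from the post above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the per-vertex loop in filtering_openmesh:
// returns the largest neighbour count over all vertices.
int max_neighbour_count(const std::vector<std::vector<int>>& neighbours,
                        int num_omp_threads)
{
    (void)num_omp_threads;  // unused when the file is built without OpenMP
    int max_neighbour_size = 0;
    const int n = static_cast<int>(neighbours.size());

    // Each thread keeps a private maximum; OpenMP combines them at the end.
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
#pragma omp parallel for schedule(static) num_threads(num_omp_threads) \
    reduction(max : max_neighbour_size)
    for (int i = 0; i < n; ++i) {
        max_neighbour_size = std::max(
            max_neighbour_size, static_cast<int>(neighbours[i].size()));
    }
    return max_neighbour_size;
}
```

The result is the same with or without OpenMP; only the number of threads that share the loop changes, which is why a silently ignored pragma shows up only in the timing.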

I tried many times to observe (1) the time consumption compared with a single thread, obtained by deleting the #pragma ..., (2) the CPU utilization, and (3) the threads shown in Process Explorer. All of these observations indicate that it runs on a single thread. I do not know why it cannot launch multiple threads for this job. I am sure I have enabled OpenMP, the macro OMP_NUM_THREADS, and every configuration I know of. In your paper, you compared against OpenMesh implementations of MCF and geodesic distance; I do not know how you switched between single-core and multi-core. Finally, my setup is, briefly: Windows 10, Intel i7-10 with 16 threads, GPU 2080S, CUDA 11.1, VS2019, and the project is release version v0.1.0. I hope to receive your answer. Thank you.

Ahdhn commented 2 years ago

I have just noticed that on a Windows machine (with 64 threads), running OpenMP (with omp_get_max_threads()) does not affect the timing much compared with running on a single thread. On a Linux machine (with 16 threads), though, multithreading makes a big difference: it is 5x faster. I am not sure why this is the case on Windows, but I will try to look into it. In the paper, all performance numbers were reported on a Linux machine (a DGX machine running Ubuntu 20.04 with gcc 9.3).

BTW, you can simply switch between a single thread and multiple threads by specifying the number of threads as the first argument when calling the function. We currently call it with the maximum possible number of threads (i.e., omp_get_max_threads()), but you can change it to 1 thread (no need to delete the #pragma's).
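The thread-count switch described above can be sketched as follows. This is only an illustration under assumptions: run_kernel, time_ms, and max_threads are hypothetical helpers with a dummy workload, not functions from RXMesh. The only point is that the thread count arrives as the first argument and feeds num_threads(...), so the pragma never has to be deleted.

```cpp
#include <chrono>
#include <cmath>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical workload; the thread count is the first argument.
double run_kernel(int num_omp_threads, int n)
{
    (void)num_omp_threads;  // unused when built without OpenMP
    double sum = 0.0;
#pragma omp parallel for schedule(static) num_threads(num_omp_threads) \
    reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += std::sqrt(static_cast<double>(i));
    }
    return sum;
}

// Maximum thread count, falling back to 1 when OpenMP is not compiled in.
int max_threads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Wall-clock time of one kernel run, in milliseconds.
double time_ms(int num_omp_threads, int n)
{
    const auto t0 = std::chrono::steady_clock::now();
    volatile double sink = run_kernel(num_omp_threads, n);
    (void)sink;  // keep the call from being optimized away
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Comparing time_ms(1, n) with time_ms(max_threads(), n) for a large n gives the single-thread versus multi-thread timing without touching the source between runs.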

nashdingsheng commented 2 years ago

Thank you for your quick reply. I ran it on a Linux machine with the same hardware configuration I mentioned above, and multithreading indeed makes a big difference, about 5x faster. Yesterday, I checked the assembly generated after "#pragma omp parallel for schedule(static) num_threads(num_omp_threads)". I found that whether I keep or comment out (delete) the "#pragma omp ..." line, the assembly is exactly the same, which indicates that the compiler does not compile the code into what we expect. As to why this happens, I will look into it as well. Thank you. Merry Christmas.
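Identical assembly with and without the pragma is what one would expect if OpenMP is not enabled for that translation unit, since compilers then skip "#pragma omp" lines. A small check, sketched here with hypothetical helper names, is to test the standard _OPENMP macro and count the threads inside a parallel region. One thing worth checking (an assumption, not a confirmed diagnosis of this issue) is whether the OpenMP flag reaches the host compiler for .cu files, for example through nvcc's -Xcompiler option.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// True only if this file was compiled with OpenMP support switched on.
bool openmp_enabled()
{
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Number of threads actually running inside a parallel region.
// Returns 1 when the pragma is ignored (OpenMP not enabled).
int threads_in_parallel_region(int requested)
{
    (void)requested;  // unused when built without OpenMP
    int observed = 1;
#pragma omp parallel num_threads(requested)
    {
#ifdef _OPENMP
#pragma omp single
        observed = omp_get_num_threads();
#endif
    }
    return observed;
}
```

If openmp_enabled() returns false when placed in the same file as filtering_openmesh, the pragma there is being ignored, which would match both the unchanged assembly and the single-thread timing.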