best practice for performance

yanggthomas commented 4 months ago

Hi, I am trying to reproduce the performance on an A100 (PCIe version), but I can't reach the numbers reported in your paper.

Currently I am getting 467 MLUPS for the cavity example on a 400^3 cubic grid, and 86 MLUPS for the non-sparse two-phase case on a 256^3 grid.
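
For readers comparing numbers like these, here is a minimal sketch of how an MLUPS figure (million lattice updates per second) is usually derived from a timed run. The function name `mlups` and the 13.7 s timing are hypothetical and chosen only for illustration; whether solid nodes are counted in the grid size is also an assumption that can differ between codes. On a GPU backend the elapsed time should be measured only after the device has finished its work (in Taichi, for example, by calling `ti.sync()` before stopping the timer).

```python
def mlups(nx: int, ny: int, nz: int, steps: int, elapsed_s: float) -> float:
    """Million lattice updates per second for an nx*ny*nz grid.

    Assumes every node of the grid is counted as updated on every step.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return nx * ny * nz * steps / elapsed_s / 1e6


# Hypothetical timing: 100 steps on a 400^3 grid taking 13.7 s of wall time.
print(f"{mlups(400, 400, 400, 100, 13.7):.0f} MLUPS")  # prints "467 MLUPS"
```

Reporting the grid size, step count, and wall time alongside the MLUPS value makes results from different machines easier to compare.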

Also, the conclusion section of your paper states: "The performance of the NVIDIA A100 GPU reached over 900 MLUPS for single-phase flow and 500 for two-phase flow with surface tension." However, in Table 1 the maximum for two-phase flow is 310 MLUPS. Is this a typo, or is there another reason?

yanggthomas commented 4 months ago

I am using Taichi v1.7.1 and have removed the dynamic_index parameter from ti.init().

yjhp1016 commented 4 months ago

I think the performance depends on the geometry (which affects data contiguity in memory), the configuration of your computer (software and hardware), your library versions, etc. Since these are not the same on our two setups, we get different results.
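
To illustrate why memory access matters so much here, the following is a rough sketch of a bandwidth-bound upper limit for a lattice Boltzmann update. It is not taken from the paper or the repository. It assumes each update reads and writes every distribution value exactly once, and the example inputs (a 19-velocity lattice, 4-byte floats, and a nominal 1555 GB/s memory bandwidth) are assumptions for illustration only; check your own lattice, precision, and the GPU datasheet.

```python
def bandwidth_bound_mlups(bandwidth_gb_s: float, q: int, bytes_per_value: int) -> float:
    """Idealised MLUPS ceiling for a memory-bound lattice Boltzmann step.

    Assumes q distribution values are read once and written once per update,
    with no other memory traffic (an optimistic simplification).
    """
    bytes_per_update = 2 * q * bytes_per_value
    return bandwidth_gb_s * 1e9 / bytes_per_update / 1e6


# Assumed inputs: 1555 GB/s nominal bandwidth, 19 velocities, float32 storage.
print(f"{bandwidth_bound_mlups(1555.0, 19, 4):.0f} MLUPS upper bound")
```

Real runs fall below such a ceiling because of non-contiguous access, boundary handling, and extra fields, which is consistent with geometry and data layout changing the measured result.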

yanggthomas commented 4 months ago

So what is your suggested best practice for achieving good performance?

yjhp1016 commented 4 months ago

Sorry, I'm afraid I'm not the best person to answer this question. I'm not a computing expert but a researcher working on CFD algorithms, more on the numerical-methods side...

yjhp1016 / taichi_LBM3D

best practice for performance #27