hypre-space / hypre

Parallel solvers for sparse linear systems featuring multigrid methods.
https://www.llnl.gov/casc/hypre/

Lassen regression test update #1094

Closed liruipeng closed 2 months ago

liruipeng commented 2 months ago

This PR updates the regression test on lassen because the default CUDA version has been upgraded.

ulrikeyang commented 2 months ago

There were several faster ones. Was this all the same problem?

From: Rui Peng Li
Sent: Wednesday, May 1, 2024 9:45 AM
To: hypre-space/hypre
Cc: Yang, Ulrike Meier; Review requested
Subject: Re: [hypre-space/hypre] Lassen regression test update (PR #1094)

@liruipeng commented on this pull request.


In src/test/TEST_bench/benchmark_spgemm.perf.saved.lassen:

@@ -23,23 +23,23 @@ Device Parcsr Matrix-by-Matrix wall clock time = 0.008427 seconds

Output file: benchmark_spgemm.out.12

Device Parcsr Matrix-by-Matrix wall clock time = 0.011918 seconds

Output file: benchmark_spgemm.out.13

-Device Parcsr Matrix-by-Matrix wall clock time = 0.122758 seconds

+Device Parcsr Matrix-by-Matrix wall clock time = 0.027048 seconds

Oh. This is because the problem size is smaller.


liruipeng commented 2 months ago

> There were several faster ones. Was this all the same problem?

Sorry, I replied too fast. SpGEMM with cuSPARSE version 11 is significantly faster than with CUDA 10, as many of the cases show. But it also seems to need more memory, so one case we used before now fails with an OOM error. I changed that case to a smaller problem size, which is why its time dropped so sharply.
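For anyone comparing the saved timings by hand, here is a minimal sketch (not part of hypre's test scripts) that pulls the wall clock times out of lines in the format shown in the diff above and computes the old/new ratio. The function names `wall_clock_times` and `speedup` are hypothetical, and the two input strings are just the `-`/`+` lines from the hunk with the diff markers dropped.

```python
import re

# Matches the timing lines as they appear in the saved benchmark output.
TIMING = re.compile(
    r"Device Parcsr Matrix-by-Matrix wall clock time = ([0-9.]+) seconds"
)


def wall_clock_times(text):
    """Return every SpGEMM wall clock time (in seconds) found in text."""
    return [float(m.group(1)) for m in TIMING.finditer(text)]


def speedup(old_seconds, new_seconds):
    """Ratio of the old time to the new time (values > 1 mean faster)."""
    return old_seconds / new_seconds


# The removed and added lines from the hunk above (hypothetical usage).
old = wall_clock_times(
    "Device Parcsr Matrix-by-Matrix wall clock time = 0.122758 seconds"
)
new = wall_clock_times(
    "Device Parcsr Matrix-by-Matrix wall clock time = 0.027048 seconds"
)

ratio = speedup(old[0], new[0])
print(f"{ratio:.2f}x")
```

For this particular case the ratio combines two changes, the newer cuSPARSE and the smaller problem size, so it should not be read as a library speedup on its own.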