rapidsai / cusignal

cuSignal - RAPIDS Signal Processing Library

Performance variation among different runs #573

Open Soroushmehr1 opened 1 year ago

Soroushmehr1 commented 1 year ago

We ran cuSignal on a VM five times and observed a large variation in running time for a couple of tests, even though we changed nothing on our platform or in the code. For instance, the runtimes of ISTFT with the "1024-1000000.0-65536-float64" parameters from two runs were 362.2986 us and 665.0717 us. We observed similar differences for a couple of other tests (e.g., ChannelizePoly, CWT, ...) as well. What could be the cause of such big variations?

awthomp commented 1 year ago

Hi @Soroushmehr1, thanks for using cuSignal.

A couple questions:

  1. What GPUs were you using?
  2. How were you timing your functions? Were you simply using the pytest benchmarks or doing your own performance measuring?
Soroushmehr1 commented 1 year ago

Hi Adam,

Thank you for your reply. I am using an NC H100 v4 instance with two GPUs and 640 GiB of RAM, and I am using the pytest benchmarks for measuring the time. Please let me know if there are any questions.

Best, Reza


awthomp commented 1 year ago

Hi Reza,

I haven't tested cuSignal on these H100 Azure instances, so I don't immediately know what's going on with the time deltas. One thing you could do is to take a look at a specific function and time it like so:

  1. Run cuSignal function
  2. Start timer
  3. Run cuSignal function in a for loop with N cycles
  4. Stop timer
  5. Examine time delta / N to get time per iteration

I believe our pytest benchmarks include the first run (which incurs CUDA warmup, memory caching, etc.).
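As a rough sketch of that pattern (assuming CuPy for the device arrays, and using `cusignal.resample_poly` on a 1M-sample array purely as a stand-in for whichever function and parameters you actually want to benchmark, e.g. ISTFT):

```python
import time

import cupy as cp
import cusignal

# Placeholder workload: swap in the cuSignal function you're benchmarking.
N = 100
x = cp.random.randn(1_000_000, dtype=cp.float64)

# 1. Warmup run: absorbs CUDA context init, kernel compilation, and
#    memory-pool allocation so they don't pollute the timing.
cusignal.resample_poly(x, 2, 3)
cp.cuda.Stream.null.synchronize()

# 2-4. Time N iterations, synchronizing before stopping the clock so all
#      queued GPU work has actually finished.
start = time.perf_counter()
for _ in range(N):
    cusignal.resample_poly(x, 2, 3)
cp.cuda.Stream.null.synchronize()
stop = time.perf_counter()

# 5. Average time per call.
print(f"{(stop - start) / N * 1e6:.2f} us per iteration")
```

If the per-iteration numbers from this loop are stable across runs, the variation you're seeing is likely coming from the first-run overhead captured by the pytest benchmarks rather than from the kernels themselves.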

Soroushmehr1 commented 1 year ago

Hi Adam,

Thank you for your reply and suggestion. Is there any randomness in the inputs, or in the number of inputs, fed to a function? Among the five runs, we observed the gap mostly in two of them. I have attached the spreadsheet and highlighted the cases with large variations. What could be the reason for these gaps?

Best, Reza
