idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1
1.6k stars 1k forks source link

Amends previous PRs #27538

Closed nmnobre closed 1 week ago

nmnobre commented 2 weeks ago

Hi,

I messed up in #27470 when I removed PerfGraph::_active and its associated methods. The PerfGraph should continue to work even when there's no live printing, because we might want to print a summary at the end. I broke that functionality.

1) (d1946fb) I restore the functionality above. 2) (313c0ac) I moved all the settings under CommonOutputAction.C (previously some were in MooseApp.C), simplified the logic in PerfGraph.C, and also chose to inactivate PerfGraph if both the final summary and live printing are off - I don't see a reason to keep it on. 3) (a6d60e3) Since we never use the functionality to resume PerfGraphLivePrint, if the user doesn't want live printing, I just immediately kill the thread.

moosebuild commented 2 weeks ago

Job Documentation on 3c9e7d0 wanted to post the following:

View the site here

This comment will be updated on new commits.

lindsayad commented 1 week ago

It might be easiest to review on github if we revert #27470 first

nmnobre commented 1 week ago

It might be easiest to review on github if we revert #27470 first

It's just 6a0a34f, #27470 is fine otherwise. But I'm happy to split if you'd prefer.

lindsayad commented 1 week ago

No, that's fine. With https://github.com/idaholab/moose/pull/27538/commits/d1946fbca7d1d9188e1b3ab45374e567b3655d0e as the revert commit, we should focus attention on the other commits in this PR

moosebuild commented 1 week ago

Job Modules parallel on 3c9e7d0 : invalidated by @lindsayad

nmnobre commented 1 week ago

To add a bit more context, the original code had these comments:

  /**
   * Turn on or off timing
   */
  void setActive(bool active) { _active = active; }

  /**
   * Turn on or off live printing (if timing is off then live printing will be off too)
   */
  void setLivePrintActive(bool active) { _live_print_active = active; }

So I tried to follow that logic: _active is like the master key for the whole timing functionality while _live_print_active was the key to pause/resume the live printing functionality.

Then that immediately means that if (!_active && !_live_print_active) is semantically equivalent to if (!_active) and the other changes follow naturally.

Another interpretation would be to say, okay, the comment is outdated and _active is simply a toggle for the graph summary, making the two toggles completely separate and disjoint... but that didn't seem to be the intention of the author because specifying no_timing disables both things, making it impossible to enable live printing while _active is false.

I hope that clarifies things a bit.

lindsayad commented 1 week ago

@loganharbour you good with things here?

lindsayad commented 1 week ago

I'd rather not invalidate the phase field failure since I linked to it from https://github.com/idaholab/moose/issues/27555

loganharbour commented 1 week ago

As long as disableLivePrint() is thread safe, 👍🏼 from me

lindsayad commented 1 week ago

It's no more or less thread safe than it was before. We call disableLivePrint() in quite a few places in MooseApp

lindsayad commented 1 week ago

Ok I tried combinations of enabled/disabled perf graph and enabled/disabled live-print and saw no TSAN errors. Moving forward with this. Failures are unrelated

GiudGiud commented 6 days ago

It's no more or less thread safe than it was before. We call disableLivePrint() in quite a few places in MooseApp

setLivePrintActive and disableLivePrint() were thread-safe with their atomics. setActive is not. I think we could leave a

mooseAssert(!libMesh::in_threads, "This should not be called in a threaded region");

in both disableLivePrint() and setActive() to be sure. I dont think the current code base uses them in threaded regions if you did not see any TSAN errors

nmnobre commented 6 days ago

setLivePrintActive and disableLivePrint() were thread-safe with their atomics. setActive the replacement for the first one is not. I think we could leave a

mooseAssert(!libMesh::in_threads, "This should not be called in a threaded region");

in both disableLivePrint() and setActive() to be sure. I dont think the current code base uses them in threaded regions if you did not see any TSAN errors

_live_print_active was atomic because it was read from PerfGraphLivePrint which executes on its own, separate thread. But that's no longer the case, both because no such variable exists and because no such read exists. As Alex said, the current code is no more or less thread-(un)safe than it was before.

GiudGiud commented 6 days ago

no such variable exists

Yes, but the remaining variables that do exist now are not thread-safe. _active still exists and can be modified with setActive _disable_live_print still exists and can be modified with disableLivePrint()

because no such read exists.

That's true. but if you add new code in an app, you start a simple threaded loop and you call either setActive or disableLivePrintActive you will get race conditions.

the current code is no more or less thread-(un)safe than it was before.

You're not wrong. But the old code had the intent of having a thread safe way to do things (with the atomic on _live_print_active), even though it wasn't enforcing it either. There are no new problems for sure.

nmnobre commented 6 days ago

That's true. but if you add new code in an app, you start a simple threaded loop and you call either setActive or disableLivePrintActive you will get race conditions.

setActive could be a problem (as it could've been before), but disableLivePrint can't, I don't think? In any case, I'm sure there's more MOOSE code where apps could mess things up in similar ways? But an extra check wouldn't hurt, you're quite right. :)