terrapower / armi

An open-source nuclear reactor analysis automation framework that helps design teams increase efficiency and quality
https://terrapower.github.io/armi/
Apache License 2.0
224 stars 87 forks source link

Does the test reactor need to be cached/pickled? #1636

Open opotowsky opened 8 months ago

opotowsky commented 8 months ago

loadTestReactor pickles the reactor. This caused a big bug in some downstream tests. Just wondering if this functionality is needed.

john-science commented 8 months ago

The reason for that is if you call loadTestReactor() and it has already been pickled, loading the test reactor is 20-50 times faster. It can really speed up unit tests.

What was the cause of the bug? What happened?

john-science commented 8 months ago

Sigh...

Arrielle has isolated the following bad situation downstream:

  1. Someone builds the test reactor, and pickles it
  2. The global nuclides get tweaked by some test
  3. The test reactor gets unpickled
  4. The test reactor is unprepared for the change in global nuclides
  5. problems occur

In the long term, the solution to this issue is definitely to make the nuclides not global. We need to glue them onto the Operator and the Reactor, probably.

ntouran commented 8 months ago

I agree that the caching is essential to run the tests in reasonable time. This is a legit issue for sure. Bringing nuclide data into the reactor would definitely work and is probably the best solution.

We could also cache a hash/checksum of the nuclide info and then check it upon unpack and invalidate/rebuild the cache if the hashes don't match.

Even shorter term though:

  1. The global nuclides get tweaked by some test

Sounds like a test bug! Sounds like that test should not re-pickle the reactor imho. If a set of tests needs a different fixture then it should be more isolated. In other words, it should untweak before being done.

jakehader commented 8 months ago

This might be a crazy idea, but what if we aimed to not "pickle" the reactor/operator state for testing but we targeted being able to fully define the "state" in our database so that we are very specific on what data we are holding on to. I feel like this could tend us in the direction where we have the ability to isolate interface testing for unit tests (mocking), and have more robust restart capabilities.

Saving state seems like a good idea for testing but I think we'd be able to isolate bugs/issues if we could use our common db tooling that we built to inspect the state

opotowsky commented 8 months ago

@ntouran, I have all the tests destroying the global nuclides after they are created (I've been rooting out these bugs over months as they appear). In this case, that did not solve the issue.

Downstream, the only reason we haven't hit this issue before is most tests use some custom settings to load the test reactor. This means most of these tests are NOT using the cached reactor. In ARMI, there would likely be a slowdown with the removal of pickling. But I do not believe we would see a slowdown in our plugins. I'm not necessarily advocating for the removal of the pickling. I just want to understand it and see if there's a better way to control it.

john-science commented 8 months ago

We could also cache a hash/checksum of the nuclide info and then check it upon unpack and invalidate/rebuild the cache if the hashes don't match.

I could implement this as a stop-gap, as soon as we are past the code freeze.

And that would buy us some time for the longer-term "global nuclides" work.

andfranklin commented 2 months ago

Just wanted to notify people that I'm working on a solution to this.

john-science commented 2 months ago

I expect this ticket to be closed when this one is: https://github.com/terrapower/armi/issues/473

andfranklin commented 2 months ago

I expect this ticket to be closed when this one is: #473

@john-science I disagree. That issue is orthogonal to this one.

I think @opotowsky's original question is great! I have a solution that makes all the spaghetti logic in loadTestReactor unnecessary. It will remove many downstream bugs, maintain the caching functionality, and make that function easier to maintain. It might even help solve #473.

If you want to close this issue when you close #473 with a stop-gap because you think you've solved all the world's problems that's fine by me 🤷‍♂️

andfranklin commented 2 months ago

I'd like to note that the global nuclides issue does not need to be solved before this one. They are orthogonal issues.

jakehader commented 2 months ago

@andfranklin - There was a bug due to global nuclides and pickling. If you have a solution to that fixes or can show improvements on the pickling / caching then let's do a pull request and code review. If it doesn't address the global nuclides, I'm sure that's fine, we could separate out load reactor improvements from global nuclides improvements.

you think you've solved all the world's problems that's fine by me 🤷‍♂️

I believe @john-science is trying to keep you informed about the related global nuclides (#473) issue in case you were not aware / familiar. @john-science - Would you prefer a performance improvement ticket to be separate from this issue and this issue only he related to the global nuclides bug?