idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1
1.67k stars 1.03k forks source link

Ensure PorousFlow tests work with multiple threads #14151

Closed WilkAndy closed 4 years ago

WilkAndy commented 4 years ago

Reason

PorousFlow should work with multiple threads

Design

I want this to work:

./run_tests --n-threads=2

Currently, most tests pass, but there are a few that report

A 'FluidProperties' object already exists with the name 'brine:water'.

At the moment i'm just planning on skipping these for the multi-threaded case. What do you think, @cpgr?

Impact

Robustness of PorousFlow

cpgr commented 4 years ago

So this will happen when using a BrineFluidProperties object https://github.com/idaholab/moose/blob/a29d9c25389c2a26c77c168693fdc184cf94b146/modules/fluid_properties/src/userobjects/BrineFluidProperties.C#L34

Maybe this should be done on thread 0 only?

cpgr commented 4 years ago

I meant to say that this adds a WaterFluidProperties user object with the above name during construction of the brine fluid properties user object. This bit might need to be wrapped in an if statement.

WilkAndy commented 4 years ago

Thanks @cpgr - your idea works,

a

cpgr commented 4 years ago

Thanks Andy!

WilkAndy commented 4 years ago

Hey @cpgr - the above PR is necessary for PorousFlow, but i have another problem. It isn't disasterous, but

./run_tests --re=dirackernels.theis3 --n-threads=2

sometimes fails. The CSVdiff is always due to the PorousFlowFluidMass Postprocessor. But there are a pile of other tests with that postprocessor and they never fail (so far, but i suspect i might have just been lucky). I've run out of energy. Don't spend heaps of time on this, but maybe you have an instant insight what's wrong with this test/Postprocessor/DiracKernel

a

cpgr commented 4 years ago

I'll have a look. I really hate things that only fail sometimes!

WilkAndy commented 4 years ago

Thanks Chris . Don't let it spoil your weekend. I bet this is a complicated issue.

WilkAndy commented 4 years ago

Putting these lines in the test file strongly suggests that the Variables are always correct, but that the Postprocessor is sometimes incorrect:

    csvdiff = 'theis3_line_0011.csv theis3.csv'
    rel_err = 1E-9
cpgr commented 4 years ago

Yeah, I came to the same conclusion last night, but have no idea why yet

WilkAndy commented 4 years ago

I ran each non-heavy test 100 times using

for x in $(seq 1 100); do ./run_tests --n-threads=2 -q &> tmp/${x}; done

I got 49 failures in total. Most of them were theis3.i and the remainder were bh07_exo.

cpgr commented 4 years ago

Definitely something funny happening with the PorousFlowFluidMass post processor.

I have simplified the input file considerably (included at the bottom) to have two 1D elements.

Running it with

../../../porous_flow-devel -i threadtest.i --n-threads=2

goes wrong about 30% of the time.

When it is correct, the output looks like (columns are total mass, mass in first element, mass in second element, and nonlinear variables at the three nodes)

Postprocessor Values:
+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+
| time           | massgas        | massgas0       | massgas1       | pp0            | pp1            | pp2            | sgas0          | sgas1          | sgas2          |
+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+
|   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |
|   1.000000e+01 |   5.000000e+00 |   3.729902e+00 |   1.270098e+00 |   3.345530e+07 |   2.401537e+07 |   2.000000e+07 |   5.994919e-02 |   5.074643e-03 |   7.350304e-04 |
|   3.000000e+01 |   1.500000e+01 |   9.125613e+00 |   5.874387e+00 |   3.161336e+07 |   2.446144e+07 |   2.000000e+07 |   1.334527e-01 |   1.911415e-02 |   6.888660e-03 |
|   7.000000e+01 |   3.500000e+01 |   1.695141e+01 |   1.804859e+01 |   2.890835e+07 |   2.374857e+07 |   2.000000e+07 |   2.278935e-01 |   4.571418e-02 |   3.161119e-02 |
|   1.000000e+02 |   5.000000e+01 |   2.189884e+01 |   2.810116e+01 |   2.776307e+07 |   2.334449e+07 |   2.000000e+07 |   2.843686e-01 |   6.418267e-02 |   5.483312e-02 |
+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+

Sometimes though, the total mass isn't correct (it should be 1/2 the total time). It looks like

Postprocessor Values:
+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+
| time           | massgas        | massgas0       | massgas1       | pp0            | pp1            | pp2            | sgas0          | sgas1          | sgas2          |
+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+
|   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |   0.000000e+00 |
|   1.000000e+01 |   5.000000e+00 |   3.729902e+00 |   1.270098e+00 |   3.345530e+07 |   2.401537e+07 |   2.000000e+07 |   5.994919e-02 |   5.074643e-03 |   7.350304e-04 |
|   3.000000e+01 |   1.500000e+01 |   9.125613e+00 |   5.874387e+00 |   3.161336e+07 |   2.446144e+07 |   2.000000e+07 |   1.334527e-01 |   1.911415e-02 |   6.888660e-03 |
|   7.000000e+01 |   3.500000e+01 |   1.695141e+01 |   1.804859e+01 |   2.890835e+07 |   2.374857e+07 |   2.000000e+07 |   2.278935e-01 |   4.571418e-02 |   3.161119e-02 |
|   1.000000e+02 |   5.038410e+01 |   2.189884e+01 |   2.848527e+01 |   2.776307e+07 |   2.334449e+07 |   2.000000e+07 |   2.843686e-01 |   6.418267e-02 |   5.483312e-02 |
+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+

In this case, the mass at the last time step is incorrect. The mass in the first element and all of the nodal variables are the same as before, but the mass in the second element is different.

[Mesh]
  type = GeneratedMesh
  dim = 1
  nx = 2
  xmax = 10
[]

[MeshModifiers]
  [./twoblocks]
    type = SubdomainBoundingBox
    bottom_left = '5 0 0'
    top_right = '10 0 0'
    block_id = 1
  [../]
[]

[Problem]
  type = FEProblem
  coord_type = RZ
  rz_coord_axis = Y
[]

[GlobalParams]
  PorousFlowDictator = dictator
  gravity = '0 0 0'
[]

[Variables]
  [./ppwater]
    initial_condition = 20e6
  [../]
  [./sgas]
    initial_condition = 0
    scaling = 1e3
  [../]
[]

[AuxVariables]
  [./massfrac_ph0_sp0]
    initial_condition = 1
  [../]
  [./massfrac_ph1_sp0]
    initial_condition = 0
  [../]
[]

[Kernels]
  [./mass0]
    type = PorousFlowMassTimeDerivative
    fluid_component = 0
    variable = ppwater
  [../]
  [./flux0]
    type = PorousFlowAdvectiveFlux
    fluid_component = 0
    variable = ppwater
  [../]
  [./mass1]
    type = PorousFlowMassTimeDerivative
    fluid_component = 1
    variable = sgas
  [../]
  [./flux1]
    type = PorousFlowAdvectiveFlux
    fluid_component = 1
    variable = sgas
  [../]
[]

[UserObjects]
  [./dictator]
    type = PorousFlowDictator
    porous_flow_vars = 'ppwater sgas'
    number_fluid_phases = 2
    number_fluid_components = 2
  [../]
  [./pc]
    type = PorousFlowCapillaryPressureConst
    pc = 1e5
  [../]
[]

[Modules]
  [./FluidProperties]
    [./simple_fluid0]
      type = SimpleFluidProperties
      bulk_modulus = 2e9
      density0 = 1000
      viscosity = 1e-3
      thermal_expansion = 0
    [../]
    [./simple_fluid1]
      type = SimpleFluidProperties
      bulk_modulus = 2e9
      density0 = 10
      viscosity = 1e-4
      thermal_expansion = 0
    [../]
  [../]
[]

[Materials]
  [./temperature]
    type = PorousFlowTemperature
  [../]
  [./ppss]
    type = PorousFlow2PhasePS
    phase0_porepressure = ppwater
    phase1_saturation = sgas
    capillary_pressure = pc
  [../]
  [./massfrac]
    type = PorousFlowMassFraction
    mass_fraction_vars = 'massfrac_ph0_sp0 massfrac_ph1_sp0'
  [../]
  [./simple_fluid0]
    type = PorousFlowSingleComponentFluid
    fp = simple_fluid0
    phase = 0
    compute_enthalpy = false
    compute_internal_energy = false
  [../]
  [./simple_fluid1]
    type = PorousFlowSingleComponentFluid
    fp = simple_fluid1
    phase = 1
    compute_enthalpy = false
    compute_internal_energy = false
  [../]
  [./porosity]
    type = PorousFlowPorosityConst
    porosity = 0.2
  [../]
  [./permeability]
    type = PorousFlowPermeabilityConst
    permeability = '1e-12 0 0 0 1e-12 0 0 0 1e-12'
  [../]
  [./relperm_water]
    type = PorousFlowRelativePermeabilityCorey
    n = 1
    phase = 0
  [../]
  [./relperm_gas]
    type = PorousFlowRelativePermeabilityCorey
    n = 1
    phase = 1
  [../]
[]

[BCs]
  [./rightwater]
    type = DirichletBC
    boundary = right
    value = 20e6
    variable = ppwater
  [../]
[]

[DiracKernels]
  [./source]
    type = PorousFlowSquarePulsePointSource
    point = '0 0 0'
    mass_flux = 0.5
    variable = sgas
  [../]
[]

[Preconditioning]
  [./smp]
    type = SMP
    full = true
  [../]
[]

[Executioner]
  type = Transient
  solve_type = Newton
  end_time = 100
  [./TimeStepper]
    type = IterationAdaptiveDT
    dt = 10
    growth_factor = 2
  [../]
[]

[Postprocessors]
  [./massgas]
    type = PorousFlowFluidMass
    fluid_component = 1
    phase = 1
  [../]
  [./massgas0]
    type = PorousFlowFluidMass
    fluid_component = 1
    phase = 1
    block = 0
  [../]
  [./massgas1]
    type = PorousFlowFluidMass
    fluid_component = 1
    phase = 1
    block = 1
  [../]
  [./pp0]
    type = NodalVariableValue
    variable = ppwater
    nodeid = 0
  [../]
  [./pp1]
    type = NodalVariableValue
    variable = ppwater
    nodeid = 1
  [../]
  [./pp2]
    type = NodalVariableValue
    variable = ppwater
    nodeid = 2
  [../]
  [./sgas0]
    type = NodalVariableValue
    variable = sgas
    nodeid = 0
  [../]
  [./sgas1]
    type = NodalVariableValue
    variable = sgas
    nodeid = 1
  [../]
  [./sgas2]
    type = NodalVariableValue
    variable = sgas
    nodeid = 2
  [../]
[]

[Outputs]
  print_linear_residuals = false
  perf_graph = true
  [./csv]
    type = CSV
    execute_on = timestep_end
  [../]
[]
WilkAndy commented 4 years ago

OK, this is much cleaner. i'm currently trying to fix MOOSE on pearcey (weekend shutdown has screwed things up) so can't play. Unless you do it, i'll make more postprocessors to record the porosity, etc, at relevant points to see what the root cause is.

cpgr commented 4 years ago

Yeah, I'm doing it now but haven't found what the issue is yet

cpgr commented 4 years ago

So I think that all of the material properties (porosity, saturation and density) are correct all of the time.

The bit that is changing is nodal_volume, calculated as

    Real nodal_volume = 0.0;
    for (_qp = 0; _qp < _qrule->n_points(); ++_qp)
      nodal_volume += _JxW[_qp] * _coord[_qp] * test[node][_qp];

And the bit that is causing it to change is test which is

const VariableTestValue & test = _var->phi();

Printing out _coord and test we get things like this when it works

_coord[_qp] 6.63897
test[node][_qp] 0.211325
_coord[_qp] 24.777
test[node][_qp] 0.788675

_coord[_qp] 38.0549
test[node][_qp] 0.211325
_coord[_qp] 56.1929
test[node][_qp] 0.788675

which look correct, and the following when the post processor calculates an incorrect mass

_coord[_qp] 6.63897
test[node][_qp] 0.211325
_coord[_qp] 24.777
test[node][_qp] 0.788675

_coord[_qp] 38.0549
test[node][_qp] 0
_coord[_qp] 56.1929
test[node][_qp] 0.788675
_coord[_qp] 38.0549
test[node][_qp] 1
_coord[_qp] 56.1929
test[node][_qp] 0.211325

where the test function values are (0, 1) rather than the usual values.

WilkAndy commented 4 years ago

I'm so glad it's not the porosity!

In the above, you have two groups of "coord,test" in each grey box. Is the first one from the first element, or the first one from the first thread? i'm a bit confused because the first grey box has 8 lines, while the second has 12 lines. Also, the coords are not 0<=x<=10. What am i missing?

cpgr commented 4 years ago

I think the coord bit includes the radial bit as it is rotated.

I had just selected some of the printed text - I had everything printing out. Here is just the values in this loop

  for (unsigned node = 0; node < test.size(); ++node)
  {
    Real nodal_volume = 0.0;
    for (_qp = 0; _qp < _qrule->n_points(); ++_qp)
      nodal_volume += _JxW[_qp] * _coord[_qp] * test[node][_qp];
  }

The output when the mass is correctly reported

Current elem 0, thread 0
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.788675
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.211325
  nodal_volume 26.1799
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.211325
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.788675
  nodal_volume 52.3599
  Element mass 21.8988
Current elem 0, thread 0
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.788675
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.211325
  nodal_volume 26.1799
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.211325
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.788675
  nodal_volume 52.3599
  Element mass 21.8988
Current elem 1, thread 1
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 0.788675
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.211325
  nodal_volume 104.72
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 0.211325
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.788675
  nodal_volume 130.9
  Element mass 28.1012
Current elem 1, thread 1
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 0.788675
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.211325
  nodal_volume 104.72
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 0.211325
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.788675
  nodal_volume 130.9
  Element mass 28.1012

and when it incorrectly calculates mass

Current elem 1, thread 1
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 1
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.211325
  nodal_volume 124.825
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 0
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.788675
  nodal_volume 110.795
  Element mass 28.4853
Current elem 1, thread 1
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 1
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.211325
  nodal_volume 124.825
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 38.0549
    test[node][_qp] 0
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 56.1929
    test[node][_qp] 0.788675
  nodal_volume 110.795
  Element mass 28.4853
Current elem 0, thread 0
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.788675
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.211325
  nodal_volume 26.1799
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.211325
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.788675
  nodal_volume 52.3599
  Element mass 21.8988
Current elem 0, thread 0
  node 0, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.788675
  node 0, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.211325
  nodal_volume 26.1799
  node 1, qp 0
    _JxW[_qp] 2.5
    _coord[_qp] 6.63897
    test[node][_qp] 0.211325
  node 1, qp 1
    _JxW[_qp] 2.5
    _coord[_qp] 24.777
    test[node][_qp] 0.788675
  nodal_volume 52.3599
  Element mass 21.8988
WilkAndy commented 4 years ago

It is strange that only this and bh07.i fails. i don't see anything super special about those tests.

WilkAndy commented 4 years ago

Are you going to continue working on this, or should we ask on moose-users?

cpgr commented 4 years ago

I don't even know where to start looking for the reason why only these don't work when similar tests do work. Hopefully someone will have a clue

WilkAndy commented 4 years ago

OK, i'll post something. Feel free to amend/clarify what i write!

WilkAndy commented 4 years ago

At the moment, my hunch is that DiracKernels are somehow mucking this up, but that doesn't explain why the remainder of the tests pass.

lindsayad commented 4 years ago

Must be a race condition. This might be something that we can run through helgrind. In order to avoid creating two forums of discussion in the future, you guys can ping the @idaholab/moose-team like so (although then some might complain that they're now all subscribed to this thread... 😄 )

WilkAndy commented 4 years ago

Thanks for that, @lindsayad . I love the name helgrind (https://inheritance.fandom.com/wiki/Helgrind) - Chris, let's enter through the gates of hell/death!

lindsayad commented 4 years ago

So it looks like your fix just didn't do anything with phi?

lindsayad commented 4 years ago

This touches on multiple issues. There are issues with adding objects that already exist which apparently was fixed by #14153 and then there are issues with this PorousFlowFluidMass postprocessor, which most certainly hasn't been fixed (verified through my own testing). So I'm reopening this.

lindsayad commented 4 years ago

@WilkAndy Can you specify what criterion you used in judging that #14153 fixed the adding object issue? Does ./run_tests --n-threads=2 run cleanly every time now?

WilkAndy commented 4 years ago

@lindsayad , you are correct that PR #14153 did NOT fix the "var" problem that you have fixed in #14161 . PR #14153 only fixed the original aim of this Issue, which was the "FluidProperties" problem.

WilkAndy commented 4 years ago

This is now fixed, thanks @lindsayad . See all the PRs cited above.

WilkAndy commented 4 years ago

This saga continues! Thanks to @friedmud , for pointing out that we have to addVariableDependency to the Postprocessors