gridapapps / GridapGeosciences.jl

Gridap drivers for geoscience applications

Panel-wise cell numbering in cubed sphere discrete models #14

Closed · amartinhuertas closed this 3 years ago

amartinhuertas commented 3 years ago

Related to https://github.com/gridapapps/GridapGeosciences.jl/issues/13#issuecomment-907974537 and https://github.com/gridapapps/GridapGeosciences.jl/issues/13#issuecomment-907997231

@davelee2804 @cbladwell @santiagobadia

FYI, in this PR I am setting up the CubedSphereDiscreteModels so that we get the desired panel-wise numbering of cells, lexicographic within each panel (see the figures below).

[Figures: cell numbering on the cubed sphere with the original ordering (left) and the new panel-wise ordering (right)]
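To make the intended numbering concrete, here is a minimal sketch (a hypothetical helper, not the actual implementation in this PR) of the panel-wise, lexicographic-within-panel cell id on a cubed sphere with n×n cells per panel:

  # Hypothetical helper, not the actual GridapGeosciences.jl code: global id
  # of cell (i,j) of panel p, with panels numbered first and cells numbered
  # lexicographically within each panel (i runs fastest, then j).
  function panel_wise_cell_id(p::Int, i::Int, j::Int, n::Int)
    @assert 1 <= p <= 6 && 1 <= i <= n && 1 <= j <= n
    (p-1)*n^2 + (j-1)*n + i
  end

  panel_wise_cell_id(1, 1, 1, 4)  # == 1  (first cell of panel 1)
  panel_wise_cell_id(2, 1, 1, 4)  # == 17 (first cell of panel 2)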

Along the way, I have discovered what I think is a bug in Gridap.jl, and opened a PR there (https://github.com/gridap/Gridap.jl/pull/651). For your convenience, the changes in that PR are already available on the DIV_support_for_fe_functions branch. Thus, if you work with that Gridap.jl branch, you will get these changes as well.

For the record, I have measured the computational times of the assembly stage for the Darcy problem on the cubed sphere using either ordering (I did not measure the matrix-vector product; if you like, you can do it). Results below. There you will see that the impact of the ordering on the assembly stage is almost negligible. This may be because the data locality of the original ordering is not actually that bad. In any case, I think it is reasonable to have the new ordering, to avoid extra noise in performance discussions.

--------------------------------------------------------------------------------
Panel-wise ordering (RT0-DG0-DG0) Darcy assembly times for increasing resolution
convergence_study(solve_darcy,generate_n_values(2),0,2)
--------------------------------------------------------------------------------

  0.001046 seconds (6.97 k allocations: 978.000 KiB)
  0.001336 seconds (7.00 k allocations: 981.281 KiB)
  0.001768 seconds (7.05 k allocations: 985.875 KiB)
  0.002123 seconds (7.10 k allocations: 991.781 KiB)
  0.002576 seconds (7.17 k allocations: 999.000 KiB)
  0.003092 seconds (7.25 k allocations: 1007.562 KiB)
  0.003588 seconds (7.34 k allocations: 1017.406 KiB)
  0.004314 seconds (7.44 k allocations: 1.004 MiB)
  0.004689 seconds (7.56 k allocations: 1.017 MiB)
  0.005001 seconds (7.56 k allocations: 1.017 MiB)
  0.015065 seconds (9.36 k allocations: 1.209 MiB)
  0.029340 seconds (12.36 k allocations: 1.529 MiB)
  0.050505 seconds (16.56 k allocations: 1.978 MiB)
  0.077888 seconds (21.96 k allocations: 2.555 MiB)
  0.113428 seconds (28.56 k allocations: 3.260 MiB)
  0.167090 seconds (36.36 k allocations: 4.093 MiB, 3.79% gc time)
  0.213774 seconds (45.36 k allocations: 5.054 MiB)
  0.258090 seconds (55.56 k allocations: 6.144 MiB)
  0.324181 seconds (66.96 k allocations: 7.361 MiB)

--------------------------------------------------------------------------------
Original ordering (RT0-DG0-DG0) Darcy assembly times for increasing resolution
convergence_study(solve_darcy,generate_n_values(2),0,2)
--------------------------------------------------------------------------------

  0.001117 seconds (6.97 k allocations: 978.000 KiB)
  0.001423 seconds (7.00 k allocations: 981.281 KiB)
  0.001683 seconds (7.05 k allocations: 985.875 KiB)
  0.002108 seconds (7.10 k allocations: 991.781 KiB)
  0.002617 seconds (7.17 k allocations: 999.000 KiB)
  0.003098 seconds (7.25 k allocations: 1007.562 KiB)
  0.003635 seconds (7.34 k allocations: 1017.406 KiB)
  0.004277 seconds (7.44 k allocations: 1.004 MiB)
  0.004918 seconds (7.56 k allocations: 1.017 MiB)
  0.005118 seconds (7.56 k allocations: 1.017 MiB)
  0.014954 seconds (9.36 k allocations: 1.209 MiB)
  0.029398 seconds (12.36 k allocations: 1.529 MiB)
  0.049776 seconds (16.56 k allocations: 1.978 MiB)
  0.078637 seconds (21.96 k allocations: 2.555 MiB)
  0.113415 seconds (28.56 k allocations: 3.260 MiB)
  0.155582 seconds (36.36 k allocations: 4.093 MiB)
  0.201716 seconds (45.36 k allocations: 5.054 MiB)
  0.258447 seconds (55.56 k allocations: 6.144 MiB)
  0.318850 seconds (66.96 k allocations: 7.361 MiB)

--------------------------------------------------------------------------------
Panel-wise ordering (RT1-DG1-DG1) Darcy assembly times for increasing resolution
convergence_study(solve_darcy,generate_n_values(2,n_max=50),1,4)
--------------------------------------------------------------------------------

  0.001933 seconds (6.97 k allocations: 1.425 MiB)
  0.002808 seconds (7.00 k allocations: 1.428 MiB)
  0.004330 seconds (7.05 k allocations: 1.433 MiB)
  0.005353 seconds (7.11 k allocations: 1.439 MiB)
  0.007342 seconds (7.17 k allocations: 1.446 MiB)
  0.009535 seconds (7.25 k allocations: 1.454 MiB)
  0.012000 seconds (7.34 k allocations: 1.464 MiB)
  0.014205 seconds (7.44 k allocations: 1.475 MiB)
  0.016106 seconds (7.56 k allocations: 1.487 MiB)
  0.016380 seconds (7.56 k allocations: 1.487 MiB)
  0.058011 seconds (9.36 k allocations: 1.679 MiB)
  0.133204 seconds (12.36 k allocations: 2.000 MiB)
  0.227223 seconds (16.56 k allocations: 2.448 MiB)
  0.360686 seconds (21.96 k allocations: 3.025 MiB)

--------------------------------------------------------------------------------
Original ordering (RT1-DG1-DG1) Darcy assembly times for increasing resolution
convergence_study(solve_darcy,generate_n_values(2,n_max=50),1,4)
--------------------------------------------------------------------------------

  0.001673 seconds (6.97 k allocations: 1.425 MiB)
  0.002785 seconds (7.00 k allocations: 1.428 MiB)
  0.004034 seconds (7.05 k allocations: 1.433 MiB)
  0.005531 seconds (7.11 k allocations: 1.439 MiB)
  0.007269 seconds (7.17 k allocations: 1.446 MiB)
  0.009383 seconds (7.25 k allocations: 1.454 MiB)
  0.011820 seconds (7.34 k allocations: 1.464 MiB)
  0.014418 seconds (7.44 k allocations: 1.475 MiB)
  0.016869 seconds (7.56 k allocations: 1.487 MiB)
  0.016526 seconds (7.56 k allocations: 1.487 MiB)
  0.063735 seconds (9.36 k allocations: 1.679 MiB)
  0.139095 seconds (12.36 k allocations: 2.000 MiB)
  0.247494 seconds (16.56 k allocations: 2.448 MiB)
  0.382572 seconds (21.96 k allocations: 3.025 MiB)
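For anyone who wants to reproduce this kind of measurement, here is a minimal, self-contained sketch of how the assembly stage alone can be timed in Gridap. The Poisson form below is just a stand-in for the actual Darcy driver, whose spaces and weak form live in this repo:

  using Gridap

  # Stand-in problem: a Poisson bilinear form on a Cartesian mesh, so that
  # the assembly stage is measured in isolation from the solve.
  model = CartesianDiscreteModel((0,1,0,1),(64,64))
  reffe = ReferenceFE(lagrangian,Float64,1)
  V = TestFESpace(model,reffe;dirichlet_tags="boundary")
  U = TrialFESpace(V,0.0)
  Ω = Triangulation(model)
  dΩ = Measure(Ω,2)
  a(u,v) = ∫( ∇(u)⋅∇(v) )dΩ

  @time A = assemble_matrix(a,U,V)  # sparse global matrix assembly only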
davelee2804 commented 3 years ago

Hey @amartinhuertas , nice pictures :) Have you also looked at the times for the solve() step?

amartinhuertas commented 3 years ago

> Hey @amartinhuertas , nice pictures :) Have you also looked at the times for the solve() step?

No. But I didn't measure it, because I don't expect any significant difference in the timings (even with a random ordering; neither of our orderings is random, though). Note that we are using a sparse direct solver. Sparse direct solvers compute a fill-in reducing ordering of the matrix, typically nested dissection (https://en.wikipedia.org/wiki/Nested_dissection), prior to factorization. Most fill-in reducing ordering heuristics generate an approximate solution to this NP-hard problem that does not depend on the original ordering. For an iterative solver there could be more of a difference, especially in the matrix-vector products. You can evaluate that if you are interested.
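As a quick illustration of this point (a self-contained sketch, not from this thread): permuting the unknowns of a sparse SPD system barely changes the direct-solve time, because the backslash solver computes its own fill-reducing permutation internally.

  using SparseArrays, LinearAlgebra, Random

  # 2D 5-point Laplacian (SPD), assembled via Kronecker products.
  m  = 100
  T  = spdiagm(-1 => fill(-1.0,m-1), 0 => fill(2.0,m), 1 => fill(-1.0,m-1))
  Id = sparse(1.0I, m, m)
  A  = kron(T,Id) + kron(Id,T)
  b  = rand(m^2)

  q  = randperm(m^2)  # scramble the unknowns on purpose
  Aq = A[q,q]
  bq = b[q]

  @time x  = A  \ b   # original (lexicographic) ordering
  @time xq = Aq \ bq  # random ordering; similar time expected
  @assert x[q] ≈ xq   # same solution up to the permutation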

PS: don't forget to get the latest commits that I pushed to Gridap.jl#DIV_support_for_fe_functions. To this end, you have to run update in Julia's REPL package manager (working on your branch of GridapGeosciences).

davelee2804 commented 3 years ago

OK, apologies for sending you on a wild goose chase here, I wasn't aware of the fill-in reducing ordering used by sparse direct solvers (I have only really worked with Krylov solvers in the past).

Thanks for the reminder, yes, I'm aware I need to pull in your latest DIV commits, I just haven't gotten round to it yet...

codecov-commenter commented 3 years ago

Codecov Report

Merging #14 (0b8d6a5) into master (f7aadf4) will increase coverage by 3.00%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master      #14      +/-   ##
==========================================
+ Coverage   43.35%   46.35%   +3.00%     
==========================================
  Files          10       10              
  Lines         286      302      +16     
==========================================
+ Hits          124      140      +16     
  Misses        162      162              
Impacted Files                       Coverage Δ
src/CubedSphereDiscreteModels.jl     99.00% <100.00%> (+0.18%) ↑

Continue to review full report at Codecov.


amartinhuertas commented 3 years ago

> OK, apologies for sending you on a wild goose chase here, I wasn't aware of the fill-in reducing ordering used by sparse direct solvers (I have only really worked with Krylov solvers in the past).

I think the initial ordering was not that bad from a locality point of view; this might explain why there is almost no difference in the computational timings during assembly. If you took a random ordering, e.g., I think we would be able to appreciate more of a difference. In any case, I found a bug along the way in Gridap.jl that I have fixed.

amartinhuertas commented 3 years ago

Also note that for 2D problems, sparse direct solvers (on single-core computers) are highly competitive. They might be a bit slower than an iterative solver, but in terms of robustness there is no comparison.

amartinhuertas commented 3 years ago

Tests passed at last! Accepting PR ...

amartinhuertas commented 3 years ago

> Thanks for the reminder, yes, I'm aware I need to pull in your latest DIV commits, I just haven't gotten round to it yet...

Just in case ... (sorry if I am repetitive) ... you have to (1) git pull from main in GridapGeosciences, and (2) update Gridap.jl in the package manager. This will pull the latest commits of the Gridap.jl branch that the package manager internally manages.
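A sketch of those two steps, assuming you work from a local clone of GridapGeosciences.jl with its own project environment:

  # (1) from a shell, inside the GridapGeosciences.jl clone:
  git pull

  # (2) in the Julia REPL package manager (press ] first):
  pkg> activate .
  pkg> update Gridap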