trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Muelu: singletons in uncoupled aggregation #11562

Closed: maxfirmbach closed this issue 1 year ago

maxfirmbach commented 1 year ago

Question

Hello everyone,

I'm currently trying to precondition a 3x3 block system stemming from fluid-structure interaction with MueLu. Sadly, I'm running into an issue during the uncoupled aggregation of the third block: I always end up with 1-2 singletons (single-node aggregates?), which causes the TentativePFactory to error out during prolongator generation. In some runs I end up with 1 singleton and in others with 2 (so it seems not to be deterministic?). I'm also running purely serial, so no distributed or shared memory is involved.

I also tried doing the aggregation of the third block in a coupled way, and that somehow works (I have to admit I don't know the exact implementation differences). Since I'm running everything in serial, I expected the uncoupled and coupled aggregation to produce rather similar results.

Here is the output of the uncoupled aggregation phase:

Detected 103 agglomerated Dirichlet nodes
Number of dropped entries in amalgamated matrix graph: 0/13159 (0.00%)
Algo Phase - (Dirichlet)
BuildAggregates (Phase - (Dirichlet))
  aggregated : 103 (phase), 103/1190 [8.66%] (total)
  remaining  : 1087
  aggregates : 0 (phase), 0 (total)
Algo Phase 1 (main)
BuildAggregates (Phase 1 (main))
  aggregated : 913 (phase), 1016/1190 [85.38%] (total)
  remaining  : 174
  aggregates : 77 (phase), 77 (total)
Algo Phase 2a (secondary)
BuildAggregates (Phase 2a (secondary))
  aggregated : 0 (phase), 1016/1190 [85.38%] (total)
  remaining  : 174
  aggregates : 0 (phase), 77 (total)
Algo Phase 2b (expansion)
BuildAggregates (Phase 2b (expansion))
  aggregated : 172 (phase), 1188/1190 [99.83%] (total)
  remaining  : 2
  aggregates : 0 (phase), 77 (total)
Algo Phase 3 (cleanup)
BuildAggregates (Phase 3 (cleanup))
  WARNING Rank 0 singletons :2 (phase)
  aggregated : 2 (phase), 1190/1190 [100.00%] (total)
  remaining  : 0
  aggregates : 2 (phase), 79 (total)

Actually, in this run the number of detected Dirichlet nodes is correct. But as explained above, sometimes one more Dirichlet node is found and one fewer singleton remains ... which seems weird to me.

Maybe one of you has an idea why the uncoupled aggregation is causing these issues.

Best regards, Max

@muelu and might also be interesting for @mayrmt

github-actions[bot] commented 1 year ago

Automatic mention of the @trilinos/muelu team

GrahamBenHarper commented 1 year ago

@maxfirmbach, do you know where these singletons are located? Are they located on the fluid-structure interface?

maxfirmbach commented 1 year ago

@GrahamBenHarper I think so, yes. Is it a problem for MueLu if one of my sub-blocks contains nodes with a different number of Dofs? Let's say I have a fluid where most nodes have 3 Dofs (2 velocities + 1 pressure), but on the interface, due to some condensation etc., I might have nodes with only 2 Dofs (2 velocities). I guess this could cause problems. How would I handle such a situation?

GrahamBenHarper commented 1 year ago

@maxfirmbach Yes, differing discretizations may slightly affect how aggregation works, because the aggregation parameters are handled globally. In your case, the issue may be related to your maximum aggregate size. This is mostly speculation, since I don't know exactly where the aggregates are located for your specific problem, and I'm not a total expert on our aggregation, but here's what I understand. Let's say your max aggregate size is $m_a$.

In aggregation phase 1, MueLu will skip all nodes that have more than $m_a$ neighbors, because they cannot form a valid aggregate with all their neighbors. This most commonly occurs on interfaces in multiphysics applications, where the connectivity may be dense depending on the method, especially when Lagrange multipliers are placed on the interface. Do you have a diagram of what your discretization looks like along the interface, including the fluid and solid domains?

In aggregation phase 2a, MueLu will revisit all nodes that are not yet aggregated and check whether it can form reasonably sized aggregates from them; this is where your minimum aggregate size comes in. In your case, nothing happens here.

Finally, aggregation phase 2b will loop over all unaggregated nodes. If an unaggregated node has neighbor(s) that are aggregated, it will be added to a neighboring aggregate; if there are multiple candidate aggregates, one is chosen based on a score to keep the aggregate quality reasonable. This process runs in a loop twice because our typical aggregate size is 3 in each direction. A lot of aggregation in this phase may be undesirable because the max aggregate size is not considered here, so in some bad cases very large aggregates can be generated.

Anyway, this has become a long-winded response. My best guess is that playing with your min/max aggregate sizes may help fix your issue, but it will be hard to get "perfect" behavior for applications like this.
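
For reference, a minimal sketch of how those size bounds can be passed to the aggregation factory through a Teuchos::ParameterList. "aggregation: min agg size" appears in the XML snippet later in this thread; "aggregation: max agg size" is assumed from the MueLu parameter documentation, and the values below are only placeholders to experiment with:

    #include <Teuchos_ParameterList.hpp>

    // Sketch only: the aggregate-size bounds used by phases 1 / 2a / 2b are plain
    // parameters on the (uncoupled) aggregation factory. Values are placeholders,
    // not a recommendation.
    Teuchos::ParameterList makeAggregationParams() {
      Teuchos::ParameterList aggParams("UncoupledAggregation");
      aggParams.set("aggregation: min agg size", 2);  // lower bound checked in phase 2a
      aggParams.set("aggregation: max agg size", 9);  // upper bound checked in phase 1 (e.g. 3x3 nodes in 2D)
      aggParams.set("aggregation: max selected neighbors", 0);
      return aggParams;
    }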

maxfirmbach commented 1 year ago

@GrahamBenHarper Thanks for the very detailed explanation. Do you think it would make sense to "fake" the dofs per node to e.g. 1 and use that for aggregation? Tuning the aggregation parameters sadly didn't change much.

The simulation domain is just rectangular, so the interface is basically a straight line; nothing special.

cgcgcg commented 1 year ago

I'm a bit confused by the observed randomness. I don't think there should be anything random in a serial run. (Single rank, serial node, right?) Is there any way the input could be changing, even if only ever so slightly? We have an adjustable tolerance for the Dirichlet nodes ("aggregation: Dirichlet threshold"). Maybe play with that first to see if it makes the runs more reproducible?

GrahamBenHarper commented 1 year ago

Oh, @cgcgcg raises a good point. If you're solving this problem on a rectangular domain, I think you may have numeric zeros which are not symbolic zeros. I like his suggestion more :)

maxfirmbach commented 1 year ago

Interesting ... yes I'm single rank, serial node.

    <ParameterList name="myAggFact3">
      <Parameter name="factory" type="string" value="UncoupledAggregationFactory"/>
      <Parameter name="Graph" type="string" value="myCoalesceDropFact3"/>
      <Parameter name="aggregation: Dirichlet threshold" type="double" value="1e-8"/>
      <Parameter name="aggregation: min agg size" type="int" value="3"/>
      <Parameter name="aggregation: max selected neighbors" type="int" value="0"/>
      <Parameter name="aggregation: compute aggregate qualities" type="bool" value="true"/>
    </ParameterList>

My uncoupled aggregation factory looks like the above; does 1e-8 make sense as a Dirichlet threshold?

Below are the 3 possible outcomes that occur ... Is there a chance of slightly changing input? I honestly have to say I don't know; I'm just passing in the operator and the nullspaces ... It could be that some very small entries change very slightly.

AmalagamationFactory::Build(): found fullblocksize=2 and stridedblocksize=2 from strided maps. offset=0
lightweight wrap = 1
algorithm = "classical" classical algorithm = "default": threshold = 0.00, blocksize = 2
Level::Set: Not storing "Filtering" generated by factory CoalesceDropFactory[22] on level 0, as it has not been requested and no keep flags were set for it
Detected 104 agglomerated Dirichlet nodes
Number of dropped entries in amalgamated matrix graph: 0/13160 (0.00%)
Algo Phase - (Dirichlet)
BuildAggregates (Phase - (Dirichlet))
  aggregated : 104 (phase), 104/1190 [8.74%] (total)
  remaining  : 1086
  aggregates : 0 (phase), 0 (total)
Algo Phase 1 (main)
BuildAggregates (Phase 1 (main))
  aggregated : 913 (phase), 1017/1190 [85.46%] (total)
  remaining  : 173
  aggregates : 77 (phase), 77 (total)
Algo Phase 2a (secondary)
BuildAggregates (Phase 2a (secondary))
  aggregated : 0 (phase), 1017/1190 [85.46%] (total)
  remaining  : 173
  aggregates : 0 (phase), 77 (total)
Algo Phase 2b (expansion)
BuildAggregates (Phase 2b (expansion))
  aggregated : 172 (phase), 1189/1190 [99.92%] (total)
  remaining  : 1
  aggregates : 0 (phase), 77 (total)
Algo Phase 3 (cleanup)
BuildAggregates (Phase 3 (cleanup))
  WARNING Rank 0 singletons :1 (phase)
  aggregated : 1 (phase), 1190/1190 [100.00%] (total)
  remaining  : 0
  aggregates : 1 (phase), 78 (total)
Build (MueLu::AggregateQualityEstimateFactory)
Build (MueLu::BlockedCoarseMapFactory)
Nullspace factory (MueLu::NullspaceFactory)
Use user-given nullspace Nullspace3: nullspace dimension=3 nullspace length=2376
domainGIDOffset: 426 block size: 3 stridedBlockId: -1
AggregateQuality: mode eigenvalue
AggregateQuality: Using 'forward' algorithm
11 out of 234 did not pass the quality measure. Worst aggregate had quality 225.52. Mean bad aggregate quality 171.67. Mean good aggregate quality 6.09.
AGG QUALITY HEADER     : | LEVEL |  TOTAL  |
AGG QUALITY PERCENTILES: | 0 | 234|
Column map is consistent with the row map, good.
AmalagamationFactory::Build(): found fullblocksize=2 and stridedblocksize=2 from strided maps. offset=0
lightweight wrap = 1
algorithm = "classical" classical algorithm = "default": threshold = 0.00, blocksize = 2
Level::Set: Not storing "Filtering" generated by factory CoalesceDropFactory[22] on level 0, as it has not been requested and no keep flags were set for it
Detected 103 agglomerated Dirichlet nodes
Number of dropped entries in amalgamated matrix graph: 0/13159 (0.00%)
Algo Phase - (Dirichlet)
BuildAggregates (Phase - (Dirichlet))
  aggregated : 103 (phase), 103/1190 [8.66%] (total)
  remaining  : 1087
  aggregates : 0 (phase), 0 (total)
Algo Phase 1 (main)
BuildAggregates (Phase 1 (main))
  aggregated : 913 (phase), 1016/1190 [85.38%] (total)
  remaining  : 174
  aggregates : 77 (phase), 77 (total)
Algo Phase 2a (secondary)
BuildAggregates (Phase 2a (secondary))
  aggregated : 0 (phase), 1016/1190 [85.38%] (total)
  remaining  : 174
  aggregates : 0 (phase), 77 (total)
Algo Phase 2b (expansion)
BuildAggregates (Phase 2b (expansion))
  aggregated : 172 (phase), 1188/1190 [99.83%] (total)
  remaining  : 2
  aggregates : 0 (phase), 77 (total)
Algo Phase 3 (cleanup)
BuildAggregates (Phase 3 (cleanup))
  WARNING Rank 0 singletons :2 (phase)
  aggregated : 2 (phase), 1190/1190 [100.00%] (total)
  remaining  : 0
  aggregates : 2 (phase), 79 (total)
Build (MueLu::AggregateQualityEstimateFactory)
Build (MueLu::BlockedCoarseMapFactory)
Nullspace factory (MueLu::NullspaceFactory)
Use user-given nullspace Nullspace3: nullspace dimension=3 nullspace length=2376
domainGIDOffset: 426 block size: 3 stridedBlockId: -1
AggregateQuality: mode eigenvalue
AggregateQuality: Using 'forward' algorithm
11 out of 237 did not pass the quality measure. Worst aggregate had quality 225.52. Mean bad aggregate quality 171.67. Mean good aggregate quality 6.01.
AGG QUALITY HEADER     : | LEVEL |  TOTAL  |
AGG QUALITY PERCENTILES: | 0 | 237|
Column map is consistent with the row map, good.
AmalagamationFactory::Build(): found fullblocksize=2 and stridedblocksize=2 from strided maps. offset=0
lightweight wrap = 1
algorithm = "classical" classical algorithm = "default": threshold = 0.00, blocksize = 2
Level::Set: Not storing "Filtering" generated by factory CoalesceDropFactory[22] on level 0, as it has not been requested and no keep flags were set for it
Detected 105 agglomerated Dirichlet nodes
Number of dropped entries in amalgamated matrix graph: 0/13161 (0.00%)
Algo Phase - (Dirichlet)
BuildAggregates (Phase - (Dirichlet))
  aggregated : 105 (phase), 105/1190 [8.82%] (total)
  remaining  : 1085
  aggregates : 0 (phase), 0 (total)
Algo Phase 1 (main)
BuildAggregates (Phase 1 (main))
  aggregated : 913 (phase), 1018/1190 [85.55%] (total)
  remaining  : 172
  aggregates : 77 (phase), 77 (total)
Algo Phase 2a (secondary)
BuildAggregates (Phase 2a (secondary))
  aggregated : 0 (phase), 1018/1190 [85.55%] (total)
  remaining  : 172
  aggregates : 0 (phase), 77 (total)
Algo Phase 2b (expansion)
BuildAggregates (Phase 2b (expansion))
  aggregated : 172 (phase), 1190/1190 [100.00%] (total)
  remaining  : 0
  aggregates : 0 (phase), 77 (total)
Algo Phase 3 (cleanup)
BuildAggregates (Phase 3 (cleanup))
  aggregated : 0 (phase), 1190/1190 [100.00%] (total)
  remaining  : 0
  aggregates : 0 (phase), 77 (total)
Build (MueLu::AggregateQualityEstimateFactory)
Build (MueLu::BlockedCoarseMapFactory)
Nullspace factory (MueLu::NullspaceFactory)
Use user-given nullspace Nullspace3: nullspace dimension=3 nullspace length=2376
domainGIDOffset: 426 block size: 3 stridedBlockId: -1
AggregateQuality: mode eigenvalue
AggregateQuality: Using 'forward' algorithm
11 out of 231 did not pass the quality measure. Worst aggregate had quality 225.52. Mean bad aggregate quality 171.67. Mean good aggregate quality 6.17.
AGG QUALITY HEADER     : | LEVEL |  TOTAL  |
AGG QUALITY PERCENTILES: | 0 | 231|
Column map is consistent with the row map, good.

cgcgcg commented 1 year ago

Could you share the matrix?

maxfirmbach commented 1 year ago

Sure, I put them in .csv and Matlab's .mat format. Or do you need a different format? A.zip

The matrix size matches what MueLu writes to the screen:

Level 0
Setup blocked Gauss-Seidel Smoother (MueLu::BlockedGaussSeidelSmoother{type = blocked GaussSeidel})
Setup Smoother (MueLu::IfpackSmoother{type = Chebyshev})
Build (MueLu::SubBlockAFactory)
A(0,0) is a single block and has strided maps:
  range  map fixed block size = 2, strided block id = -1
  domain map fixed block size = 2, strided block id = -1
A(0,0) has 66x66 rows and columns.
chebyshev: max eigenvalue (calculated by Ifpack) = 1.44668
"Ifpack Chebyshev polynomial": {Initialized: true, Computed: true, degree: 3, lambdaMax: 1.44668, alpha: 20, lambdaMin: 0.072334}
Level::Set: Not storing "PostSmoother" generated by factory SmootherFactory[35] on level 0, as it has not been requested and no keep flags were set for it
Setup Smoother (MueLu::IfpackSmoother{type = ILU})
Build (MueLu::SubBlockAFactory)
A(1,1) is a single block and has strided maps:
  range  map fixed block size = 3, strided block id = -1
  domain map fixed block size = 3, strided block id = -1
A(1,1) has 3663x3663 rows and columns.
IFPACK ILU (fill=0, relax=0.000000, athr=0.000000, rthr=1.000000)
Level::Set: Not storing "PostSmoother" generated by factory SmootherFactory[40] on level 0, as it has not been requested and no keep flags were set for it
Setup Smoother (MueLu::IfpackSmoother{type = Chebyshev})
Build (MueLu::SubBlockAFactory)
A(2,2) is a single block and has strided maps:
  range  map fixed block size = 2, strided block id = -1
  domain map fixed block size = 2, strided block id = -1
A(2,2) has 2376x2376 rows and columns.
chebyshev: max eigenvalue (calculated by Ifpack) = 1.87055
"Ifpack Chebyshev polynomial": {Initialized: true, Computed: true, degree: 3, lambdaMax: 1.87055, alpha: 20, lambdaMin: 0.0935276}
Level::Set: Not storing "PostSmoother" generated by factory SmootherFactory[45] on level 0, as it has not been requested and no keep flags were set for it

Oh, I uploaded the block operator ... I guess you only need the single-field matrix, right?

cgcgcg commented 1 year ago

FYI, your matrix has >30,000 entries that are O(10^-18). Maybe try with Dirichlet threshold=1e-14 or so.
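
As a rough illustration of why such tiny entries matter (schematic only, not MueLu's actual implementation): a threshold-based Dirichlet check typically flags a row whose off-diagonal entries are all below the threshold in magnitude, so structurally present but numerically tiny couplings can flip a row in or out of the Dirichlet set:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Schematic threshold-based Dirichlet-row check (not MueLu's code): a row is
    // treated as Dirichlet if every off-diagonal entry is below the threshold in
    // absolute value. Entries of order 1e-18 fall below any reasonable threshold,
    // but couplings close to the threshold can flip the decision between two
    // almost identical input matrices.
    bool isDirichletRow(const std::vector<double>& offDiagonalValues, double dirichletThreshold) {
      for (std::size_t j = 0; j < offDiagonalValues.size(); ++j)
        if (std::abs(offDiagonalValues[j]) > dirichletThreshold)
          return false;  // a significant off-diagonal coupling remains
      return true;       // row is counted as a Dirichlet (boundary) row
    }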

maxfirmbach commented 1 year ago

This would be the matrix of A(3,3), where the problems occur: ALE.zip

maxfirmbach commented 1 year ago

FYI, your matrix has >30,000 entries that are O(10^-18). Maybe try with Dirichlet threshold=1e-14 or so.

Sadly still the same behavior.

cgcgcg commented 1 year ago

And between runs the matrices are exactly identical? And you know that we should find 103 Dirichlet nodes, but sometimes there are more?

maxfirmbach commented 1 year ago

I'm quite confident that the matrix stays the same, but I can check that. Looking at my driven cavity case, I would expect 103 Dirichlet nodes for the ALE, yes.

maxfirmbach commented 1 year ago

Maybe it would make sense to read in the single field matrix and try to run MueLu on that standalone.

cgcgcg commented 1 year ago

That would certainly make it easier for us to reproduce the error.

maxfirmbach commented 1 year ago

@cgcgcg Seems like there is a difference in the number of nodes entering MueLu:

A(2,2) is a single block and has strided maps:
  range  map fixed block size = 2, strided block id = -1
  domain map fixed block size = 2, strided block id = -1
A(2,2) has 2376x2376 rows and columns.

-> 2376 rows / 2 dofs per node = 1188 nodes

but the number of nodes entering the aggregation process is 1190 nodes.

aggregated : 103 (phase), 103/1190 [8.66%] (total)

This is weird, but it explains why in this case 2 single nodes are left over in the aggregation process. The question is why the number of nodes differs between the input matrix size and the aggregation.

maxfirmbach commented 1 year ago

The lengthy output of the AmalgamationInfo shows that the first and last entries stand alone (these are exactly the 2 leftover nodes). Right now I don't understand why they even appear / are inserted.

AmalgamationInfo -- DOFs to nodes mapping:
  Mapping of row DOFs to row nodes:{0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, 46, 47, 47, 48, 48, 49, 49, 50, 50, 51, 51, 52, 52, 53, 53, 54, 54, 55, 55, 56, 56, 57, 57, 58, 58, 59, 59, 60, 60, 61, 61, 62, 62, 63, 63, 64, 64, 65, 65, 66, 66, 67, 67, 68, 68, 69, 69, 70, 70, 71, 71, 72, 72, 73, 73, 74, 74, 75, 75, 76, 76, 77, 77, 78, 78, 79, 79, 80, 80, 81, 81, 82, 82, 83, 83, 84, 84, 85, 85, 86, 86, 87, 87, 88, 88, 89, 89, 90, 90, 91, 91, 92, 92, 93, 93, 94, 94, 95, 95, 96, 96, 97, 97, 98, 98, 99, 99, 100, 100, 101, 101, 102, 102, 103, 103, 104, 104, 105, 105, 106, 106, 107, 107, 108, 108, 109, 109, 110, 110, 111, 111, 112, 112, 113, 113, 114, 114, 115, 115, 116, 116, 117, 117, 118, 118, 119, 119, 120, 120, 121, 121, 122, 122, 123, 123, 124, 124, 125, 125, 126, 126, 127, 127, 128, 128, 129, 129, 130, 130, 131, 131, 132, 132, 133, 133, 134, 134, 135, 135, 136, 136, 137, 137, 138, 138, 139, 139, 140, 140, 141, 141, 142, 142, 143, 143, 144, 144, 145, 145, 146, 146, 147, 147, 148, 148, 149, 149, 150, 150, 151, 151, 152, 152, 153, 153, 154, 154, 155, 155, 156, 156, 157, 157, 158, 158, 159, 159, 160, 160, 161, 161, 162, 162, 163, 163, 164, 164, 165, 165, 166, 166, 167, 167, 168, 168, 169, 169, 170, 170, 171, 171, 172, 172, 173, 173, 174, 174, 175, 175, 176, 176, 177, 177, 178, 178, 179, 179, 180, 180, 181, 181, 182, 182, 183, 183, 184, 184, 185, 185, 186, 186, 187, 187, 188, 188, 189, 189, 190, 190, 191, 191, 192, 192, 193, 193, 194, 194, 195, 195, 196, 196, 197, 197, 198, 198, 199, 199, 200, 200, 201, 201, 202, 202, 203, 203, 204, 204, 205, 205, 206, 206, 207, 207, 208, 208, 209, 209, 210, 210, 211, 211, 212, 212, 213, 213, 214, 214, 215, 215, 216, 216, 217, 217, 218, 218, 219, 219, 220, 220, 221, 221, 222, 222, 223, 223, 224, 224, 225, 225, 226, 226, 227, 227, 228, 228, 229, 229, 230, 230, 231, 231, 232, 232, 233, 233, 234, 234, 235, 235, 236, 236, 237, 237, 238, 238, 239, 239, 240, 240, 241, 241, 242, 242, 243, 243, 244, 244, 245, 245, 246, 246, 247, 247, 248, 248, 249, 249, 250, 250, 251, 251, 252, 252, 253, 253, 254, 254, 255, 255, 256, 256, 257, 257, 258, 258, 259, 259, 260, 260, 261, 261, 262, 262, 263, 263, 264, 264, 265, 265, 266, 266, 267, 267, 268, 268, 269, 269, 270, 270, 271, 271, 272, 272, 273, 273, 274, 274, 275, 275, 276, 276, 277, 277, 278, 278, 279, 279, 280, 280, 281, 281, 282, 282, 283, 283, 284, 284, 285, 285, 286, 286, 287, 287, 288, 288, 289, 289, 290, 290, 291, 291, 292, 292, 293, 293, 294, 294, 295, 295, 296, 296, 297, 297, 298, 298, 299, 299, 300, 300, 301, 301, 302, 302, 303, 303, 304, 304, 305, 305, 306, 306, 307, 307, 308, 308, 309, 309, 310, 310, 311, 311, 312, 312, 313, 313, 314, 314, 315, 315, 316, 316, 317, 317, 318, 318, 319, 319, 320, 320, 321, 321, 322, 322, 323, 323, 324, 324, 325, 325, 326, 326, 327, 327, 328, 328, 329, 329, 330, 330, 331, 331, 332, 332, 333, 333, 334, 334, 335, 335, 336, 336, 337, 337, 338, 338, 339, 339, 340, 340, 341, 341, 342, 342, 343, 343, 344, 344, 345, 345, 346, 346, 347, 347, 348, 348, 349, 349, 350, 350, 351, 351, 352, 352, 353, 353, 354, 354, 355, 355, 356, 356, 357, 357, 358, 358, 359, 359, 360, 360, 361, 361, 362, 362, 363, 363, 364, 364, 365, 365, 366, 366, 367, 367, 368, 368, 369, 369, 370, 370, 371, 371, 372, 372, 373, 373, 
374, 374, 375, 375, 376, 376, 377, 377, 378, 378, 379, 379, 380, 380, 381, 381, 382, 382, 383, 383, 384, 384, 385, 385, 386, 386, 387, 387, 388, 388, 389, 389, 390, 390, 391, 391, 392, 392, 393, 393, 394, 394, 395, 395, 396, 396, 397, 397, 398, 398, 399, 399, 400, 400, 401, 401, 402, 402, 403, 403, 404, 404, 405, 405, 406, 406, 407, 407, 408, 408, 409, 409, 410, 410, 411, 411, 412, 412, 413, 413, 414, 414, 415, 415, 416, 416, 417, 417, 418, 418, 419, 419, 420, 420, 421, 421, 422, 422, 423, 423, 424, 424, 425, 425, 426, 426, 427, 427, 428, 428, 429, 429, 430, 430, 431, 431, 432, 432, 433, 433, 434, 434, 435, 435, 436, 436, 437, 437, 438, 438, 439, 439, 440, 440, 441, 441, 442, 442, 443, 443, 444, 444, 445, 445, 446, 446, 447, 447, 448, 448, 449, 449, 450, 450, 451, 451, 452, 452, 453, 453, 454, 454, 455, 455, 456, 456, 457, 457, 458, 458, 459, 459, 460, 460, 461, 461, 462, 462, 463, 463, 464, 464, 465, 465, 466, 466, 467, 467, 468, 468, 469, 469, 470, 470, 471, 471, 472, 472, 473, 473, 474, 474, 475, 475, 476, 476, 477, 477, 478, 478, 479, 479, 480, 480, 481, 481, 482, 482, 483, 483, 484, 484, 485, 485, 486, 486, 487, 487, 488, 488, 489, 489, 490, 490, 491, 491, 492, 492, 493, 493, 494, 494, 495, 495, 496, 496, 497, 497, 498, 498, 499, 499, 500, 500, 501, 501, 502, 502, 503, 503, 504, 504, 505, 505, 506, 506, 507, 507, 508, 508, 509, 509, 510, 510, 511, 511, 512, 512, 513, 513, 514, 514, 515, 515, 516, 516, 517, 517, 518, 518, 519, 519, 520, 520, 521, 521, 522, 522, 523, 523, 524, 524, 525, 525, 526, 526, 527, 527, 528, 528, 529, 529, 530, 530, 531, 531, 532, 532, 533, 533, 534, 534, 535, 535, 536, 536, 537, 537, 538, 538, 539, 539, 540, 540, 541, 541, 542, 542, 543, 543, 544, 544, 545, 545, 546, 546, 547, 547, 548, 548, 549, 549, 550, 550, 551, 551, 552, 552, 553, 553, 554, 554, 555, 555, 556, 556, 557, 557, 558, 558, 559, 559, 560, 560, 561, 561, 562, 562, 563, 563, 564, 564, 565, 565, 566, 566, 567, 567, 568, 568, 569, 569, 570, 570, 571, 571, 572, 572, 573, 573, 574, 574, 575, 575, 576, 576, 577, 577, 578, 578, 579, 579, 580, 580, 581, 581, 582, 582, 583, 583, 584, 584, 585, 585, 586, 586, 587, 587, 588, 588, 589, 589, 590, 590, 591, 591, 592, 592, 593, 593, 594, 594, 595, 595, 596, 596, 597, 597, 598, 598, 599, 599, 600, 600, 601, 601, 602, 602, 603, 603, 604, 604, 605, 605, 606, 606, 607, 607, 608, 608, 609, 609, 610, 610, 611, 611, 612, 612, 613, 613, 614, 614, 615, 615, 616, 616, 617, 617, 618, 618, 619, 619, 620, 620, 621, 621, 622, 622, 623, 623, 624, 624, 625, 625, 626, 626, 627, 627, 628, 628, 629, 629, 630, 630, 631, 631, 632, 632, 633, 633, 634, 634, 635, 635, 636, 636, 637, 637, 638, 638, 639, 639, 640, 640, 641, 641, 642, 642, 643, 643, 644, 644, 645, 645, 646, 646, 647, 647, 648, 648, 649, 649, 650, 650, 651, 651, 652, 652, 653, 653, 654, 654, 655, 655, 656, 656, 657, 657, 658, 658, 659, 659, 660, 660, 661, 661, 662, 662, 663, 663, 664, 664, 665, 665, 666, 666, 667, 667, 668, 668, 669, 669, 670, 670, 671, 671, 672, 672, 673, 673, 674, 674, 675, 675, 676, 676, 677, 677, 678, 678, 679, 679, 680, 680, 681, 681, 682, 682, 683, 683, 684, 684, 685, 685, 686, 686, 687, 687, 688, 688, 689, 689, 690, 690, 691, 691, 692, 692, 693, 693, 694, 694, 695, 695, 696, 696, 697, 697, 698, 698, 699, 699, 700, 700, 701, 701, 702, 702, 703, 703, 704, 704, 705, 705, 706, 706, 707, 707, 708, 708, 709, 709, 710, 710, 711, 711, 712, 712, 713, 713, 714, 714, 715, 715, 716, 716, 717, 717, 718, 718, 719, 719, 720, 720, 721, 721, 722, 722, 723, 723, 724, 724, 725, 725, 726, 726, 727, 727, 728, 728, 729, 
729, 730, 730, 731, 731, 732, 732, 733, 733, 734, 734, 735, 735, 736, 736, 737, 737, 738, 738, 739, 739, 740, 740, 741, 741, 742, 742, 743, 743, 744, 744, 745, 745, 746, 746, 747, 747, 748, 748, 749, 749, 750, 750, 751, 751, 752, 752, 753, 753, 754, 754, 755, 755, 756, 756, 757, 757, 758, 758, 759, 759, 760, 760, 761, 761, 762, 762, 763, 763, 764, 764, 765, 765, 766, 766, 767, 767, 768, 768, 769, 769, 770, 770, 771, 771, 772, 772, 773, 773, 774, 774, 775, 775, 776, 776, 777, 777, 778, 778, 779, 779, 780, 780, 781, 781, 782, 782, 783, 783, 784, 784, 785, 785, 786, 786, 787, 787, 788, 788, 789, 789, 790, 790, 791, 791, 792, 792, 793, 793, 794, 794, 795, 795, 796, 796, 797, 797, 798, 798, 799, 799, 800, 800, 801, 801, 802, 802, 803, 803, 804, 804, 805, 805, 806, 806, 807, 807, 808, 808, 809, 809, 810, 810, 811, 811, 812, 812, 813, 813, 814, 814, 815, 815, 816, 816, 817, 817, 818, 818, 819, 819, 820, 820, 821, 821, 822, 822, 823, 823, 824, 824, 825, 825, 826, 826, 827, 827, 828, 828, 829, 829, 830, 830, 831, 831, 832, 832, 833, 833, 834, 834, 835, 835, 836, 836, 837, 837, 838, 838, 839, 839, 840, 840, 841, 841, 842, 842, 843, 843, 844, 844, 845, 845, 846, 846, 847, 847, 848, 848, 849, 849, 850, 850, 851, 851, 852, 852, 853, 853, 854, 854, 855, 855, 856, 856, 857, 857, 858, 858, 859, 859, 860, 860, 861, 861, 862, 862, 863, 863, 864, 864, 865, 865, 866, 866, 867, 867, 868, 868, 869, 869, 870, 870, 871, 871, 872, 872, 873, 873, 874, 874, 875, 875, 876, 876, 877, 877, 878, 878, 879, 879, 880, 880, 881, 881, 882, 882, 883, 883, 884, 884, 885, 885, 886, 886, 887, 887, 888, 888, 889, 889, 890, 890, 891, 891, 892, 892, 893, 893, 894, 894, 895, 895, 896, 896, 897, 897, 898, 898, 899, 899, 900, 900, 901, 901, 902, 902, 903, 903, 904, 904, 905, 905, 906, 906, 907, 907, 908, 908, 909, 909, 910, 910, 911, 911, 912, 912, 913, 913, 914, 914, 915, 915, 916, 916, 917, 917, 918, 918, 919, 919, 920, 920, 921, 921, 922, 922, 923, 923, 924, 924, 925, 925, 926, 926, 927, 927, 928, 928, 929, 929, 930, 930, 931, 931, 932, 932, 933, 933, 934, 934, 935, 935, 936, 936, 937, 937, 938, 938, 939, 939, 940, 940, 941, 941, 942, 942, 943, 943, 944, 944, 945, 945, 946, 946, 947, 947, 948, 948, 949, 949, 950, 950, 951, 951, 952, 952, 953, 953, 954, 954, 955, 955, 956, 956, 957, 957, 958, 958, 959, 959, 960, 960, 961, 961, 962, 962, 963, 963, 964, 964, 965, 965, 966, 966, 967, 967, 968, 968, 969, 969, 970, 970, 971, 971, 972, 972, 973, 973, 974, 974, 975, 975, 976, 976, 977, 977, 978, 978, 979, 979, 980, 980, 981, 981, 982, 982, 983, 983, 984, 984, 985, 985, 986, 986, 987, 987, 988, 988, 989, 989, 990, 990, 991, 991, 992, 992, 993, 993, 994, 994, 995, 995, 996, 996, 997, 997, 998, 998, 999, 999, 1000, 1000, 1001, 1001, 1002, 1002, 1003, 1003, 1004, 1004, 1005, 1005, 1006, 1006, 1007, 1007, 1008, 1008, 1009, 1009, 1010, 1010, 1011, 1011, 1012, 1012, 1013, 1013, 1014, 1014, 1015, 1015, 1016, 1016, 1017, 1017, 1018, 1018, 1019, 1019, 1020, 1020, 1021, 1021, 1022, 1022, 1023, 1023, 1024, 1024, 1025, 1025, 1026, 1026, 1027, 1027, 1028, 1028, 1029, 1029, 1030, 1030, 1031, 1031, 1032, 1032, 1033, 1033, 1034, 1034, 1035, 1035, 1036, 1036, 1037, 1037, 1038, 1038, 1039, 1039, 1040, 1040, 1041, 1041, 1042, 1042, 1043, 1043, 1044, 1044, 1045, 1045, 1046, 1046, 1047, 1047, 1048, 1048, 1049, 1049, 1050, 1050, 1051, 1051, 1052, 1052, 1053, 1053, 1054, 1054, 1055, 1055, 1056, 1057, 1058, 1058, 1059, 1059, 1060, 1060, 1061, 1061, 1062, 1062, 1063, 1063, 1064, 1064, 1065, 1065, 1066, 1066, 1067, 1067, 1068, 1068, 1069, 1069, 1070, 1070, 1071, 
1071, 1072, 1072, 1073, 1073, 1074, 1074, 1075, 1075, 1076, 1076, 1077, 1077, 1078, 1078, 1079, 1079, 1080, 1080, 1081, 1081, 1082, 1082, 1083, 1083, 1084, 1084, 1085, 1085, 1086, 1086, 1087, 1087, 1088, 1088, 1089, 1089, 1090, 1090, 1091, 1091, 1092, 1092, 1093, 1093, 1094, 1094, 1095, 1095, 1096, 1096, 1097, 1097, 1098, 1098, 1099, 1099, 1100, 1100, 1101, 1101, 1102, 1102, 1103, 1103, 1104, 1104, 1105, 1105, 1106, 1106, 1107, 1107, 1108, 1108, 1109, 1109, 1110, 1110, 1111, 1111, 1112, 1112, 1113, 1113, 1114, 1114, 1115, 1115, 1116, 1116, 1117, 1117, 1118, 1118, 1119, 1119, 1120, 1120, 1121, 1121, 1122, 1122, 1123, 1123, 1124, 1124, 1125, 1125, 1126, 1126, 1127, 1127, 1128, 1128, 1129, 1129, 1130, 1130, 1131, 1131, 1132, 1132, 1133, 1133, 1134, 1134, 1135, 1135, 1136, 1136, 1137, 1137, 1138, 1138, 1139, 1139, 1140, 1140, 1141, 1141, 1142, 1142, 1143, 1143, 1144, 1144, 1145, 1145, 1146, 1146, 1147, 1147, 1148, 1148, 1149, 1149, 1150, 1150, 1151, 1151, 1152, 1152, 1153, 1153, 1154, 1154, 1155, 1155, 1156, 1156, 1157, 1157, 1158, 1158, 1159, 1159, 1160, 1160, 1161, 1161, 1162, 1162, 1163, 1163, 1164, 1164, 1165, 1165, 1166, 1166, 1167, 1167, 1168, 1168, 1169, 1169, 1170, 1170, 1171, 1171, 1172, 1172, 1173, 1173, 1174, 1174, 1175, 1175, 1176, 1176, 1177, 1177, 1178, 1178, 1179, 1179, 1180, 1180, 1181, 1181, 1182, 1182, 1183, 1183, 1184, 1184, 1185, 1185, 1186, 1186, 1187, 1187, 1188, 1188, 1189}

maxfirmbach commented 1 year ago

I suspect the error to be in the following function:

  template <class Scalar,class LocalOrdinal, class GlobalOrdinal, class Node>
  const GlobalOrdinal AmalgamationFactory<Scalar, LocalOrdinal, GlobalOrdinal, Node>::DOFGid2NodeId(GlobalOrdinal gid, LocalOrdinal blockSize,
      const GlobalOrdinal offset, const GlobalOrdinal indexBase) {
    GlobalOrdinal globalblockid = ((GlobalOrdinal) gid - offset - indexBase) / blockSize + indexBase;
    return globalblockid;
  }

Depending on the first gid and the blockSize, interesting things (let's call them that) can happen due to the integer division.

cgcgcg commented 1 year ago

Good job tracking that down. That does explain what you're seeing. The question now is: is there something wrong in the AmalgamationFactory, or is it something in the input?

maxfirmbach commented 1 year ago

I think both, in a certain way (at least the type of input is what triggers the error). The problem is that in FSI we have a different number of Dofs per node for each field. In the case above, the ALE field starts with a GID of 3795, resulting in (3795-0-0)/2+0 = 1897.5 -> index 1897; with the next one, 3796, we end up with index 1898, with the next one again index 1898, and so on (so the first one dangles on its own, as it has no second GID to fill the blocksize of 2).

Depending on my blocked operator, the global starting index of the sub-field and the corresponding blocksize, there are different ways for this to go wrong. Even that it works for the fluid field is just luck: we have 66 solid Dofs -> 33 nodes (global indices 0-65). The first fluid GID is 66, which happens to be divisible by the 3 Dofs per node, so the resulting map fits. If we had a different number of solid Dofs/nodes, things would already go wrong on the second field.

I also calculated some thermo-structure interaction examples, but the second field (thermo field) has only scalars, so a blocksize of 1, which doesn't affect things here as I always divide by 1 :smile:.
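
To make the arithmetic above concrete, here is a small standalone sketch. The function body mirrors the DOFGid2NodeId snippet quoted earlier; the GID 3795 and block size 2 are the ALE values from this example, and the second loop only illustrates that an offset equal to the block's first GID restores a clean DOF-to-node pairing (essentially what gets fixed later in this thread):

    #include <cstdio>

    // Mirrors the integer arithmetic in AmalgamationFactory::DOFGid2NodeId
    // (quoted a few comments above).
    long dofGidToNodeId(long gid, long blockSize, long offset, long indexBase) {
      return (gid - offset - indexBase) / blockSize + indexBase;
    }

    int main() {
      const long blockSize = 2;  // ALE block: 2 Dofs per node
      // offset = 0: the first ALE GID 3795 is odd, so the integer division maps
      // 3795 -> 1897 while 3796 and 3797 both map to 1898, etc. The first DOF
      // (and, at the other end of the block, the last one) is left alone in its node.
      for (long gid = 3795; gid <= 3800; ++gid)
        std::printf("offset 0   : gid %ld -> node %ld\n", gid, dofGidToNodeId(gid, blockSize, 0, 0));
      // offset = 3795 (the first GID of the block): consecutive DOF pairs map
      // cleanly to nodes 0, 0, 1, 1, 2, 2, ...
      for (long gid = 3795; gid <= 3800; ++gid)
        std::printf("offset 3795: gid %ld -> node %ld\n", gid, dofGidToNodeId(gid, blockSize, 3795, 0));
      return 0;
    }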

maxfirmbach commented 1 year ago

The question is how to proceed. We use an Xpetra::BlockedCrsMatrix, which uses Xpetra-style numbering of GIDs, so the start index of the second block can be any positive integer, which the method DOFGid2NodeId can't handle. Would it be best to somehow change the input or to adapt the method? In theory, an Xpetra::BlockedCrsMatrix with Xpetra-style numbering isn't anything special to use.

maxfirmbach commented 1 year ago

@cgcgcg The NodeID-to-DofID mapping seems to work now; at least I can use an offset to get things right. The aggregation also runs through smoothly now. My next problem appears in the TentativePFactory, though ... the AggStart pointer has several consecutive entries with the same value, which seems wrong. Might it be a problem for MueLu when the GIDs have a jump (e.g. 100, 101, 150, 151, 152, ...) stemming from deleting rows and columns of a matrix?

cgcgcg commented 1 year ago

@maxfirmbach TBH, I'm not fully sure what is supported. Do you also have to fix this guy? https://github.com/trilinos/Trilinos/blob/d5e2e9f17ae44b449f262ae4f3c599547e33885b/packages/muelu/src/Graph/MatrixTransformation/MueLu_AmalgamationInfo_def.hpp#L134

maxfirmbach commented 1 year ago

@cgcgcg Yes, right now I have the feeling that something goes wrong there too ... at least by looking at AggStart etc. Not sure though at which place. AggStart somehow points to zero-size aggregates, which don't exist in my case. This would also explain the zero-column error in the QR of the TentativePFactory.

maxfirmbach commented 1 year ago

    Teuchos::ArrayView<const GO> nodeGlobalElts = aggregates.GetMap()->getLocalElementList();

This list seems to have no jumps in GIDs, but its entries are later checked against the corresponding ColMap of a sub-block of A, which does have jumps in its GIDs. So the stagnation of AggStart could happen exactly where there are no GIDs in A due to the jumps ... because the two maps don't overlap completely. But that's just a suspicion ...

maxfirmbach commented 1 year ago

@cgcgcg I adapted the maps of the input matrix (by calling the correct Xpetra::StridedMap constructor) and now things seem to work as expected. So this can be closed :+1: .