sxs-collaboration / spectre

SpECTRE is a code for multi-scale, multi-physics problems in astrophysics and gravitational physics.
https://spectre-code.org

Feedback on new boundary conditions / corrections system #3417

Open nikwit opened 3 years ago

nikwit commented 3 years ago

The new boundary conditions/corrections system has been on develop for a few months now, and I thought it would be a good idea to provide some feedback given that I have worked with it for a bit. Overall, the system seems to be working well, but there are a few things which I found hard to deal with from a user's perspective. For a lot of the points raised here, I am simply not sure to what extent this is a design choice and to what extent we are forced to do it this way because of the architecture around it, particularly since we are mixing dynamic and static polymorphism, which is often restrictive.

  1. Documentation The new boundary condition/correction system is extremely heavy on template metaprogramming, even by SpECTRE's standards. This makes it impossible to understand how to use the system directly, since the templated interfaces are simply duck-typed (there are no base classes or protocols), and reading the actual implementation is also extremely hard. One is therefore often left to guess at how it might work from the examples in the systems that are provided, which leads to a lot of wasted time (and horribly unclear compiler error messages). My main point is that the new system badly needs documentation that goes beyond the examples in the systems; this would make working with it a lot easier. I have raised this issue before, but I don't think it is given enough priority. The main problem is that writing good and thorough documentation is time-consuming, particularly while the underlying systems are still changing so actively. It would therefore save everyone time if we could come up with a way to quickly write down the essential parts of the documentation somewhere without a lengthy review process. It seems like the usual "20% of effort yields 80% of results" applies here. Perhaps this should be addressed in a separate issue.

  2. Defaulting type aliases This concerns places in the boundary condition/correction code that check whether the system (or any other struct) has a certain type alias, usually via the macro CREATE_HAS_TYPE_ALIAS. The main example I have in mind is inverse_spatial_metric_tag. If this tag is present in the system, the normal vectors are normalized w.r.t. the 3-metric; otherwise they are simply normalized with respect to the Euclidean metric. The problem is that this is extremely hard to debug: everything appears to work fine, except that the normal vectors are incorrectly normalized, leading to wrong results. In this particular case, I think the Euclidean normalization should be "opt-in", so that it is used only if the user specifies it explicitly. More generally, I would say that all of these tags should be replaced, where possible, with a static constexpr bool which specifies what should be done. Where that is not powerful enough because the decision is not binary, e.g. where a specific tag needs to be specified, the user should instead be forced to provide some null tag which corresponds to the not has_type_alias case. Ideally, we would then also have a protocol for the system which checks for the existence of these type aliases.

  3. Default tag slicing When the user wants to implement a boundary condition or correction, they need to decide which quantities they require on the boundary to compute it. Currently, this is specified by providing five tmpl::lists as type aliases, one for each type of quantity. The arguments of the function implementing the boundary condition/correction are then made up of the corrections to the evolved variables (returned by reference), the tags specified in the tmpl::lists, and some default tags that are sliced in, some of which also depend on type aliases provided by the system. These tags are automatically sliced onto the boundary, and their values can be retrieved by the function. Firstly, I very much dislike that the function signature is assembled from a fairly complicated system of tmpl::lists and defaults. This leads to a large amount of trial and error when debugging compile-time errors, which do not explicitly state which tags are missing or should not be there. If there were any way for the user to simply retrieve these variables from one (or two) DataBox(es), that architecture would be much preferable in my opinion. Perhaps all tags could somehow be sliced in lazily and only evaluated if they are retrieved within the function (at least for Gauss-Lobatto points, for which the slicing is much more straightforward and which are used almost always). If this is not possible, I would argue for a less drastic change, namely that the way the function signature is composed should be kept as simple as possible. Currently, some variables are sliced in by default, but I find this more confusing than convenient and think that the user could simply slice each of them in manually. This is especially true for the normal_vector, which is sliced in only when inverse_spatial_metric_tag is specified in the system, and otherwise not.

  4. Specifying Tags in TimeDerivative As I understand it, all "temporary" tags which the user wants to slice onto the boundary need not only to be specified in one of the tmpl::lists mentioned above but also to be returned by reference from the system's TimeDerivative::apply. This means that for every temporary tag the user needs for any boundary condition or correction, they must specify it not only in the appropriate tmpl::list of the boundary condition/correction but also add it to the temporary_tags and argument_tags of TimeDerivative::apply. In the function body, the value is then trivially copied over, so TimeDerivative::apply starts with e.g. *result_inverse_spatial_metric = upper_spatial_metric; for every single tag that is needed to compute any boundary condition/correction but is not evolved. What is the motivation behind this system? Is it necessary for the design? It is terribly confusing, particularly since there is only one TimeDerivative::apply function for all boundary conditions/corrections, where all these tags accumulate. If we can avoid this design at all, I think we should.

  5. BoundaryCondition/Correction base classes For every system that is written, a new base class has to be written that inherits from domain::BoundaryConditions::BoundaryCondition but, as far as I can tell, does not implement any extra functionality except a creatable_classes type alias listing all derived classes. This type alias is then used to register all these classes with Charm++. All actual boundary condition/correction implementations then derive from this intermediate base class. Is there a reason why all boundary conditions/corrections cannot simply inherit directly from domain::BoundaryConditions::BoundaryCondition? If this is just for registration, it seems a lot easier to me to put the creatable_classes type alias somewhere directly in the RegisterDerivedWithCharm file.

Again, a lot of these issues may be necessary for the design, but I think they should be addressed for clarity. I think it would be extremely valuable if others who have used this new system could comment on whether they have also experienced some of these issues.

moxcodes commented 3 years ago

I'll agree that the relationship between several interface elements and type aliases and the resulting input types for the boundary condition and boundary correction functions is somewhat difficult to understand without digging through the code. For the GH+GRMHD system, I needed to implement a 'generic' version of both that used various type aliases to assemble the needed argument lists using tmpl/pack expansion. The result is ... also difficult to read, but does actually completely specify the interface.

Depending on the amount of time I have over the next couple of weeks, I may be able to use the interface specification I assembled there to construct some more helpful documentation of the various cases. Otherwise, I think those versions of the boundary conditions / boundary corrections are at least a useful reference for when someone else goes to write more thorough documentation, if I do not manage to find the time myself.

Regarding points 3-4, I think most of the motivation for the several type aliases and the temporaries is an optimization step to reduce memory consumption. If I recall correctly, the dg_..._temporary_tags from the time derivative computation are not actually stored in the DataBox: they are only temporarily allocated in a large block, then released once the boundary conditions have been computed. Some of the others enable automatic computation (e.g. dg_interior_deriv_tags actually specifies tags to take spatial derivatives of, and the derivative values do not occupy space in the DataBox). Some of the others, I think, are desired so that the boundary condition code can work directly with tags from Variables rather than with the individual tags from the DataBox, which can lead to performance improvements (e.g. when using projection operators). I'm not necessarily saying that this can't be improved -- much of the manual list specification could probably be processed with metafunctions -- but it is true that all of those individual lists are used separately in the boundary conditions code, and the separation is somewhat well-motivated. My opinion is that it's probably a higher priority, and easier, to thoroughly document the current interface as it is before trying to merge lists to simplify the interface.

Point 5 may be improvable by moving the boundary conditions / boundary corrections to the new factory-creation infrastructure that's now used for events, triggers, and several of the time-stepper-related objects, which involves explicitly specifying type lists in the metavariables rather than using creatable_classes.

wthrowe commented 3 years ago

(Written at the same time as @moxcodes, so some duplication here.)

I haven't used these particular things, but I think I can explain a few of the decisions from a historical perspective.

> If this tag is present in the system, the normal vectors are normalized w.r.t. the 3-metric, otherwise they will simply be normalized with respect to the euclidean metric. The problem here is that this is extremely hard to debug since everything appears to be working fine except that the normal vectors will be incorrectly normalized leading to wrong results.

In an early version of systems, the method of normalization was specified explicitly in each one. I am not sure when that was changed, but my guess is it was when we tried to cut back significantly on DataBox use.

> If there was any way in which the user could simply retrieve these variables from a (or two) DataBox(es), this architecture would be much preferable in my opinion. Perhaps all tags can somehow be sliced in lazily and are then just evaluated if they are retrieved within the function (at least for GaussLobatto points for which the slicing is much more straightforward but which are used almost always).

That was the original design, but it had heavy performance penalties (primarily memory, I believe).

> ... creatable_classes ...

I'm gradually removing creatable_classes aliases from the code. I forget if I've been avoiding these because there is some difficulty or just because there are a lot of files to edit. I don't know if there is a reason for the extra base classes beyond that, but I would guess not.

nilsdeppe commented 3 years ago

I can answer the questions about the design here.

  1. I have no objection to requiring that inverse_spatial_metric_tag always be specified and having a EuclideanInverseMetric sentinel tag to do the switching; I agree this would be clearer. I do like having the flexibility to specify the tag used for normalization, so that different systems can use different tags. This may be needed for beyond-GR simulations. @wthrowe is correct about how and why this change was made. It was also necessary to support subcell: compute tags + subcell is really painful (read: effectively impossible).
  2. @moxcodes correctly explained this design. There are several different Variables that things come from. One option would be to require that all the typelists be present, even if empty. That would probably remove some of the magic. The reason some things are always projected is that I don't know of any boundary corrections that don't depend on the evolved variables (even for computing characteristic fields). The reason the normal vector isn't computed in flat space is that it's identical to the normal covector.
  3. The boundary corrections depend only on the principal part of the system, and since you need to compute the principal part in order to compute the time derivative, you have to have that available in the time derivative. Things are copied into temporary tags for two different reasons. First, overuse of compute tags: things like constraint damping parameters would ideally not be compute tags, in order to reduce the memory footprint and so that they work with DG-subcell. There is some argument about the balance between memory footprint and memoization, though, so it's not completely cut and dried, except in the DG-subcell case, where compute tags should generally be avoided. Second, systems on a curved background can't be well-balanced unless you actually evolve the background, even if its time derivatives are analytically zero. Well-balanced methods preserve stationary solutions exactly, even when discretized, and are something the applied-math community is putting effort into. This means that the "background" should really be "evolved".
  4. As @wthrowe pointed out, there is an updated factory class design that might make this easier. I don't really view the current design as a negative; I think it's actually nice to have one place where all the concrete implementations are listed.

nilsdeppe commented 3 years ago

Does anyone have ideas on where/how to document this? Should we have a (very) long dev guide that basically derives DG-FD hybrid schemes, discusses them analytically, and then documents how the various analytic terms translate into places in the code?

nilsvu commented 3 years ago

My suggestion would be a base class or a protocol that carries the code-level documentation, so you can follow the inheritance hierarchy up to find the docs. A dev guide with derivations etc. would of course also be very useful, and can be linked from the code-level docs for background reading. I think discoverability from the code is very important, hence the inheritance "breadcrumbs" that lead to the docs.

nilsdeppe commented 3 years ago

Okay, that sounds reasonable. I'm not sure how soon I'll be able to work on this, but I think that's a good plan of attack. What I'll do is try to get a layout/structure of the docs without content and open a draft PR for that in order to get feedback on the layout. Then I'll add the content once we have a layout folks are happy with.