mrc-ide / frogger

https://mrc-ide.github.io/frogger/
Other
0 stars 1 forks source link

danger arising from long class prefixes in code #84

Open jeffeaton opened 4 months ago

jeffeaton commented 4 months ago

I think we have discussed this before, but I can't recall details of the decision.

I find that the long class prefixes in the frogger code obscure the 'important bits' of the code (variable names and arithmetic operators) and consequently impede ability to quickly parse the code and figure out what a line of code is actually doing. It feels like this is a recipe for inviting errors if it is more difficult to figure out what code is actually doing.

E.g. this line of code in Leapfrog:

          for(int hm = 1; hm < hDS; hm++) {
            grad(hm-1, ha, g) -= cd4_prog(hm-1, ha, g) * hivstrat_adult(hm-1, ha, g, t);
            grad(hm, ha, g) += cd4_prog(hm-1, ha, g) * hivstrat_adult(hm-1, ha, g, t);
          }

has become this:

      for (int hm = 1; hm < ss.hDS; ++hm) {
        intermediate.base.grad(hm - 1, ha, g) -=
            natural_history.cd4_progression(hm - 1, ha, g) *
            state_next.base.h_hiv_adult(hm - 1, ha, g);
        intermediate.base.grad(hm, ha, g) +=
            natural_history.cd4_progression(hm - 1, ha, g) *
            state_next.base.h_hiv_adult(hm - 1, ha, g);
      }

In the first version, I can immediately see that "what is different" is hm-1 in the left side of the first line and hm in the second line -- and I can see from quick scan of the code that it's subtracting and adding the same thing. If there was a typo, I would be more likely to spot it.

In the second frogger version, I have to really carefully read it to strip out all of the filler characters intermediate.base., state_next.base., natural_history. to figure out what the operative parts of the code are.

At some point we discussed abbreviating the class names in the code at the top of the scripts, which I think could help this issue a lot.

@r-ash @mwalte10 @M-Kusumgar -- what do you think?

M-Kusumgar commented 4 months ago

I think there are a couple comments from my side:

  1. shortening the class names sounds like a good idea, it does get quite long, however if we do it globally we lose the meaning of any structure between the states and parameters which means the code gets more unreadable for someone who hasn't been developing this since the start so i would recommend doing it locally inside the function
  2. patterns like "subtracting and adding the same thing" should be an independent issue from this, this is just a case of defining a variable that represents this like foo and then adding and subtracting foo

I agree that shorter variables are better and we should use them, there's no reason to write intermediate.base everywhere, so the easiest way i can think of around this problem is:

  1. define the short variables you'll use at the start of functions (this shouldn't be too long or else probably an indication that your function is too long and can be broken up)
  2. define variables that you'll use again

so frogger code becomes

// technically these should be defined outside of the `ha` and `g` loops
// so just pretend they are outside any for loops
auto grad = intermediate.base.grad;
auto cd4_prog_prob = natural_history.cd4_progression;
auto hiv_adult = state_next.base.h_hiv_adult;

for (int hm = 1; hm < ss.hDS; ++hm) {
  real_type n_adults_progressing = cd4_prog_prob(hm - 1, ha, g) * hiv_adult(hm - 1, ha, g);
  grad(hm - 1, ha, g) -= n_adult_progressing;
  grad(hm, ha, g) += n_adult_progressing;
}
jeffeaton commented 4 months ago

Thanks @M-Kusumgar, yes this sort of declarations are exactly what I had in mind to make it more readable

// technically these should be defined outside of the `ha` and `g` loops
// so just pretend they are outside any for loops
auto grad = intermediate.base.grad;
auto cd4_prog_prob = natural_history.cd4_progression;
auto hiv_adult = state_next.base.h_hiv_adult;

for (int hm = 1; hm < ss.hDS; ++hm) {
  real_type n_adults_progressing = cd4_prog_prob(hm - 1, ha, g) * hiv_adult(hm - 1, ha, g);
  grad(hm - 1, ha, g) -= n_adult_progressing;
  grad(hm, ha, g) += n_adult_progressing;
}

Or if we wanted to keep some indicative prefixing, we could do some assignment like:

auto nh = natural_history.cd4_progression;
...

for (int hm = 1; hm < ss.hDS; ++hm) {
  real_type n_adults_progressing = nh.cd4_prog_prob(hm - 1, ha, g) * hiv_adult(hm - 1, ha, g);
  grad(hm - 1, ha, g) -= n_adult_progressing;
  grad(hm, ha, g) += n_adult_progressing;
}
M-Kusumgar commented 4 months ago

yh im onboard with creating these shorter variables inside the smaller functions and the larger for loop for time step can still move around these structured parameters

r-ash commented 4 months ago

Using prefixing at the top of functions sounds good, let's go through and add this to address this issue. Should we also add some structure to the paeds parameters, at the moment they are one flat list. @mwalte10 (see natural_history, demography etc within the base model parameters above)