trvrb / flux

Integrating influenza antigenic dynamics with molecular evolution
http://bedford.io/papers/bedford-flux/

Incidence correlation #1

Closed trvrb closed 10 years ago

trvrb commented 11 years ago
  • I do not concur that Figure 5 indicates "antigenic drift drives incidence rates." There are problems with computing the correlations in Figure 5 across all four clades, since the clades have different average rates of drift and incidence. The conclusion would be supported if there were a correlation between incidence and drift rate within clades, or if the correlation across clades exceeded the correlation of the clade averages. But as it stands, Figure 5's correlations are just due to the fact that the four clades differ in their average rates, which does not imply that drift drives incidence. This paper is worth publishing even if a proper analysis fails to find that drift drives incidence, because currently the relationship is widely assumed without quantitative analysis. However, it is important to get this right as it forms the basis for influenza vaccine selection. I describe my concerns in more detail in the specific comments.
  • I have concerns about Figure 5, which the authors use to support their claim that "antigenic drift drives incidence rates." The authors first establish the well-known fact that H3N2 incidence is highest, followed by H1N1 and then B. They also find that relative incidence is correlated with the rate of antigenic drift. However, this does not imply causality in terms of the higher average drift driving the higher average incidence. The authors do not claim any causality in this correlation between average drift and incidence in the four clades, and I am therefore fine with this part of the analysis. The problems arise in Figure 5A. The authors find a significant correlation between drift and incidence taken over all years and clades, and then use this to argue that drift drives incidence. I would agree with this conclusion if higher drift were associated with higher incidence with a correlation that is better than that obtained simply by comparing the averages of the four clades. But I don't think this is true. In Figure 5A, the authors are pooling all clades, making the correlation in 5A a trivial consequence of the fact that H3N2 has higher average incidence and drift. Even if the rate of drift and incidence were constant year-over-year for each clade (and simply differed among clades), the authors would observe the correlation that they report. Put another way, imagine that every year the drift and incidence in each clade were equal to its average. Then Figure 5A would in effect correspond to simply recalculating the correlation between average drift and incidence in a scenario where each point is included as many times as there are years being examined. The more years that were included, the more significant the P-value would become because the number of data points would increase, but really the single actual trend (that H3N2 is higher in average drift and incidence) is just being counted multiple times.
In order to draw the stronger conclusion that drift drives incidence, the authors need to show that incidence and drift are correlated within each clade individually, or else standardize the incidence so that the mean and variance for each individual clade are zero and one, respectively. It is not clear to me that there is any correlation between the drift and incidence within any of the individual clades. A similar problem affects Figure 5B - for example, since H3N2 has the highest drift, it will always tend to be associated with lower drift in other clades since the other clades inherently have lower average drift. But this does not imply direct interference between clades. I am open to further discussion about this point, but I do not see how Figure 5 implies that antigenic drift drives incidence, or that there is dynamical interference across clades. By the way, I do not think that a lack of correlation would be a disqualifying concern in terms of publication. Finding no evidence that antigenic drift drives incidence is just as important as finding that it does. But the claim needs to be correct.
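
The pooling artifact described in these comments can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the data or code from the paper: the clade averages, noise level, number of years, and the helper names `pearson` and `zscore` are all assumptions made for illustration. Drift and incidence are generated independently within each clade, so any pooled correlation comes only from the differing clade averages. Both variables are standardized per clade here, which is slightly more than the comment asks for (it mentions standardizing incidence).

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def zscore(values):
    """Rescale values to mean zero and variance one."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


rng = random.Random(0)
n_years = 50  # hypothetical; large so the illustration is stable

# Hypothetical clade averages in arbitrary units (NOT values from the paper).
clade_means = {"A/H3N2": 1.0, "A/H1N1": 0.6, "B/Vic": 0.4, "B/Yam": 0.3}

# Within each clade, yearly drift and incidence fluctuate independently
# around the same clade average, so there is no within-clade relationship.
drift = {c: [m + rng.gauss(0, 0.1) for _ in range(n_years)]
         for c, m in clade_means.items()}
incidence = {c: [m + rng.gauss(0, 0.1) for _ in range(n_years)]
             for c, m in clade_means.items()}

# Pooled over all clades and years (the Figure 5A style of analysis).
pooled_r = pearson([v for c in clade_means for v in drift[c]],
                   [v for c in clade_means for v in incidence[c]])

# Within each clade individually.
within_r = {c: pearson(drift[c], incidence[c]) for c in clade_means}

# Pooled again after standardizing each clade to mean 0 and variance 1.
standardized_r = pearson([z for c in clade_means for z in zscore(drift[c])],
                         [z for c in clade_means for z in zscore(incidence[c])])

print("pooled r:", round(pooled_r, 2))
print("within-clade r:", {c: round(r, 2) for c, r in within_r.items()})
print("pooled r after per-clade standardization:", round(standardized_r, 2))
```

Under these assumptions the pooled correlation is driven entirely by the clade averages, while the within-clade and standardized correlations carry no such signal.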
trvrb commented 10 years ago

This is a very astute criticism. We see how differing overall incidence between clades could have given an artifactual signal in our previous year-to-year drift vs incidence comparisons. In this revision, we have reanalyzed the data in the way suggested; we show that year-to-year antigenic drift and incidence are correlated within each clade individually, arriving at correlation coefficients of 0.51, 0.29, 0.44 and 0.14 for A/H3N2, A/H1N1, B/Vic and B/Yam respectively. None of these correlations is significant on its own; however, observing four correlation coefficients of this magnitude is highly unlikely under a null model derived from bootstrap permutations (p = 0.018). Because the increase in incidence tends to follow periods of pronounced antigenic drift, we conclude that there appears to be a causal relationship between antigenic change and increased incidence.
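
One way a joint test over four within-clade correlations could be set up is sketched below. The response does not specify the test statistic or the resampling scheme, so both are assumptions here: the statistic is the mean of the four within-clade correlation coefficients, and the null is generated by shuffling incidence across years within each clade. The data are synthetic, and the names `pearson`, `mean_within_clade_r` and `permutation_p_value` are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_within_clade_r(drift, incidence):
    """Assumed test statistic: average of the per-clade correlations."""
    rs = [pearson(drift[c], incidence[c]) for c in drift]
    return sum(rs) / len(rs)


def permutation_p_value(drift, incidence, n_perm=2000, seed=0):
    """One-sided p-value for the mean within-clade correlation.

    Null model (an assumption): within each clade, the pairing of
    incidence with drift across years is random.
    """
    rng = random.Random(seed)
    observed = mean_within_clade_r(drift, incidence)
    exceed = 0
    for _ in range(n_perm):
        shuffled = {}
        for clade, values in incidence.items():
            perm = list(values)
            rng.shuffle(perm)  # break the year-to-year pairing
            shuffled[clade] = perm
        if mean_within_clade_r(drift, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)


# Synthetic example (NOT the paper's data): incidence tracks drift plus noise.
rng = random.Random(1)
clades = ["A/H3N2", "A/H1N1", "B/Vic", "B/Yam"]
drift = {c: [rng.gauss(0, 1) for _ in range(15)] for c in clades}
incidence = {c: [d + rng.gauss(0, 0.5) for d in drift[c]] for c in clades}

p_value = permutation_p_value(drift, incidence)
print("mean within-clade r:", round(mean_within_clade_r(drift, incidence), 2))
print("permutation p-value:", p_value)
```

Combining the clades into a single statistic is what allows four individually non-significant correlations to be jointly unlikely under the null; a different statistic or resampling scheme would give a different p-value.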

However, in redoing this analysis on a lineage-by-lineage basis, we lost much of the signal for interference between lineages. We think there may still be something there, but the nuanced analysis that this issue deserves seems beyond the scope of the paper. We have decided to instead drop the discussion of interference between lineages.