nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License

Logistic growth is broken on example build #623

Closed jameshadfield closed 2 years ago

jameshadfield commented 3 years ago

Current Behavior

Logistic growth, when run on the default example build, only produces valid data for a small number of nodes. Most nodes are assigned NaN, like so:

{
  "generated_by": {
    "program": "augur",
    "version": "12.0.0"
  },
  "nodes": {
    "Australia/VIC05/2020": {
      "current_frequency": 0.0,
      "logistic_growth": NaN
    },
    "Australia/VIC1008/2020": {
      "current_frequency": 0.0,
      "logistic_growth": NaN
    },

Presumably this is due to not enough data, or a divide-by-zero thing? cc @huddlej
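One quick way to gauge how widespread the problem is would be to count the NaN entries in the node data JSON. The sketch below assumes the structure shown in the excerpt above; the third node is a made-up entry added only so the sample contains a non-NaN value, and in practice you would load the real file with `json.load` instead of the inline string.

```python
import json
import math


def count_nan_growth(node_data):
    """Return (n_nan, n_total) for the 'logistic_growth' field across all nodes."""
    nodes = node_data.get("nodes", {})
    n_nan = sum(
        1
        for attrs in nodes.values()
        if isinstance(attrs.get("logistic_growth"), float)
        and math.isnan(attrs["logistic_growth"])
    )
    return n_nan, len(nodes)


# Python's json module accepts the non-standard NaN literal by default.
# "hypothetical/node/2020" is an invented entry for illustration only.
sample = json.loads("""
{"nodes": {
  "Australia/VIC05/2020": {"current_frequency": 0.0, "logistic_growth": NaN},
  "Australia/VIC1008/2020": {"current_frequency": 0.0, "logistic_growth": NaN},
  "hypothetical/node/2020": {"current_frequency": 0.2, "logistic_growth": 1.5}
}}
""")

n_nan, n_total = count_nan_growth(sample)
print(f"{n_nan} of {n_total} nodes have NaN logistic growth")
```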

image

How to reproduce

Steps to reproduce the current behavior:

  1. snakemake -p --profile my_profiles/example -f auspice/ncov_global.json


huddlej commented 3 years ago

This is a "not enough data" issue. The default parameters for logistic growth were tuned for larger trees and require at least 50 tips in a clade and strict min/max frequency thresholds.
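To illustrate why small clades end up as NaN, here is a minimal sketch of this kind of filtering. It is not the ncov implementation: the function name, the frequency bounds, and the use of a least-squares slope on logit-transformed frequencies are all assumptions made for illustration. Only the 50-tip minimum comes from the comment above.

```python
import math


def logistic_growth_sketch(
    tip_count,
    frequencies,
    min_tips=50,          # from the comment above
    min_frequency=1e-6,   # placeholder value, not confirmed
    max_frequency=0.95,   # placeholder value, not confirmed
):
    """Return the slope of logit(frequency) over pivot index, or NaN if filtered.

    `frequencies` holds a clade's frequency at evenly spaced time points,
    oldest first. Clades that are too small, or whose current frequency is
    outside the allowed range, are intentionally left without a value.
    """
    if len(frequencies) < 2 or tip_count < min_tips:
        return float("nan")
    current = frequencies[-1]
    if not (min_frequency <= current <= max_frequency):
        return float("nan")

    # Clamp away from 0 and 1 so the logit transform stays finite.
    eps = 1e-6
    clamped = [min(max(f, eps), 1 - eps) for f in frequencies]
    logits = [math.log(f / (1 - f)) for f in clamped]

    # Ordinary least-squares slope of logit(frequency) against pivot index.
    n = len(logits)
    x_mean = (n - 1) / 2
    y_mean = sum(logits) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(logits))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


# A sparse clade with zero current frequency is filtered out.
print(logistic_growth_sketch(tip_count=3, frequencies=[0.0, 0.0, 0.0]))
# A large, growing clade gets a positive growth value.
print(logistic_growth_sketch(tip_count=100, frequencies=[0.1, 0.2, 0.4]))
```

With thresholds like these, most clades in a small example tree fail the tip-count check before any fit is attempted, which matches the mostly-NaN output reported above.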

We could change these parameters in the builds.yaml for the example build, but the results won't be that much more meaningful with so little data.

huddlej commented 2 years ago

Closing this since the issue is related to the unavoidably small data set and the fact that logistic growth calculations intentionally don't assign values to sparse clades in trees like the one shown above.