Automate testing of thrive lines

eatyourpeas commented 4 years ago

It would be useful to automate a process which creates serial data points that mimic the growth of a child for testing purposes

statist7 commented 4 years ago

Here is some R code to do that, using the weight correlation matrix I sent you a few days ago. I don't know how to interpret R in Python, but I'm sure it's easy to do!

The code produces a matrix with 10 rows (each corresponding to one subject) and 13 columns, which are ages 0 to 12 months. The weight z-scores need back-transforming to weight, using whichever sex you choose.

If you want random rather than fixed ages that will involve interpolating between the whole-month correlations, which is more involved.

  library(MASS)
  R <- as.matrix(read.csv('~/Dropbox/CIGS/RCPCH weight correlation matrix by month.csv'))[, -1]
  diag(R) <- 1
  nvars <- dim(R)[[1]]
  nsubs <- 10 # no of subjects
  X <- mvrnorm(nsubs, rep(0, nvars), R)
  # each row is one random subject's weight z-scores from 0 to 12 months

^{Created on 2020-06-13 by the reprex package (v0.3.0)}

eatyourpeas commented 4 years ago

That is hilariously concise @statist7 you are a remarkable guy. I have created a function here called: def create_fictional_child(sex: str, measurement_type: str, requested_sds: float, number_of_measurements: int, starting_decimal_age: float, measurement_interval_value: float, measurement_interval_type: str, gestation_weeks = 0, gestation_days = 0, drift: bool = False, drift_sds_range: float = 0.0): that allows the user to prescribe number of data points, interval between and starting age. I have also put in an option to introduce a degree of drift, largely to test the weight correlation function I put in using your matrices. So far I am only using the months but using the weeks should only be a question of swapping over. I am very grateful for this snippet - I will see if I can incorporate your more elegant solution. As for weight correlations I did struggle with the bilinear interpolation but it does not generate a number. Not sure how accurate? Perhaps if you have time to look at mine you could give me pointers on what to change? See the correlate_weight function at line 114 onwards.

statist7 commented 4 years ago

I slightly struggled with your code, so instead I've expanded the R code to handle a set of random ages, or the ages can alternatively be provided directly. The required correlation matrix is linearly interpolated using the akima package, so the code needs to be run in R rather than translated to Python. Note we can easily switch to spline interpolation, though it should make very little difference.

  library(MASS)
  library(akima)

  t <- 0:12 # ages in correlation matrix (months)
  nt <- 5 # number of measurements to simulate
  nsubs <- 1 # one subject
  t0 <- sort(runif(nt, min(t), max(t))) # generate ordered random ages
  R <- as.matrix(read.csv('~/Dropbox/CIGS/RCPCH weight correlation matrix by month.csv'))[, -1]

  xyz <- cbind(expand.grid(x = t, y = t), z = c(R)) # grid of ages and correlation matrix
  corr <- with(xyz, interp(x, y, z, t0, t0))$z # interpolate correlations for measurement ages
  diag(corr) <- 1
  X <- mvrnorm(nsubs, rep(0, nt), corr) # simulated growth curve as z-scores at ages t0

  # to check the code, set ages to reference ages with 1000 subjects and
  # check that simulated correlation matrix matches reference correlation matrix

  t0 <- t
  nt <- length(t0)
  nsubs <- 1000
  corr <- with(xyz, interp(x, y, z, t0, t0))$z
  diag(corr) <- 1
  X <- mvrnorm(nsubs, rep(0, nt), corr)
  round(corr - cor(X), 2) # correlation errors only ~0.01
#>       [,1]  [,2]  [,3]  [,4]  [,5]  [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#>  [1,] 0.00  0.00  0.01  0.00  0.00  0.02 0.02 0.04 0.04  0.04  0.04  0.05  0.04
#>  [2,] 0.00  0.00 -0.01 -0.01 -0.01 -0.01 0.00 0.01 0.02  0.02  0.02  0.03  0.02
#>  [3,] 0.01 -0.01  0.00  0.00  0.00  0.00 0.00 0.02 0.02  0.01  0.02  0.03  0.02
#>  [4,] 0.00 -0.01  0.00  0.00  0.00  0.00 0.00 0.01 0.01  0.00  0.01  0.02  0.01
#>  [5,] 0.00 -0.01  0.00  0.00  0.00  0.00 0.00 0.01 0.00  0.00  0.00  0.01  0.00
#>  [6,] 0.02 -0.01  0.00  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#>  [7,] 0.02  0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#>  [8,] 0.04  0.01  0.02  0.01  0.01  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.01
#>  [9,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.00 0.00  0.00  0.01  0.01  0.01
#> [10,] 0.04  0.02  0.01  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#> [11,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.00 0.01  0.00  0.00  0.01  0.00
#> [12,] 0.05  0.03  0.03  0.02  0.01  0.01 0.01 0.01 0.01  0.01  0.01  0.00  0.01
#> [13,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.01 0.01  0.00  0.00  0.01  0.00

^{Created on 2020-06-15 by the reprex package (v0.3.0)}

eatyourpeas commented 4 years ago

This is very elegant. There is an akima module with in the SciPy package. I will try and convert. Might need to reach out to you separately to make it work.

statist7 commented 4 years ago

That's good news.

I've found this SO query about mvrnorm that includes a Python in R solution. https://stackoverflow.com/questions/31666270/python-analog-of-mvrnorms-empirical-setting-when-generating-a-multivariate-dist

But ignore the empirical = True.

eatyourpeas commented 3 years ago

Not sure how this got logged as completed. The title is also misleading. This is fundamentally about testing thrive lines, so I have changed the title. It is still on the wish list to implement once MVP is complete. I am leaving open but moving to roadmap-future for the moment.

dc2007git commented 6 months ago

@eatyourpeas was this actioned?

rcpch / digital-growth-charts-server

Automate testing of thrive lines #27