nutriverse / zscorer

Anthropometric z-score calculator
https://nutriverse.io/zscorer
GNU Affero General Public License v3.0

Bfaz values do not have a mean of 0 or an SD of 1 #123

Open Amandax429 opened 4 days ago

Amandax429 commented 4 days ago

Dear Mark and Ernest,

First of all, thank you so much for the great package! I've spent some time trying to figure out how to correct BMI values for a young sample without grouping them into weight categories, so I was very pleased to find your package.

I have a question about the output of the addWGSR function: I used this function (see below) to calculate bfaz values using height in cm, weight in kg and age in days as input. However, when I checked the mean and standard deviation of the output, the values were not as expected (e.g. mean: -0.005, sd = 1.22). How is this possible? Have I overlooked an important step or piece of information?

bfaz <- addWGSR(data = data, sex = "gender", firstPart = "weight.kg", secondPart = "height.cm", index = "bfa")

Looking forward to your answer.

Best wishes, Amanda

ernestguevarra commented 4 days ago

Amanda, thanks for your message. So, it seems you are calculating the z-score for BMI-for-age, and you expected the mean to be 0 and the SD to be 1.

And then you shared your syntax above as follows:

bfaz <- addWGSR(data = data,
  sex = "gender",
  firstPart = "weight.kg",
  secondPart = "height.cm",
  index = "bfa"
)

You mentioned that you had age converted to days, but I don't see in your syntax above how you supplied the age. I am guessing you just left it out of what you wrote above? So that would be thirdPart = "age value", is that correct?

It is a bit hard for me to comment on your question without being able to replicate the issue that you are raising. That usually means getting a reproducible example. So, I will just raise more questions so you can provide more detail:

  1. What are the age ranges (in months) of your sample? The calculation for BMI-for-age z-score in {zscorer} is for up to 228 months (19 years). Beyond this, the reference tables will not have the information to calculate the z-score.
  2. From my understanding, the mean = 0 and SD = 1 you refer to are the characteristics of a standard normal distribution, or z-distribution (the distribution on which z-scores are based). But this doesn't mean that the z-scores calculated from an actual sample of weights, heights and ages will themselves form a z-distribution (mean = 0 and SD = 1). What it suggests is that the weight and height values in your sample are most likely not normally distributed, which can happen for several reasons, e.g., measurement issues, sample size issues, etc. From how you asked your question, though, it is not clear to me what type of sample values you are calculating BMI-for-age z-scores on. Can you explain?
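For the first question, a quick way to check the age range is sketched below. This is only an illustration: the data frame is made up, the column name age.d is an assumption, and the conversion uses an average month length of 365.25 / 12 days.

```r
# Hypothetical data frame standing in for the real data (ages in days)
data <- data.frame(age.d = c(2555, 3100, 4380, 6205))

# Convert age in days to approximate age in months
age_months <- data$age.d / (365.25 / 12)

# Youngest and oldest age in months
range(age_months)

# TRUE if every record is within the 228-month limit mentioned above
all(age_months <= 228)
```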

I look forward to hearing back from you.

Amandax429 commented 4 days ago

Hi Ernest,

Thank you for your fast reply. You're right, I forgot the thirdPart. The code I used was:

bfaz <- addWGSR(data = data,
  sex = "gender",
  firstPart = "weight.kg",
  secondPart = "height.cm",
  thirdPart = "age.d",
  index = "bfa"
)

My sample is between 7 and 17 years old (i.e. the age in days is between 2555 and 6205 days). Since the sample size is relatively small and most of the subjects are between 7 and 10 years old, neither age nor weight is normally distributed. However, this should not affect the mean or standard deviation when I calculate z-scores, right? If I calculate z-scores for a variable, such as age, the mean and standard deviation of the transformed/standardized age variable will always be 0 and 1, respectively.

Maybe I just don't understand what exactly the output of the addWGSR function is. I thought that if I use the addWGSR function as above with the raw, untransformed age, height, and weight values for my sample, the resulting bfaz values are standardized z-scores and thus have a mean of 0 and a standard deviation of 1. Is this wrong? And do I need to standardize the output bfaz values again if I need bfaz as a standardized variable?

Looking forward to your answer,

Best wishes, Amanda

ernestguevarra commented 4 days ago

@Amandax429, the output of addWGSR() is the z-score for each child based on a reference population, which in this case is the WHO Child Growth Standards. See World Health Organization. (2006). WHO child growth standards: length/height-for-age, weight-for-age, weight-for-length, weight-for-height and body mass index-for-age: methods and development. World Health Organization. https://apps.who.int/iris/handle/10665/43413 for more info.

The z-score is based on the deviation of the child's BMI from the reference population mean. This is how z-scores for nutritional indices such as BMI-for-age are determined. Hence, the mean and SD of the BMI-for-age z-scores of a sample indicate how much the sample's distribution differs from that of the reference population.

This is different from the more general approach to z-scores, in which the z-score is based on the deviation of each value in the sample from the sample mean, scaled by the sample standard deviation. In that situation, the mean of the z-scores is always 0 and the standard deviation is always 1. However, if a known population mean and standard deviation are used to determine the z-scores, the mean and standard deviation of the sample's z-scores will only approximate 0 and 1 respectively if the sample is a random sample from that population.
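A small base R sketch may make the difference concrete. Everything in it is assumed for illustration: the values are simulated, and ref_mean and ref_sd are hypothetical reference parameters. The WHO references use age- and sex-specific reference values rather than a single mean and SD, so this is a simplification and not what addWGSR() does internally.

```r
# Illustrative only: simulated values, not real anthropometric data
set.seed(123)

# Hypothetical reference population parameters (assumed for this sketch)
ref_mean <- 16
ref_sd   <- 2

# A simulated sample that is more spread out than the reference population
x <- rnorm(200, mean = 16, sd = 2.5)

# 1. Standardise against the sample's own mean and SD
z_sample <- (x - mean(x)) / sd(x)

# 2. Standardise against the fixed reference mean and SD
z_ref <- (x - ref_mean) / ref_sd

# Mean 0 and SD 1 by construction
round(c(mean = mean(z_sample), sd = sd(z_sample)), 3)

# Mean and SD depend on how the sample compares with the reference
round(c(mean = mean(z_ref), sd = sd(z_ref)), 3)
```

If bfaz is needed as a variable with mean 0 and SD 1 within the sample, the first approach can be applied to the addWGSR() output, but the result then describes position within the sample, not relative to the reference population.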