ictr / mendel_penetrance


age-specific RR #2

Open XinYang6699 opened 1 year ago

XinYang6699 commented 1 year ago

In which file do you want the RR to be age-specific: single.f or poly.f? And which age groups do you want to use for the age-specific RRs?

BoPeng commented 1 year ago

single.f. The last column in our data is actually a combination of age and age of onset, since we have a lot of missing data for both.

XinYang6699 commented 1 year ago

Hi Bo, I'm not quite sure what you mean here. From my understanding, Chris wants to estimate age-specific RR. So what kind of age groups would you like me to estimate RR on?

BoPeng commented 1 year ago

Whatever age group that you can estimate. :-)

Our data only have a few cases with age or age of onset. In theory, we should have age for unaffected and both age and age of onset for affected individuals if they can be properly modeled. We currently combine them together since we do not know how the model works.

Now, we notice that the code has incidence rates for breast cancer from age 0 to 80 (the value changes every 5 years). It would be nice if the model could estimate an RR for each year as the impact of genotype, but it is perfectly OK if the model can only estimate (or our data are only informative for) RRs for ages 0-10, 10-20, ..., or 0-30, 30-50, 50-80, or just a single RR for all ages. Our biggest problem is that the output RR (which should be the combined one) does not converge and does not make sense.
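To make the banded options concrete, here is a minimal Python sketch of a piecewise-constant RR lookup. It is not taken from single.f; the names `BAND_EDGES`, `BAND_RR`, and `rr_for_age` are hypothetical, and the RR values are placeholders rather than estimates.

```python
from bisect import bisect_right

# Hypothetical age bands 0-30, 30-50, 50-80 (upper edges, exclusive).
BAND_EDGES = [30, 50, 80]
# Placeholder RR per band; these are NOT estimates from any data.
BAND_RR = [1.0, 2.0, 1.5]


def rr_for_age(age, edges=BAND_EDGES, rrs=BAND_RR):
    """Return the RR of the band containing `age` (piecewise constant)."""
    if len(edges) != len(rrs):
        raise ValueError("need exactly one RR per age band")
    if age < 0 or age >= edges[-1]:
        raise ValueError("age outside the modeled range")
    # bisect_right counts how many upper edges are <= age, which is
    # exactly the index of the band the age falls into.
    return rrs[bisect_right(edges, age)]
```

A single RR for all ages is the special case `edges=[80]`, `rrs=[rr]`; fewer bands mean fewer parameters to estimate from sparse age data.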

XinYang6699 commented 1 year ago

I see. I think the problem is that you only have ages for cases but not for unaffected individuals, so the RR estimate will not converge.

BoPeng commented 1 year ago

I have assigned ages to all members of the pedigrees, mostly according to ages of their spouses. The RR does not converge and always goes to the max.

Given that our pedigree is quite different from the sample one (in terms of number of columns), could you confirm that the age specified in this way is being properly read and modeled?

Note that the program does some sex-specific things because it was for breast cancer, which is not the case for our data.

XinYang6699 commented 1 year ago

Hi Bo, I see. I think your pedigree file format is not correct. I have now uploaded an example of the pedigree file for you to look at. The columns in the pedigree file are defined in the APEN subroutine as follows:

  agebc = var(1)
  agebc2 = var(2)
  ageoc = var(3)
  ageother = var(4)
  ageother2 = var(5)
  agelfu = var(6)
  agedeath = var(7)

You may want to adjust it according to your needs.
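As a rough aid for checking a pedigree row against this layout, the column order can be sketched in Python. This is only an illustration: the variable names come from the APEN assignments quoted above, while `AGE_FIELDS`, `name_age_columns`, and the assumption that a row's seven age fields have already been split into a list are hypothetical.

```python
# Order follows var(1)..var(7) as quoted from the APEN subroutine.
AGE_FIELDS = (
    "agebc", "agebc2", "ageoc", "ageother",
    "ageother2", "agelfu", "agedeath",
)


def name_age_columns(var):
    """Label the seven age fields of one individual by position.

    `var` is assumed to be a list of seven already-split fields;
    how the real file separates fields is not specified here.
    """
    if len(var) != len(AGE_FIELDS):
        raise ValueError(f"expected {len(AGE_FIELDS)} age fields, got {len(var)}")
    return dict(zip(AGE_FIELDS, var))
```

For example, `name_age_columns(["45", "", "", "", "", "", "70"])` labels 45 as `agebc` and 70 as `agedeath`, which makes a shifted or missing column easy to spot.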

BoPeng commented 1 year ago

So the model uses all of

  1. age of onset for breast cancer,
  2. 2nd breast cancer,
  3. ovarian cancer
  4. other cancer
  5. other cancer two
  6. lost follow up
  7. death

I can try to format the pedigree this way, but we are certainly missing most of this information.

XinYang6699 commented 1 year ago

If a value is missing, just leave it empty.

BoPeng commented 1 year ago

Just to confirm: "affection status" is represented by "age of xxx", so any missing age of onset would effectively make the patient unaffected. This makes it mandatory to impute the age of onset.

XinYang6699 commented 1 year ago

Hi Bo, that is a good question. If the woman is unaffected, the age at cancer is left empty. If the woman is affected but the age at diagnosis is missing, please put 999 in the age of cancer column.
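The coding rule just described can be written down as a small Python sketch for sanity-checking input files. It is an illustration of the convention in this comment, not code from the repository; `UNKNOWN_AGE` and `cancer_status` are hypothetical names.

```python
UNKNOWN_AGE = 999  # affected, but age at diagnosis not known


def cancer_status(field):
    """Interpret one age-at-cancer field given as text.

    Returns (status, age): an empty field means unaffected, 999 means
    affected with unknown age, and any other number is the age at diagnosis.
    """
    text = field.strip()
    if text == "":
        return ("unaffected", None)
    age = int(text)
    if age == UNKNOWN_AGE:
        return ("affected", None)
    return ("affected", age)
```

Under this convention an affected individual is never left blank, so imputing an onset age is not required just to keep the affection status.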

BoPeng commented 1 year ago

OK, but does "age" play any role in the model? Say they are unaffected: should I use their age as the "age of lost follow up", or just ignore it? These ages are derived from the date of birth when the families were recruited.

XinYang6699 commented 1 year ago

I assume you performed a retrospective analysis, is that correct? Then yes, all the age information should be derived at baseline. For the unaffected, you need the age at last follow-up or death. If no age is provided, the individual will not be used in the analysis. I guess that is why your RR does not converge: only the cases in your study have age information, is that right?
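One way to see how much of a data set would actually enter the analysis is to count individuals that have at least one age filled in. The Python sketch below assumes each individual is given as a list of seven age fields as text; `has_any_age` and `count_used` are hypothetical helpers, and the rule "no age at all means dropped" is taken from the comment above rather than from the source code.

```python
def has_any_age(ages):
    """True if at least one age field is non-empty."""
    return any(str(a).strip() != "" for a in ages)


def count_used(records):
    """Return (used, dropped) counts over a list of age-field lists."""
    used = sum(1 for ages in records if has_any_age(ages))
    return used, len(records) - used


# Toy example: one case, one censored unaffected, one with no ages.
records = [
    ["45", "", "", "", "", "", ""],   # breast cancer at 45
    ["", "", "", "", "", "60", ""],   # unaffected, last follow-up at 60
    ["", "", "", "", "", "", ""],     # no age information
]
used, dropped = count_used(records)
```

If nearly all individuals that are kept are cases, the data carry little information about carriers who remain unaffected.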

BoPeng commented 1 year ago

We do have some age information for the unaffected, but most of it is missing (~80%). We also do not have any follow-up information for our pedigrees, so the ages at study entry are likely the ages at last follow-up.

BoPeng commented 1 year ago

I forcefully imputed ages for one of our pedigrees (the result does not make sense, e.g. the parents of two 90-year-olds have age 110) and got a test pedigree:

https://github.com/ictr/mendel_penetrance/blob/main/data/pedigree.txt

RR stayed at 0.5 all the time

https://github.com/ictr/mendel_penetrance/blob/main/data/single_out.dat

Could @XinYang6699 let me know what is wrong with the data? Note that I have tried to

  1. add genotype and bc age for two females
  2. add a few ages as age of death

but these did not help.

XinYang6699 commented 1 year ago

Hi Bo, I can't see a problem with the pedigree data. Have you tried making a pedigree (family-tree) plot to see whether all ages and relationships are reasonable? Or have you tried some other families?

BoPeng commented 1 year ago

I have run it over all my pedigrees and RRs converge to either 0 or 3 (default upper bound). I suspect that the problem is that we do not have enough wildtype genotypes but will need to investigate more.

XinYang6699 commented 1 year ago

Hi Bo, just want to remind you that if you run multiple pedigrees together, the current code assumes that the first half of the pedigrees in your pedigree file are the complete pedigrees in your data and the second half are the parts of those families on which ascertainment was based. So the log likelihood is the first half minus the second half. We did that because the families in our data were selected through some ascertainment, so when we calculated the likelihood, we conditioned on the ascertainment part. If you don't want to do that, please comment out the statement on line 175: "if(ped.gt.nped/2) loglik=-loglik"
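The effect of that statement can be illustrated with a short Python sketch. It mirrors only the sign flip described above (1-based pedigree index, integer division of `nped` by 2) and assumes the per-pedigree log likelihoods are then summed; `conditional_loglik` and its `condition_on_ascertainment` flag are hypothetical names, not part of single.f.

```python
def conditional_loglik(ped_logliks, condition_on_ascertainment=True):
    """Combine per-pedigree log likelihoods.

    With conditioning on, pedigrees in the second half of the file
    (ped > nped/2, 1-based) contribute with a negative sign, so the
    total is (first half) minus (second half).
    """
    nped = len(ped_logliks)
    total = 0.0
    for ped, loglik in enumerate(ped_logliks, start=1):
        # Python equivalent of: if(ped.gt.nped/2) loglik=-loglik
        if condition_on_ascertainment and ped > nped // 2:
            loglik = -loglik
        total += loglik
    return total
```

With `condition_on_ascertainment=False` the function simply sums all pedigrees, which corresponds to commenting out the statement. Note that with conditioning on, a file that lists only complete pedigrees would have half of them subtracted.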