rpietro / EsophagealCancerT1N0

Evaluating outcomes after Esophageal cancer T1N0

survival curves by US region #2

Closed rpietro closed 12 years ago

rpietro commented 12 years ago

Mathias, couple questions before i start playing with this graphic:

  1. what is the variable for time?
  2. what is the variable for US region?
  3. what is the variable for 3-digit zip code?

also, i'm planning on testing the following:

  • http://goo.gl/Bk7JP
  • http://goo.gl/IOBv4 -- this is likely the most promising
  • http://goo.gl/n9DoH and http://goo.gl/ekAOr - Mathias, please check these and see if any idea might spark
  • http://goo.gl/L7JsH
  • http://goo.gl/nynjs

mworni commented 12 years ago

Ricardo - I think you have this dataset.

  1. variable for time: survivaltimerecodetotalofmonths (survival time) / year of diagnosis: yearofdiagnosis / three groups of year of diagnosis: year3
  2. seerregistry
  3. I don't have a 3-digit zip code.

rpietro commented 12 years ago

hey man, i need three variables:

  1. event status (dead - y/n) - what is it?
  2. time until event - not sure, but i think it is survivaltimerecodetotalofmonths
  3. US region -- what is it?

also:

  1. didn't understand what you meant by year of diagnosis: yearofdiagnosis / three groups of year of diagnosis: year3 - what is the role of these variables for the survival curve
  2. also didn't understand what you meant by seerregistry

takes a minute to generate the graphics, just need the variables

mworni commented 12 years ago

I hope this makes it clear now...

  1. event status (dead - y/n) - what is it?

There are two variables - one for cancer specific (CSS) and one for overall survival (OS)

CSS: censor_css (0 = alive or death from other cause, 1 = death from esophageal cancer)

OS: censor_os (0 = alive, 1 = death from esophageal cancer or other cause)
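For anyone following along, here is a minimal sketch of how the two indicators differ. It uses a made-up data frame; `toy`, `months` and `cause` are hypothetical names, not columns of the SEER extract. The only thing taken from the thread is the 0/1 coding of censor_css and censor_os.

```r
library(survival)

# Toy data, NOT the SEER extract: `cause` is a hypothetical column.
toy <- data.frame(
  months = c(5, 12, 20, 33, 40, 58),
  cause  = c("esoph", "alive", "other", "esoph", "alive", "other")
)

# Cancer-specific survival: only esophageal cancer deaths count as events;
# patients who are alive or died of another cause are coded 0.
toy$censor_css <- as.integer(toy$cause == "esoph")

# Overall survival: any death counts as an event.
toy$censor_os <- as.integer(toy$cause != "alive")

fit_css <- survfit(Surv(months, censor_css) ~ 1, data = toy)
fit_os  <- survfit(Surv(months, censor_os) ~ 1, data = toy)

sum(toy$censor_css)  # number of events under the CSS coding
sum(toy$censor_os)   # number of events under the OS coding
```

Because every CSS event is also an OS event, the OS curve can never sit above the CSS curve for the same patients.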

  2. time until event - not sure, but i think it is survivaltimerecodetotalofmonths

This is correct

  3. US region -- what is it?

This is seerregistry. Please run this code and you will get the states that are captured in SEER together with the total number of patients per state.

CrossTable(seerregistry, female, missing.include=TRUE)

There is no other code for region.
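CrossTable() comes from the gmodels package, so that package has to be loaded first. If it is not installed, a rough base-R equivalent might look like the sketch below. The data frame `toy` and its registry labels are placeholders, not values from the dataset.

```r
# Toy stand-in for the real data; registry labels are illustrative only.
toy <- data.frame(
  seerregistry = c("Registry A", "Registry A", "Registry B",
                   NA, "Registry B", "Registry C"),
  female       = c(0, 1, 0, 1, NA, 0)
)

# Counts per registry by sex, keeping missing values visible
# as their own row and column (similar in spirit to missing.include=TRUE).
tab <- table(toy$seerregistry, toy$female, useNA = "ifany")
print(tab)

# Patients per registry, largest first.
per_registry <- sort(table(toy$seerregistry, useNA = "ifany"),
                     decreasing = TRUE)
print(per_registry)
```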

also:

  1. didn't understand what you meant by year of diagnosis: yearofdiagnosis / three groups of year of diagnosis: year3 - what is the role of these variables for the survival curve

yearofdiagnosis is just the variable name for the year when the cancer was first detected - year3 is this variable grouped into three time periods. This variable would just make it easy to show a trend in survival if you plot the survival curve by year3.
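A sketch of what plotting by year3 could look like, on simulated data. The year range and the cut points below are assumptions for illustration only; the real year3 boundaries are not stated in this thread, and `toy` and `months` are hypothetical names.

```r
library(survival)

set.seed(1)
n <- 300
# Simulated data, not SEER: random years, follow-up times and event flags.
toy <- data.frame(
  yearofdiagnosis = sample(1988:2008, n, replace = TRUE),
  months          = rexp(n, rate = 1 / 60),
  censor_os       = rbinom(n, 1, 0.6)
)

# Hypothetical cut points; intervals are (1987,1994], (1994,2001], (2001,2008].
toy$year3 <- cut(toy$yearofdiagnosis,
                 breaks = c(1987, 1994, 2001, 2008),
                 labels = c("1988-1994", "1995-2001", "2002-2008"))

# One Kaplan-Meier curve per diagnosis period.
fit_year <- survfit(Surv(months, censor_os) ~ year3, data = toy)
plot(fit_year, col = 1:3,
     xlab = "Months since diagnosis", ylab = "Overall survival")
legend("topright", legend = levels(toy$year3), col = 1:3, lty = 1)

# Log-rank test for a difference across the three periods.
survdiff(Surv(months, censor_os) ~ year3, data = toy)
```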

  2. also didn't understand what you meant by seerregistry

See point 3.

Mathias Worni, MD, MHS Consulting Associate in Surgery Department of Surgery Duke University Medical Center

rpietro commented 12 years ago

i added the code to your script (at the end of the recoding session) but here it is followed by several comments:

library(survival)
surv1 <- survfit(Surv(survivaltimerecodetotalofmonths, censor_os) ~ female, data = esoph)
plot(surv1)
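To try the same call without access to the dataset, the sketch below runs it on simulated data. `esoph_sim` is a hypothetical stand-in for `esoph` with random values; only the column names and the model formula come from the snippet above.

```r
library(survival)

set.seed(42)
n <- 200
# Simulated stand-in for `esoph`; values are random, not SEER data.
esoph_sim <- data.frame(
  survivaltimerecodetotalofmonths = round(rexp(n, rate = 1 / 48)) + 1,
  censor_os = rbinom(n, 1, 0.7),
  female    = rbinom(n, 1, 0.3)
)

surv1 <- survfit(Surv(survivaltimerecodetotalofmonths, censor_os) ~ female,
                 data = esoph_sim)

# One curve per level of `female`, with a legend so the strata are identifiable.
plot(surv1, col = c("blue", "red"),
     xlab = "Months", ylab = "Overall survival")
legend("topright", legend = names(surv1$strata),
       col = c("blue", "red"), lty = 1)

# Estimated survival at 1 and 5 years in each stratum.
summary(surv1, times = c(12, 60))
```

Swapping `female` for another grouping variable on the right-hand side of the formula gives one curve per level of that variable.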

observations:

  • for the curve i used female instead of seerregistry just to make it easier. seerregistry has too many categories, so you will have to group them in some way, maybe using the traditional region classification like we have in NIS. also, not sure what the year numbers beside each region name mean; that was a little strange and probably worth checking in the documentation
  • to use censor_css you will have to decide where you want to place the non-cancer-related deaths. both censoring them and putting them together would be acceptable; i would report both

mworni commented 12 years ago

I will soon take a look at this.

Just as a sidenote - the numbers behind the SEER registry names represent the year when each registry started participating in SEER.
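On the point that seerregistry has too many categories and carries a year beside each name, one possible approach is to strip the year and map each registry to a broader region. The sketch below makes several assumptions. The "name - year" label format and the six example labels are illustrative and should be checked against levels(esoph$seerregistry). The lookup to US Census regions is a hypothetical choice, not something agreed in this thread.

```r
# Illustrative registry labels in an assumed "name - year" style.
registry <- c("Connecticut - 1973", "Metropolitan Detroit - 1973",
              "Iowa - 1973", "Metropolitan Atlanta - 1975",
              "Utah - 1973", "Hawaii - 1973")

# Drop the trailing " - <year>" (the year the registry joined SEER).
registry_name <- sub(" - [0-9]{4}$", "", registry)

# Hypothetical lookup from registry name to US Census region.
region_lookup <- c(
  "Connecticut"          = "Northeast",
  "Metropolitan Detroit" = "Midwest",
  "Iowa"                 = "Midwest",
  "Metropolitan Atlanta" = "South",
  "Utah"                 = "West",
  "Hawaii"               = "West"
)

# Registries missing from the lookup would come out as NA, which makes
# unmapped labels easy to spot.
region <- factor(unname(region_lookup[registry_name]),
                 levels = c("Northeast", "Midwest", "South", "West"))
table(region)
```

A factor built this way could then replace `female` on the right-hand side of the survfit() formula to get one curve per region.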