radivot / SEERaBomb

This R package contains code that sets up SEER and A-bomb survivor data for use with R.
GNU General Public License v2.0

Cancer type extracted from ICD9 doesn't match the Primary Site #6

Closed zhangyz1997 closed 4 years ago

zhangyz1997 commented 4 years ago

I'm trying to generate a list of breast cancer cases with the following code: canc.breast<-subset(canc, cancer = 'breast'). However, when I looked into the list, I found that some cases coded 1749 and 175 in ICD9, and therefore categorized as 'breast' in 'cancer' (which relies on ICD9 according to the description of mapCancs), have various ICD-O-3 codes other than C50x, and most of them point to glands other than the breast (such as the pancreas, parotid glands, etc.). Is it therefore reliable to use the 'cancer' variable for case listing, or should we use the 'primsite' variable, following SEER's suggestion (https://seer.cancer.gov/seerstat/tutorials/howto/select.html)? What about second cancer analysis? Can I review the cancer type and change it directly before performing the second cancer analysis?


Here is a list of some abnormal cases: https://github.com/radivot/SEERaBomb/files/4480793/abnormal-list.xlsx

radivot commented 4 years ago

The cancer field is only an initial definition. It is expected that you will change things, as in this line in the code chunk above Example 1:

canc$cancer=fct_collapse(canc$cancer,AML=c("AML","AMLti","APL"))

Regarding second cancers, your definitions may not work. There is a fair amount of starting-year mapping that happens under the hood, so I can't guarantee anything. In general, if your cancer is present in SEER across all years, there is a good chance your newly defined cancer will work fine.
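For readers without forcats installed, the redefinition above can be sketched in base R. This is only an illustration on a toy factor standing in for canc$cancer; the values are made up, and whether a redefined level works in the second cancer analysis is not guaranteed, as noted above.

```r
# Toy stand-in for canc$cancer; the real factor comes from SEERaBomb.
cancer <- factor(c("AML", "AMLti", "APL", "CML", "breast"))

# Assigning the same name to several levels merges them, which mirrors
# fct_collapse(cancer, AML = c("AML", "AMLti", "APL")).
levels(cancer)[levels(cancer) %in% c("AML", "AMLti", "APL")] <- "AML"

# Counts per level after the merge.
table(cancer)
```

The merged factor keeps one "AML" level covering all three original labels, and the other levels are left untouched.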

zhangyz1997 commented 4 years ago

How about case listing? Is it acceptable to list cases by 'cancer'?

radivot commented 4 years ago

Sorry, I don't understand your question.

zhangyz1997 commented 4 years ago

Is it safe to use the code below to get all breast cancer cases? canc.breast<-subset(canc, cancer = 'breast') I doubt it, because I find some conflicts between 'ICD9' (and 'cancer') and 'primsite'. I'm not sure which variable I should trust in such cases. (Maybe this is a question I should ask SEER instead. To be more specific: maybe ICD-9 1749 & 175_ are not breast cancer?)
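One detail worth checking, which the thread does not address: in base R, subset(canc, cancer = 'breast') passes cancer as a named argument rather than a condition, so no filtering happens; a comparison needs ==. The sketch below shows this on a toy data frame and then lists the rows where 'cancer' and 'primsite' disagree. The column names come from this thread; the values, and the assumption that primsite is stored as strings such as "C509", are made up for illustration.

```r
# Toy data; real SEER fields may be coded differently.
canc <- data.frame(
  cancer   = factor(c("breast", "breast", "lung", "breast", "colon")),
  ICD9     = c(1749, 175, 1629, 1749, 1539),
  primsite = c("C509", "C259", "C349", "C079", "C189"),
  stringsAsFactors = FALSE
)

# With a single '=', the condition is swallowed by '...' and every row is kept.
all_rows <- subset(canc, cancer = "breast")

# With '==', only the rows labeled breast are kept.
canc.breast <- subset(canc, cancer == "breast")

# Rows labeled breast whose primary site is not C50x.
mismatch <- subset(canc.breast, !startsWith(primsite, "C50"))

nrow(all_rows)
nrow(canc.breast)
mismatch
```

Running a check like this on the real data would show how many 'breast' cases have a non-C50x primary site before deciding which variable to trust.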

radivot commented 4 years ago

Sorry, I just don't know.

zhangyz1997 commented 4 years ago

I have received a reply from the official SEER support team concerning this problem:

SEER does not collect cases using ICD codes – we currently code cases using ICD-O-3. So, the variable “Recode ICD-O-2 to 9” requires the use of several coding conversions -- first convert to ICD-O-2 then to ICD9. The use of these backwards conversions will necessarily lead to some loss of accuracy/detail. We have actually removed this data item from the November 2019 data due to confusion over this. In general, we would recommend using the variable “Site recode ICD-O-3/WHO 2008” to select cases, unless you need a more granular selection, in which case you can work with ICD-O-3 histology type, behavior and primary site directly

Therefore, maybe it's a better idea to map cancers according to 'primsite' instead of 'ICD9' in the future?
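A minimal sketch of what a primsite-based correction for breast could look like. The helper remapBreastByPrimsite is hypothetical and not part of SEERaBomb; it assumes 'primsite' holds ICD-O-3 topography strings such as "C509", and it leaves histology-defined labels (for example NHL) alone. Demoting mismatched cases to "other" is a simplification; a fuller mapping would assign them to their actual site.

```r
# Hypothetical helper, not part of SEERaBomb. Assumes D has a 'cancer'
# factor and a character 'primsite' column with codes such as "C509".
remapBreastByPrimsite <- function(D) {
  cancer <- as.character(D$cancer)
  isC50 <- grepl("^C50", D$primsite)
  # Demote cases labeled breast whose primary site is not C50x.
  cancer[cancer == "breast" & !isC50] <- "other"
  # Promote unclassified cases whose primary site is C50x.
  cancer[cancer == "other" & isC50] <- "breast"
  D$cancer <- as.factor(cancer)
  D
}

# Toy rows: a true breast case, a mislabeled one, an unclassified C50x
# case, and a lymphoma arising in the breast.
toy <- data.frame(
  cancer   = factor(c("breast", "breast", "other", "NHL")),
  primsite = c("C509", "C259", "C501", "C504"),
  stringsAsFactors = FALSE
)
out <- remapBreastByPrimsite(toy)
as.character(out$cancer)
```

Whether a label redefined this way behaves correctly in the second cancer analysis would still need to be verified against the package.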

radivot commented 4 years ago

This is the mapping SEERaBomb currently uses. My main interest is in hemes, so I'm mostly thinking about ICD-O-3 (i.e., histo3 below).

mapCancs
function (D)
{
    ICD9 = D$ICD9
    histo3 = D$histo3
    cancer = rep("other", dim(D)[1])
    cancer[(ICD9 == 9999)] = "unknown"
    cancer[(ICD9 >= 2300) & (ICD9 < 2310)] = "giCIS"
    cancer[(ICD9 >= 2310) & (ICD9 < 2320)] = "respCIS"
    cancer[(ICD9 >= 2320) & (ICD9 < 2330)] = "skinCIS"
    cancer[(ICD9 == 2330)] = "breastCIS"
    cancer[(ICD9 == 2331)] = "cervixCIS"
    cancer[(ICD9 >= 2332) & (ICD9 <= 2333)] = "femGenCIS"
    cancer[(ICD9 >= 2334) & (ICD9 <= 2336)] = "maleGenCIS"
    cancer[(ICD9 >= 2337) & (ICD9 < 2340)] = "guCIS"
    cancer[(ICD9 >= 2340) & (ICD9 < 2349)] = "otherCIS"
    cancer[(ICD9 >= 1400) & (ICD9 < 1500)] = "oral"
    cancer[(ICD9 >= 1500) & (ICD9 <= 1509)] = "esophagus"
    cancer[(ICD9 >= 1510) & (ICD9 <= 1519)] = "stomach"
    cancer[(ICD9 >= 1520) & (ICD9 <= 1529)] = "intestine"
    cancer[(ICD9 >= 1530) & (ICD9 <= 1539)] = "colon"
    cancer[(ICD9 >= 1540) & (ICD9 <= 1549)] = "rectal"
    cancer[(ICD9 >= 1550) & (ICD9 <= 1559)] = "liver"
    cancer[(ICD9 >= 1560) & (ICD9 <= 1569)] = "gallBladder"
    cancer[(ICD9 >= 1570) & (ICD9 <= 1579)] = "pancreas"
    cancer[(ICD9 >= 1580) & (ICD9 <= 1589)] = "peritonium"
    cancer[(ICD9 >= 1590) & (ICD9 <= 1599)] = "GI"
    cancer[(ICD9 >= 1600) & (ICD9 <= 1609)] = "sinus"
    cancer[(ICD9 >= 1610) & (ICD9 <= 1619)] = "larynx"
    cancer[(ICD9 >= 1620) & (ICD9 <= 1629)] = "lung"
    cancer[(ICD9 == 1639)] = "pleura"
    cancer[(ICD9 >= 1640) & (ICD9 <= 1649)] = "thymus"
    cancer[(ICD9 >= 1700) & (ICD9 <= 1709)] = "bone"
    cancer[(ICD9 >= 1710) & (ICD9 <= 1719)] = "HnN"
    cancer[(ICD9 >= 1720) & (ICD9 <= 1729)] = "melanoma"
    cancer[(ICD9 >= 1730) & (ICD9 <= 1739)] = "skin"
    cancer[(ICD9 >= 1740) & (ICD9 <= 1749)] = "breast"
    cancer[(ICD9 == 175) | ((ICD9 >= 1750) & (ICD9 <= 1759))] = "breast"
    cancer[ICD9 == 179] = "uterus"
    cancer[(ICD9 >= 1800) & (ICD9 <= 1809)] = "cervix"
    cancer[(ICD9 >= 1820) & (ICD9 <= 1829)] = "uterus"
    cancer[(ICD9 >= 1830) & (ICD9 <= 1839)] = "ovary"
    cancer[(ICD9 == 2362)] = "ovary"
    cancer[(ICD9 >= 1840) & (ICD9 <= 1849)] = "femGen"
    cancer[ICD9 == 185] = "prostate"
    cancer[(ICD9 >= 1860) & (ICD9 <= 1869)] = "testes"
    cancer[(ICD9 >= 1870) & (ICD9 <= 1879)] = "maleGen"
    cancer[(ICD9 >= 1880) & (ICD9 <= 1889)] = "bladder"
    cancer[(ICD9 >= 1890) & (ICD9 <= 1899)] = "renal"
    cancer[(ICD9 >= 1900) & (ICD9 <= 1909)] = "eye"
    cancer[(ICD9 >= 1910) & (ICD9 <= 1919)] = "brain"
    cancer[(ICD9 >= 1920) & (ICD9 <= 1929)] = "nerves"
    cancer[ICD9 == 193] = "thyroid"
    cancer[ICD9 == 1991] = "otherMalig"
    cancer[(histo3 >= 9590) & (histo3 < 9600)] = "NHL"
    cancer[(histo3 >= 9650) & (histo3 < 9670)] = "HL"
    cancer[(histo3 >= 9670) & (histo3 < 9730)] = "NHL"
    cancer[(histo3 >= 9730) & (histo3 < 9735)] = "MM"
    cancer[(histo3 >= 9735) & (histo3 < 9740)] = "NHL"
    cancer[(histo3 >= 9740) & (histo3 <= 9742)] = "MPN"
    cancer[(histo3 >= 9743) & (histo3 < 9760)] = "MPN"
    cancer[(histo3 >= 9760) & (histo3 <= 9770)] = "MM"
    cancer[(histo3 >= 9800) & (histo3 < 9810)] = "OL"
    cancer[histo3 == 9948] = "OL"
    cancer[(histo3 >= 9810) & (histo3 < 9840)] = "ALL"
    cancer[(histo3 == 9831) & (D$yrdx > 2009)] = "LGL"
    cancer[(histo3 == 9823)] = "CLL"
    cancer[(histo3 == 9670)] = "CLL"
    cancer[(histo3 >= 9840) & (histo3 < 9940)] = "AML"
    cancer[histo3 %in% c(9863, 9875)] = "CML"
    cancer[(histo3 == 9866)] = "APL"
    cancer[histo3 %in% c(9865, 9869, 9871, 9896, 9897, 9911)] = "AMLti"
    cancer[(histo3 > 9979) & (histo3 < 9990)] = "MDS"
    cancer[(histo3 == 9940)] = "HCL"
    cancer[(histo3 == 9945)] = "CMML"
    cancer[(histo3 == 9960)] = "MPN"
    cancer[(histo3 == 9975)] = "MPN"
    cancer[(histo3 == 9946)] = "MPN"
    cancer[(histo3 == 9950)] = "MPN"
    cancer[(histo3 == 9961)] = "MPN"
    cancer[(histo3 == 9962)] = "MPN"
    cancer[(histo3 == 9963)] = "MPN"
    cancer[(histo3 == 9964)] = "MPN"
    cancer[(histo3 >= 9965) & (histo3 <= 9967)] = "MPN"
    cancer[(histo3 >= 9970) & (histo3 <= 9971)] = "NHL"
    cancer[(histo3 == 9876)] = "MPN"
    cancer[histo3 == 9140] = "KS"
    cancer[(D$seqnum >= 60) & (D$seqnum <= 88)] = "benign"
    D$cancer = as.factor(cancer)
    D
}
<bytecode: 0x7fa67d68b558>
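Because mapCancs assigns labels in sequence, a later rule overwrites an earlier one for any row that matches both. The short sketch below replays three rules copied from the listing above on made-up values, to show that the histo3 rules, which come last, take precedence over the ICD9 site rules.

```r
# Toy inputs; rows 1 and 2 share an ICD9 breast code, but row 2 has a
# lymphoma histology code.
ICD9   <- c(1749, 1749, 1629)
histo3 <- c(8500, 9680, 8070)

cancer <- rep("other", length(ICD9))
# Site rules, copied from mapCancs.
cancer[(ICD9 >= 1740) & (ICD9 <= 1749)] <- "breast"
cancer[(ICD9 >= 1620) & (ICD9 <= 1629)] <- "lung"
# Histology rule, copied from mapCancs; applied later, so it wins.
cancer[(histo3 >= 9670) & (histo3 < 9730)] <- "NHL"

cancer
```

This ordering is why the 'cancer' field follows ICD-O-3 histology for hematologic malignancies while still depending on ICD9 for solid tumors such as breast.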
