imsweb / algorithms

Java implementation of cancer-related algorithms (NHIA, NAPIIA, Survival Time, etc...)
Other
6 stars 6 forks source link

Survival Time Algorithm Bug #63

Closed flynngr closed 5 years ago

flynngr commented 5 years ago

See the attached for a case that is being calculated incorrectly. I have no idea why or why we haven't seen more like it, it doesn't seem THAT unusual.

I included all algorithm inputs for all fields, the algorithm outputs for the problematic fields and what the correct values should be. I'm using version 2.1.

We can talk in person if that is helpful. The general idea is the the seq 60 case should be sorted first since it's clearly before the seq 1 case, both dx dates are known. Then, since we know the seq 2 case should come after the seq 1 case (due to sequence), it's missing dates should be imputed half way between the diagnosis date of seq 1 and the date of last contact since it's the same year and there are no other records. Does that make sense?

If the date and order get set properly, it will end up changing the flags and the survival months. Let me know if you have any questions. Bad.Survival.Case.xlsx

bekeles commented 5 years ago

I was mentioning this problem for @depryf few weeks ago. @flynngr all the things you mentioned makes sense. But here is the problem: 1) comparing 1 and 2 -> 1 comes first based on sequence number 2) comparing 1 and 3 -> 3 comes first based on dx date 3) comparing 2 and 3 -> 2 comes first based on seq number

So it can be either 1 2 3 or 3 1 2.

flynngr commented 5 years ago

If we were doing a comparison in a vacuum (no other tumors), then your number 3 below would come out the way you say.

However, we have known information that the Seq 60 tumor came before the Seq 1 tumor (and therefore the Seq 2 tumor).

We never want the record number recode to not align with known tumor dates – so a tumor with known diagnosis of 2/23/15 should never have a record number higher than an tumor with known dx 3/5/15.

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 17, 2019 9:42 AM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

I was mentioning this problem for @depryfhttps://github.com/depryf few weeks ago. @flynngrhttps://github.com/flynngr all the things you mentioned makes sense. But here is the problem:

  1. comparing 1 and 2 -> 1 comes first based on sequence number
  2. comparing 1 and 3 -> 3 comes first based on dx date
  3. comparing 2 and 3 -> 2 comes first based on seq number

So it can be either 1 2 3 or 3 1 2.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVGNTLAIGPH3IEVCCV3P74OTDA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2EHNMI#issuecomment-512259761, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVEQUF6UWHESXH3FOYLP74OTDANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

This may not happen in a real life. But I used the following records to extensively test the corner cases of the ordering in my unit test.

  1. 9999/99/99, 00
  2. 2013/01/02, 01
  3. 2013/01/02, 02
  4. 2010/03/01, 61
  5. 2010/99/99, 99
  6. 9999/99/99, 03
  7. 2009/11/11, 04
  8. 9999/99/99, 05
  9. 9999/99/99, 06
  10. 2008/03/01, 07
  11. 2010/01/01, 09
  12. 2011/01/02, 08

How do you think these should be ordered?

flynngr commented 5 years ago

Your example is pretty problematic -- we usually work with a basic understanding of the validity of sequence numbers (e.g. Seq 4 is not clearly after Seq 7, we don’t have both 0 and 1, etc.). The original case in question did not have invalid sequence numbers. The below case would fail edits for sure (I hope).

However, I do understand that you want to be as generic as possible. So, here is Steve’s response:

I don’t know if in SAS I handle the sequence # errors such that a person has a 0 and 1-59. I also might assume that if they have any cases that are not 99, then they can’t have any that are 99. Also, I’m not sure which should have priority date or sequence number when there is a clear contraction. But going with date having priority, I guess I would go with:

  1. 9999/99/99, 00
  2. 2008/03/01, 07
  3. 2009/11/11, 04
  4. 2010/01/01, 09
  5. 2010/99/99, 99
  6. 2010/03/01, 61
  7. 2011/01/02, 08
  8. 2013/01/02, 01
  9. 2013/01/02, 02
  10. 9999/99/99, 03
  11. 9999/99/99, 05
  12. 9999/99/99, 06 But the last 3 are really questionable. I’m just going with them in order (3,5,6), and then putting them after 2 (as it was the closest seq below the lowest one). But I can’t really defend that over putting them somewhere else. I just think anytime there is a true conflict between dates and sequence number are a problem.

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 17, 2019 11:21 AM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

This may not happen in a real life. But I used the following records to extensively test the corner cases of the ordering in my unit test.

  1. 9999/99/99, 00
  2. 2013/01/02, 01
  3. 2013/01/02, 02
  4. 2010/03/01, 61
  5. 2010/99/99, 99
  6. 9999/99/99, 03
  7. 2009/11/11, 04
  8. 9999/99/99, 05
  9. 9999/99/99, 06
  10. 2008/03/01, 07
  11. 2010/01/01, 09
  12. 2011/01/02, 08

How do you think these should be ordered?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVA5FNZHUUV4I2BF2Z3P742E5A5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2EXCZI#issuecomment-512323941, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVFTDWPYGNUC4NJI2J3P742E5ANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

depryf commented 5 years ago

@bekeles I will need to release the library soon because of another bug in the site-specific surgery tables.

When you do think these changes will be in? Are you still waiting on some information from other folks?

bekeles commented 5 years ago

I don't think it will be soon. I have to write our own sorting algorithm and I am still not sure of few things. I have to check few things with @flynngr before I start working on it.

depryf commented 5 years ago

Sounds good. Then I will make a release with that other change and I won't wait on this one. Thanjks.

flynngr commented 5 years ago

You probably should check with Scoppa on the details, as he wrote the SAS that we are trying to mimic.

-------- Original message -------- From: bekeles notifications@github.com Date: 7/25/19 1:29 PM (GMT-06:00) To: imsweb/algorithms algorithms@noreply.github.com Cc: "Flynn, Gretchen (IMS)" FlynnG@imsweb.com, Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

I don't think it will be soon. I have to write our own sorting algorithm and I am still not sure of few things. I have to check few things with @flynngrhttps://github.com/flynngr before I start working on it.

- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVGFM5MD6D5G6T6UORDQBHWHRA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD22L3XA#issuecomment-515161564, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVBULL6I2M5KULL6N7TQBHWHRANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

I want to test this in SEER*DMS. Since we ran the survival time polisher after November 1st, I am expecting this not to cause any change. This issue is done only for corner cases. Since I changed the sorting algorithm, I also want to see how significant the time to run the polisher has changed.

depryf commented 5 years ago

Do you want me to make a snapshot release using this branch?

bekeles commented 5 years ago

I will do it. Thanks

bekeles commented 5 years ago

I ran the polisher in over 1 million patient sets and only 5 of them are modified. I reviewed all those cases and they are similar to Gretchen's example above. Here is an example: CTC 01, 1987/01/14 CTC 02, 1989/02/27 CTC 03. 1990/10/22 CTC 61, 9999/99/99 CTC 62, 1989/02/27 After I ran the polisher CTCs were sorted as 01,02,62,03,61

@flynngr does this sound right to you?

flynngr commented 5 years ago

Steve, can you comment on this? I think we want 61 to come before 02 (and possibly 01?). It might depend on birthdate?

I'm not really sure that imputing unknown year half way between birth and a later tumor is statistically correct (childhood cancer being rarer than older cancer), but I think it's how we'd want it, right?

-------- Original message -------- From: bekeles notifications@github.com Date: 7/30/19 7:47 AM (GMT-05:00) To: imsweb/algorithms algorithms@noreply.github.com Cc: "Flynn, Gretchen (IMS)" FlynnG@imsweb.com, Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

I ran the polisher in over 1 million patient sets and only 5 of them are modified. I reviewed all those cases and they are similar to Gretchen's example above. Here is an example: CTC 01, 1987/01/14 CTC 02, 1989/02/27 CTC 03. 1990/10/22 CTC 61, 9999/99/99 CTC 62, 1989/02/27 After I ran the polisher CTCs were sorted as 01,02,62,03,61

@flynngrhttps://github.com/flynngr does this sound right to you?

- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVAEXUL3ATCWWSRSDYTQCAS3RA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3DWPQY#issuecomment-516384707, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVC4QHOSLEBWFEL5KKLQCAS3RANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

flynngr commented 5 years ago

The intent of the algorithm never had any imputation of year. It specifically states in the assumptions that there are no unknown years for dx or fu. But if we need to make it work with everything, then I would go with 61 between 2 and 62. It is supposed to put 60+ after 0-59 if there is no way to determine that it came before it. The unknown date doesn't make it come before 01 or 02. The sequence # does make it come before 62. I definitely do not think we should pick a year in the middle of the time from birth to the next DX as that would very easily impute a date of dx prior to the start of the cancer registry.

Regardless of what the sort ends up being, the survival fields should be 9 filled if the year was unknown.

From: Flynn, Gretchen (IMS) Sent: Tuesday, July 30, 2019 9:13 AM To: imsweb/algorithms reply@reply.github.com; imsweb/algorithms algorithms@noreply.github.com; Scoppa, Steve (IMS) ScoppaS@imsweb.com Cc: Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

Steve, can you comment on this? I think we want 61 to come before 02 (and possibly 01?). It might depend on birthdate?

I'm not really sure that imputing unknown year half way between birth and a later tumor is statistically correct (childhood cancer being rarer than older cancer), but I think it's how we'd want it, right?

-------- Original message -------- From: bekeles notifications@github.com<mailto:notifications@github.com> Date: 7/30/19 7:47 AM (GMT-05:00) To: imsweb/algorithms algorithms@noreply.github.com<mailto:algorithms@noreply.github.com> Cc: "Flynn, Gretchen (IMS)" FlynnG@imsweb.com<mailto:FlynnG@imsweb.com>, Mention mention@noreply.github.com<mailto:mention@noreply.github.com> Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

I ran the polisher in over 1 million patient sets and only 5 of them are modified. I reviewed all those cases and they are similar to Gretchen's example above. Here is an example: CTC 01, 1987/01/14 CTC 02, 1989/02/27 CTC 03. 1990/10/22 CTC 61, 9999/99/99 CTC 62, 1989/02/27 After I ran the polisher CTCs were sorted as 01,02,62,03,61

@flynngrhttps://github.com/flynngr does this sound right to you?

- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVAEXUL3ATCWWSRSDYTQCAS3RA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3DWPQY#issuecomment-516384707, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVC4QHOSLEBWFEL5KKLQCAS3RANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

I don't know how I can accomplish this programmatically. I can guarantee you the SAS program won't give you what you want for the first issue you reported and for this example I provided. We have to come up with a standard rule. The rule I used to solve the issue you reported was:

  1. Order the known dates based on dx date first.
  2. Then compare the unknown dates based on sequence number starting from the latest dx record. I can change this to start comparing the unknown dates with the earliest dx. In this case the ordering for the records you reported will end up like this: 2015/99/99 - 02 2015/2/23 - 60 2015/3/5 - 01
flynngr commented 5 years ago

Steve and I just discussed this a bit.

The SAS program was never intended to impute the order of records with unknown years. It states in the comments that it doesn’t work with unknown years of diagnosis or follow up.

The latter solution suggested in #2 would not be correct – when dates are unknown, sequence number would be considered in the sorting, so we definitely want the 02 case after the 01. This is part of the algorithm currently and those cases are considered ‘good’ data that we would be coding this for.

Regarding the unknown years, we think it would be best/easiest to just automatically 9-fill all of the fields (including record number recode). If it’s dx year that is unknown, just the fields for that record. If follow-up year is unknown it should be all of the fields for all of the records. We don’t think we should waste much more time worrying about these cases that will never be used for analysis. These fields are for analysis.

Does that sound ok?

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 9:30 AM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

I don't know how I can accomplish this programmatically. I can guarantee you the SAS program won't give you what you want for the first issue you reported and for this example I provided. We have to come up with a standard rule. The rule I used to solve the issue you reported was:

  1. Order the known dates based on dx date first.
  2. Then compare the unknown dates based on starting from the latest dx record. I can change this to start comparing the unknown dates with the earliest dx. In this case the ordering for the records you reported will end up like this: 2015/99/99 - 02 2015/2/23 - 60 2015/3/5 - 01

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVBKPRBTNIMWDBS7JYLQCGHVTA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3HH7EA#issuecomment-516849552, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVAHSBFD3M6K47ZBERLQCGHVTANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

That sounds ok. But we still need to figure out how to handle the unknown months and days. Consider the following: CTC 01, 1987/10/14 CTC 61, 1987/99/99 CTC 62, 1987/05/27

In this case, do you want CTC61 before 62 (CTC61, CTC62, CTC01) or After 01 (CTC62, CTC01, CTC61)? What I implemented currently is the later.

flynngr commented 5 years ago

Steve – please correct me if I’m wrong, but 61 should be half way between 1/1/87 and 5/27/87. The record numbers should be 3, 1 , 2.

Sequence numbers in the 60’s should be essentially considered separately from 0-59. Absent of known dates, we have no reason to think that 61 should be after 01. But we do have reason to think that 61 is before 62 (i.e. the sequence number).

Hopefully Steve will step in and explain this better.

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 12:59 PM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

That sounds ok. But we still need to figure out how to handle the unknown months and days. Consider the following: CTC 01, 1987/10/14 CTC 61, 1987/99/99 CTC 62, 1987/05/27

In this case, do you want CTC61 before 62 (CTC61, CTC62, CTC01) or After 01 (CTC62, CTC01, CTC61)? What I implemented currently is the later.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVCDHF7NUG5USEQ5NK3QCHADVA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3H4QKI#issuecomment-516933673, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVE4C5FWKPNAKK7T32DQCHADVANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

Sorry to bother you, but I am trying to come up with a logic that works for all cases. What if the records sequence number is not in the 60's at all like this: CTC 01, 1987/10/14 CTC 02, 1987/99/99 CTC 03, 1987/05/27 CTC02 can be before 03 or after 01

flynngr commented 5 years ago

You are not bothering at all. This is a complicated and confusing algorithm.

I would wait until Steve chimes in on this (which may not be until next week, as he is going away today). But here are my thoughts….

Steve and I did encounter a similar case a few weeks ago, and I THINK that the answer is going to be “do it whichever way you want that adheres to your other rules.” I can tell you based on the case we looked at a few weeks ago that I’m pretty sure that the SAS code puts it after 01. This case does not concern us as much as the original one in this issue, as this case would not be allowed in our analysis data (it should have failed an edit).

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 1:16 PM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

Sorry to bother you, but I am trying to come up with a logic that works for all cases. What if the records sequence number is not in the 60's at all like this: CTC 01, 1987/10/14 CTC 02, 1987/99/99 CTC 03, 1987/05/27 CTC02 can be before 03 or after 01

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVD74HV32QAI6G5U2SDQCHCGLA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3H6D6Y#issuecomment-516940283, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVFQK3JGIT5TACTEXRTQCHCGLANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

flynngr commented 5 years ago

I agree, I don’t have a valid argument for either position being better as both cause a conflict. I do think that SAS would likely put it after 1 (so between 10/14/87 and 12/31/87).

Steve

From: Flynn, Gretchen (IMS) Sent: Wednesday, July 31, 2019 2:08 PM To: imsweb/algorithms reply@reply.github.com; imsweb/algorithms algorithms@noreply.github.com Cc: Mention mention@noreply.github.com; Scoppa, Steve (IMS) ScoppaS@imsweb.com Subject: RE: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

You are not bothering at all. This is a complicated and confusing algorithm.

I would wait until Steve chimes in on this (which may not be until next week, as he is going away today). But here are my thoughts….

Steve and I did encounter a similar case a few weeks ago, and I THINK that the answer is going to be “do it whichever way you want that adheres to your other rules.” I can tell you based on the case we looked at a few weeks ago that I’m pretty sure that the SAS code puts it after 01. This case does not concern us as much as the original one in this issue, as this case would not be allowed in our analysis data (it should have failed an edit).

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 1:16 PM To: imsweb/algorithms algorithms@noreply.github.com<mailto:algorithms@noreply.github.com> Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com<mailto:FlynnG@imsweb.com>; Mention mention@noreply.github.com<mailto:mention@noreply.github.com> Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

Sorry to bother you, but I am trying to come up with a logic that works for all cases. What if the records sequence number is not in the 60's at all like this: CTC 01, 1987/10/14 CTC 02, 1987/99/99 CTC 03, 1987/05/27 CTC02 can be before 03 or after 01

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVD74HV32QAI6G5U2SDQCHCGLA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3H6D6Y#issuecomment-516940283, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVFQK3JGIT5TACTEXRTQCHCGLANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

flynngr commented 5 years ago

I completely agree and don’t think I can explain it better. There will be no conflicts with Gretchen’s answer.

From: Flynn, Gretchen (IMS) Sent: Wednesday, July 31, 2019 1:06 PM To: imsweb/algorithms reply@reply.github.com; imsweb/algorithms algorithms@noreply.github.com Cc: Mention mention@noreply.github.com; Scoppa, Steve (IMS) ScoppaS@imsweb.com Subject: RE: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

Steve – please correct me if I’m wrong, but 61 should be half way between 1/1/87 and 5/27/87. The record numbers should be 3, 1 , 2.

Sequence numbers in the 60’s should be essentially considered separately from 0-59. Absent of known dates, we have no reason to think that 61 should be after 01. But we do have reason to think that 61 is before 62 (i.e. the sequence number).

Hopefully Steve will step in and explain this better.

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 12:59 PM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

That sounds ok. But we still need to figure out how to handle the unknown months and days. Consider the following: CTC 01, 1987/10/14 CTC 61, 1987/99/99 CTC 62, 1987/05/27

In this case, do you want CTC61 before 62 (CTC61, CTC62, CTC01) or After 01 (CTC62, CTC01, CTC61)? What I implemented currently is the later.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVCDHF7NUG5USEQ5NK3QCHADVA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3H4QKI#issuecomment-516933673, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVE4C5FWKPNAKK7T32DQCHADVANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

One last question: CTC 01, 1987/10/14 CTC 61, 1987/05/27 CTC 62, 1987/99/99

In this case do you put CTC62 immediately after 61 (CTC61, CTC62, CTC01) or after CTC01 (CTC61, CTC01, CTC62)?

flynngr commented 5 years ago

This one should come out CTC 61, CTC 62, CTC 01. The date of CTC 62 should be set to 9/13/1987 (half way between 5/27 and 12/31). CTC 01 is not involved in the imputing of the date since we have no information about its ordering visa vi CTC 62.

Now, if the date for CTC 61 were 1987/09/20, the order would come out differently (61, 01, 62). Does that make sence?

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 3:12 PM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

One last question: CTC 01, 1987/10/14 CTC 61, 1987/05/27 CTC 62, 1987/99/99

In this case do you put CTC62 immediately after 61 (CTC61, CTC62, CTC01) or after CTC01 (CTC61, CTC01, CTC62)?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVHGWSK2MA6TSFO6LTTQCHPZDA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3IINJI#issuecomment-516982437, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVC57H5V3CNKCNT3KP3QCHPZDANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

Now it is getting more complicated. And using only selected records instead of all valid records to fill the missing date is a new concept.

I am not really sure how to cover all this corner (5 in a million) cases without breaking the real logic. I actually would say it is impossible.

flynngr commented 5 years ago

Let’s hold off and get together in person next week to discuss options.

The SAS separates federal and state tumors completely before processing, which is how it would come up with this result. But don’t start tearing anything down until we talk as a group.

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 4:19 PM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

Now it is getting more complicated. And using only selected records instead of all valid records to fill the missing date is a new concept.

I am not really sure how to cover all this corner (5 in a million) cases without breaking the real logic. I actually would say it is impossible.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVFLRNB7S4SUEXNOMLDQCHXTPA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3IN2AA#issuecomment-517004544, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVDZDCZPN45DXWJYPMDQCHXTPANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

flynngr commented 5 years ago

SAS does separate, but it brings that back before doing the dates. I think it should go the other way, 61,1,62. We know 62 is after 61, but we don’t know if it’s after 1 and in that case, we put 1 (0-59) before 62 (60+). So, I think it should go 61,1,62 – I assume that is what SAS would do – it is supposed to.

From: Flynn, Gretchen (IMS) Sent: Wednesday, July 31, 2019 4:24 PM To: imsweb/algorithms reply@reply.github.com; imsweb/algorithms algorithms@noreply.github.com Cc: Mention mention@noreply.github.com; Scoppa, Steve (IMS) ScoppaS@imsweb.com Subject: RE: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

Let’s hold off and get together in person next week to discuss options.

The SAS separates federal and state tumors completely before processing, which is how it would come up with this result. But don’t start tearing anything down until we talk as a group.

From: bekeles [mailto:notifications@github.com] Sent: Wednesday, July 31, 2019 4:19 PM To: imsweb/algorithms algorithms@noreply.github.com Cc: Flynn, Gretchen (IMS) FlynnG@imsweb.com; Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

Now it is getting more complicated. And using only selected records instead of all valid records to fill the missing date is a new concept.

I am not really sure how to cover all this corner (5 in a million) cases without breaking the real logic. I actually would say it is impossible.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVFLRNB7S4SUEXNOMLDQCHXTPA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3IN2AA#issuecomment-517004544, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVDZDCZPN45DXWJYPMDQCHXTPANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

bekeles commented 5 years ago

The bug is fixed now. I used all those examples in this issue in my unit tests and I got the results we expected. @flynngr is there anyway you can test this using the snapshot release?

flynngr commented 5 years ago

Fabian - can you show or tell me how to use this snapshot in my program?

-------- Original message -------- From: bekeles notifications@github.com Date: 8/9/19 2:07 PM (GMT-05:00) To: imsweb/algorithms algorithms@noreply.github.com Cc: "Flynn, Gretchen (IMS)" FlynnG@imsweb.com, Mention mention@noreply.github.com Subject: Re: [imsweb/algorithms] Survival Time Algorithm Bug (#63)

The bug is fixed now. I used all those examples in this issue in my unit tests and I got the results we expected. I also ran the polisher in SEER*DMS DT beta and reviewed all the changes they all look good. @flynngrhttps://github.com/flynngr is there anyway you can test this using the snapshot release? @depryfhttps://github.com/depryf Can you please review the changes on DT beta? I created a filter called 'survival' in data search. I just wanted some other person to look at it and you are the only other person who involved on this.

- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/imsweb/algorithms/issues/63?email_source=notifications&email_token=AKSSPVBTL6DPJPHUVEYNDDDQDWW5DA5CNFSM4IEJB5K2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD37MHDI#issuecomment-520012685, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKSSPVF6V4KO26IUSSWC5N3QDWW5DANCNFSM4IEJB5KQ.


Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

depryf commented 5 years ago

You have to add the snapshot repository to your Gradle file:

repositories {
    mavenCentral()
    maven {
        url = 'https://oss.sonatype.org/content/repositories/snapshots/'
    }
}

And then you reference the snapshot version, which I think is 2.3-SNAPSHOT:

implementation 'com.imsweb:algorithms:2.3-SNAPSHOT'
flynngr commented 5 years ago

I have tested this using the November SEER data and I got the results that I expected, 2 patients changed: the one that started this issue and the one I mentioned the other day that I suspected would also change. They both match SAS now.

bekeles commented 5 years ago

Good! Thanks! I think this is ready to be released. I will create a merge request and let @depryf review the code.