imsweb / algorithms

Java implementation of cancer-related algorithms (NHIA, NAPIIA, Survival Time, etc...)
Other
6 stars 6 forks source link

The length of FIELD_IARC_MP_HIST_GROUP is incorrect #109

Closed howew closed 4 years ago

howew commented 4 years ago

In Algorithms.java, a 2 is entered line below but the algorithm produces a field that is 4 characters wide. The 2 should therefore be a 4.

addField(AlgorithmField.of(FIELD_IARC_MP_HIST_GROUP, null, 2, "IARC Multiple Primary Histology Group", "IARC MP Hist Grp", DATA_LEVEL_TUMOR));

depryf commented 4 years ago

This is obviously a trivial fix.

How badly is this needed? Does this cause some immediate issues in NAACCR*Prep?

howew commented 4 years ago

It's low priority. I only spotted it because I based the NAACCRPrep dictionary on this documentation, so when NAACCRPrep was outputting a fixed-width record with this field it was wrong (values were shifted by 2 characters). I fixed the dictionary in Prep and it solved the issue.

depryf commented 4 years ago

Sounds good. I will put a fix soon, but won't push a release yet.

depryf commented 4 years ago

I am not seeing this.

IarcUtils.calculateHistGroup() returns a value between 1 and 17.

howew commented 4 years ago

Is the API calling the right function?

depryf commented 4 years ago

I think so.

This algorithm outputs two histology fields: "histology" (4 characters) and "histology group" (2 characters).

I reviewed the code in Algorithms and in IarcUtils and I don't see anything suspicious.

howew commented 4 years ago

Let me double check - I know that I was getting 4 characters back and when I updated the dictionary it solved the issue with the shifting.

depryf commented 4 years ago

Sorry, I found it.

That method default to returning the full histology if the 1-17 recoding doesn't work.

That does look super suspicious to me; sometimes it returns a legit 4-characters histology and sometimes it returns a 1 to 2 digit recoded code.

I would love to go back to the original SAS program and double-check that.

But as things are now, you are correct, that needs to be a 4 digits code.

I will fix that.

howew commented 4 years ago

OK sounds good.

depryf commented 4 years ago

@howew I went back to the original SAS program that used to compute this variable, and it didn't default the group to the full histology.

I believe this is a bug in the Java implementation.

If the full histology can't be coded into a group, then the method should return a null group.

That's the fix I will implement, instead of changing the length of the field.

howew commented 4 years ago

Do you think it should be fixed now or wait? I told Recinda it could wait but I can backtrack if you want.

depryf commented 4 years ago

I would rather wait. This doesn't feel like a very urgent issue to me.

The fix will be done soon regardless (like today or tomorrow) because it's trivial. But I wasn't going to push a full release just for this.

But I can if it causes any real issues in other software.

depryf commented 4 years ago

I think you can leave your temp fix in for now (using 4 characters). And the when you upgrade the library with this fix, you will have to undo that temp fix. If that works for you.

howew commented 4 years ago

Yes that's fine. I'll put a squish issue into the CFD squish project so that I don't forget.

depryf commented 4 years ago

I found another minor issue while looking at the code vs SAS. The site group is supposed to be the three first digits of the primary site. And the method that does that site group always returns 3 characters. But the site group field is defined as a 4 characters field. I am going to change it to 3 characters to make it clear it can only hold the 3 first characters of the primary site.

howew commented 4 years ago

OK. Noted.

depryf commented 4 years ago

This is completed. It will be available the next time a release is pushed (that will be version 2.19).