hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
179 stars 56 forks source link

DEV-3566 fix virus interpretation #458

Closed mvanniekerkHartwig closed 10 months ago

mvanniekerkHartwig commented 10 months ago

Removes the VirusInterpretation enum. During the orange-datamodel creation, I added this enum because it was part of the oncoact (patient-reporter/rose/protect) common module. In hindsight not a smart move, the VirusInterpretation enum was removed from hmftools originally all the way back in 2021 (see commit 3de88bcd). There seems to be no reason that this enum should exist, Paul already fixed a bug related to this same field in May (see commit 45319b5b).

mvanniekerkHartwig commented 10 months ago

Are you sure you added this in orange-datamodel creation?

Yes, in commit d58e42d9.

In any case, we use and rely on this field in ACTIN since nov 2022, and we explicitly use the various enum values. If we change this to a string we would map it back to enum in ACTIN, not sure if that's an improvement?

Agreed, that would not be ideal. On the other hand, nowhere in hmftools it is guaranteed that the values in VirusInterpretation will be the only values in the AnnotatedVirus.interpretation field. As long as this is not guaranteed on the side of the virus output, I'd argue relying on this enum is a bug. Another practical consideration is that it's currently much easier to update ACTIN than Orange. Patient Reporter has the same problem but mostly checks for virus.interpretation() == VirusInterpretation.HPV, and when changing this to the hardcoded String value "HPV" we probably don't loose out in code reliability too much.

kduyvesteyn commented 10 months ago

Are you sure you added this in orange-datamodel creation?

Yes, in commit d58e42d9.

In any case, we use and rely on this field in ACTIN since nov 2022, and we explicitly use the various enum values. If we change this to a string we would map it back to enum in ACTIN, not sure if that's an improvement?

Agreed, that would not be ideal. On the other hand, nowhere in hmftools it is guaranteed that the values in VirusInterpretation will be the only values in the AnnotatedVirus.interpretation field. As long as this is not guaranteed on the side of the virus output, I'd argue relying on this enum is a bug. Another practical consideration is that it's currently much easier to update ACTIN than Orange. Patient Reporter has the same problem but mostly checks for virus.interpretation() == VirusInterpretation.HPV, and when changing this to the hardcoded String value "HPV" we probably don't loose out in code reliability too much.

Ah I see now, didn't realize interpretation() is a string in VirusInterpreter. Maybe the best solution would be to make this an enum in VirusInterpreter and propagate that. Downstream tools should be able to know the list of potential viruses assigned after interpretation, and not have to guess it.

In similar situations we added extra enums all the way rather than remove "wrong" ones. E.G. regionType and codingType in LinxBreakend were both changed from String to enum after realising downstream tools need to be able to know what types could be assigned.

kduyvesteyn commented 10 months ago

Do note: If you decide to make interpretation a string all-the-way then feel free to merge (code changes themselves look good, and do indeed solve an inconsistency in ORANGE)

mvanniekerkHartwig commented 10 months ago

In similar situations we added extra enums all the way rather than remove "wrong" ones. E.G. regionType and codingType in LinxBreakend were both changed from String to enum after realising downstream tools need to be able to know what types could be assigned.

I did a small investigation to see what it would take to go all the way. Ultimately, the AnnotatedVirus.interpretation field is set by a CSV file in virus interpreter that is passed in as an input file through the command line arg virus_reporting_db_tsv. It seems doable, so I made the changes. No complete guarantee of correctness, because now the change is a bit more sophisticated. But pretty high confidence it should work.

mvanniekerkHartwig commented 10 months ago

One remaining question by @kduyvesteyn : Should we release an update for Cuppa as well after this change? The cuppa part of this update is fairly minor, and after David's comments, hopefully bugfree. @charlesshale do you have any preference?