hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
179 stars 56 forks source link

ACTIN-808: Add PurpleTranscriptImpact.reported flag to orange datamodel #543

Closed kzuberihmf closed 2 months ago

kzuberihmf commented 2 months ago

Hope this matches the intent discussed.

Let's suppose the next version is 3.5.0, with this change it won't be able to load orange-datamodel jsons from 3.4.0 since the reported field is required. Are we ok with this break in compatibility?

kduyvesteyn commented 2 months ago
  • PurpleTranscriptImpact gets reported boolean (can not be null)
  • PupleVariantFactory uses PurpleVariantContext.reportrableTranscripts() to set the reported flag for other impacts
  • PurpleVariant.reported is derived, and true if canonical or any other transcripts are reported

Hope this matches the intent discussed.

Let's suppose the next version is 3.5.0, with this change it won't be able to load orange-datamodel jsons from 3.4.0 since the reported field is required. Are we ok with this break in compatibility?

My understanding is that a new field doesn't break compatibility? The only thing we need to keep in mind is that we need to upgrade ORANGE in the pipeline before we bump the datamodel dependency in ACTIN (to make sure we produce the datamodel that ACTIN expects). Please double-check with Paul to be sure!

kzuberihmf commented 2 months ago

My understanding is that a new field doesn't break compatibility? The only thing we need to keep in mind is that we need to upgrade ORANGE in the pipeline before we bump the datamodel dependency in ACTIN (to make sure we produce the datamodel that ACTIN expects). Please double-check with Paul to be sure!

Tested this by creating an Orange json using this version on COLO and processing it through a recent version of ACTIN-Molecular and it ran without error. So seems for that direction (will also mention to Paul).

DFKoetsier commented 2 months ago

One additional thought: I suspect VariantEntryFactory in ORANGE can be improved thanks to the changes in the datamodel. This function constructs the actual variants as they are expected to appear on the report, and this is derived from drivers since variant objects themselves did not contain enough information.

With this change I suspect the VariantEntry list that is generated in the VariantEntryFactory can be derived solely from purple variants (potentially still need drivers for additional annotation but not as the source for variant entries... didn't think through completely though.

I'm pretty sure you'd still need the drivers for the driver likelihood. I do agree that it's probably possible to use them for only that.

kzuberihmf commented 2 months ago

Will merge this, and do VariantEntryFactory in a separate PR.

Shall I do a release for ORANGE and ORANGE datamodel for this now as well?