hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
196 stars 59 forks source link

ACTIN-824: Add "Presence of tumor stats" to quality control page #597

Closed kzuberihmf closed 3 months ago

kzuberihmf commented 3 months ago

Adds tumorStats to orange-datamodel under purple, and corresponding code to populate in orange.

The orange pdf will have the tumor stats in a new table in the quality control chapter.

One of the attributes required the purple segments.tsv file which is not currently loaded, and that doesn't have a loader in common. I added support to extract just the minimal fields required to orange.

ninajacobs commented 3 months ago

See some comments in Slack :)

kzuberihmf commented 3 months ago

The purple constants don't seem to be available in common, so I tried the minimal version of the table without the thresholds listed. Also edited for your suggested wording @ninajacobs , below is the updated table on COLO. Happy to tweak further as needed.

For the convert-germline-to-somatic case I've added in the germline hotspot mutations to the calculation. And the tumor max diploid proportion is included, extracted from the purple purity.tsv. Also updated the comparison scripts in data-vm experiments for these changes.

Screenshot 2024-08-01 at 8 51 21 AM
ninajacobs commented 3 months ago

Hi @kzuberihmf, nice work thanks, will be a helpful table!

I'd propose to still add the subtitle: "At least one requirement needs to be met to indicate detection of tumor, in case of highly diploid candidate solutions (proportion >=0.97)" in which ideally 0.97 should be derived from "HIGHLY_DIPLOID_PERCENTAGE_DEFAULT" in PurpleConstants to make it future-safe?

And maybe make the table content grey in case tumor maximum diploid proportion < HIGHLY_DIPLOID_PERCENTAGE_DEFAULT ?

And I'd propose to change the title to "Tumor detection requirements" after all.

Finally, I'd say 'proportion' is a value like 0.02 instead of 2% (so propose to use 0.02).

kzuberihmf commented 3 months ago

Some other things I learned from Korneel to keep in mind to not forget :)

  • Updating the README (Making an upcoming section)
  • Update the COLO829 test reports by running ReportTestGeneratorApplication (WGS = Tumor_only=false, Targeted = Tumor_only=true)
  • You could also use that script to update real.orange.json (see comment in script). That way the “samplingDate” is adjusted to the current date.

Thanks, I've updated these. Though we'll likely update the report layout a bit more.

kzuberihmf commented 3 months ago

I'd propose to still add the subtitle: "At least one requirement needs to be met to indicate detection of tumor, in case of highly diploid candidate solutions (proportion >=0.97)" in which ideally 0.97 should be derived from "HIGHLY_DIPLOID_PERCENTAGE_DEFAULT" in PurpleConstants to make it future-safe?

And maybe make the table content grey in case tumor maximum diploid proportion < HIGHLY_DIPLOID_PERCENTAGE_DEFAULT ?

Added this, with the value hard-coded. I'm not sure how to do subtitles (or grey tables) so for the moment its on a separate line but in the same font as the heading.

And I'd propose to change the title to "Tumor detection requirements" after all.

Finally, I'd say 'proportion' is a value like 0.02 instead of 2% (so propose to use 0.02).

Updated.

cbruel commented 3 months ago

I'd propose to still add the subtitle: "At least one requirement needs to be met to indicate detection of tumor, in case of highly diploid candidate solutions (proportion >=0.97)" in which ideally 0.97 should be derived from "HIGHLY_DIPLOID_PERCENTAGE_DEFAULT" in PurpleConstants to make it future-safe? And maybe make the table content grey in case tumor maximum diploid proportion < HIGHLY_DIPLOID_PERCENTAGE_DEFAULT ?

Added this, with the value hard-coded. I'm not sure how to do subtitles (or grey tables) so for the moment its on a separate line but in the same font as the heading.

And I'd propose to change the title to "Tumor detection requirements" after all. Finally, I'd say 'proportion' is a value like 0.02 instead of 2% (so propose to use 0.02).

Updated.

I pushed a commit which makes the subtitle and reduces the distance between the title and table a bit, because it was quite large :)

kzuberihmf commented 3 months ago

I pushed a commit which makes the subtitle and reduces the distance between the title and table a bit, because it was quite large :)

Great, thanks! I was afraid to dig into figuring out how pdf formatting worked 😄

kzuberihmf commented 3 months ago

Thanks for the detailed feedback everyone, here and via email. I've updated to:

Also updated the test jar on data-vm experiments for testing verification. Hope I haven't missed anything 😄

ninajacobs commented 3 months ago

I pushed a minor capital letter change to ACTIN-824. I also did some testing - no unexpectencies found (will Slack you about details).

cbruel commented 3 months ago

I don't have any more comments :) (only removed the code for adding the subtitle, since it wasn't used anymore)