QC tooltip on case metadata header

mskilab-org / case-report

Genomic case report

BSD 3-Clause "New" or "Revised" License

0 stars 0 forks source link

QC tooltip on case metadata header #2

Closed shihabdider closed 3 weeks ago

shihabdider commented 1 month ago

This feature involves adding a QC tooltip to the case metadata header.

Feature Details:

Tooltip Design:
The tooltip should be consistent with existing tooltips in the header:
Icon Indication:
An icon should be associated with the tooltip to indicate the quality status of the case. This icon will reflect whether the case meets predefined thresholds for QC metrics. @xanthopoulakis feel free to use defaults provided by ant.design.
Data Source:
QC metrics will be sourced from the metadata.json file under the attribute bam_qc_metrics, the value will be an object containing various QC metrics. @shihabdider will provide updated metadata.jsons for the clinical samples by end of Tues Oct 15.
Threshold Specification:
@kevinmhadi will provide the specification of the QC metrics object's attributes and their respective thresholds (in this issue thread) to indicate bad samples. For now these will be flat comparisons, later we can decide on more sophisticated criteria for flagging.
Distribution Plot:
@kevinmhadi will also specify which attributes should be aggregated across patients to generate distribution plots for them.
JaBbA QC: Not sure how we this information to be presented. One possibility is just as an additional (optional) attribute of this qc tooltip (in which case we can rename the parent attribute to sample_qc_metrics and include it as part of the metadata.json). Alternatively we can try to incorporate it into the JaBbA plot component somehow (maybe an indicator next to the title that doubles as a tooltip and only appears when there's something wrong?). @jrafailov if you have some other ideas please share in this thread. In particular, we need to decide what file to put this information (or introduce a new file) that will be fetched by the case-reports front-end.

kevinmhadi commented 1 month ago

Threshold Specification: @kevinmhadi will provide the specification of the QC metrics object's attributes and their respective thresholds (in this issue thread) to indicate bad samples. For now these will be flat comparisons, later we can decide on more sophisticated criteria for flagging.

These values will be subject to change, as many final sequencing specs aren't even decided on yet. But I could see flagging might have 2 levels where if any of the following are seen, these are either serious signs of bad sequencing ("danger") or less serious, but still concerning issues over data quality ("warning").

I think these are the stats where any of these issues should flag one of these two levels of issues (doesn't have to actually be named "danger" or "warning").

The below is just an example mapping of what the category is that would raise one of the two levels of severity and what a potential message should show in a tooltip, but I am imagining the UI element would end up on one of three options ("green light", "red light"/Danger, "yellow light"/Warning). If there's a mixture, the UI element should defer to the most serious level.

Danger

{
    "Tumor Coverage < 80X": "Tumor bam has low coverage: 23X",
    "Normal Coverage < 20X": "Normal bam has low coverage: 10X",
    "Mapped Reads / Total Reads <99%": "High percentage of reads not mapping to hg19: X%",
    "Mapped HQ (>=Q20 Reads) / Total Reads <80%": "High percentage of reads not mapping to hg19: X%",
    "Genome Coverage at least 30X (PCT_30X, Picard) is less than 95%":  "Insufficient coverage of genome at 30X: X%",
    "AMBER Contamination > 1%": "Potential sample contamination detected!"
}

Warning

{
    "Median Insert Size <= 300": "Short median insert sizes detected: sensitivity to structural variants may be reduced.",
    "Optical/PCR Dups >= 20%": "High duplication rate X%",
    "JaBbA: Number of Loose Ends >= 100": "Sample potentially hypersegmented",
    "JaBbA: Converged == FALSE": "Graph model did not converge, PGV CN solution must be interpreted with caution"
}

kevinmhadi commented 1 month ago

Distribution Plot: @kevinmhadi will also specify which attributes should be aggregated across patients to generate distribution plots for them.

Tumor coverage, normal coverage, fraction of genomc covered at least 30%, insert size, dup rate, and number of segments (from jabba) make sense as some QC distributions to start with. I think these will be important to track over a cohort as the sample size grows and also will help us come up with necessary thresholds in the future once we start locking in the sequencing protocol.

For now, the other values above are moreless numbers that should raise warnings - distribution is less important, imo.

shihabdider commented 1 month ago

Threshold Specification: @kevinmhadi will provide the specification of the QC metrics object's attributes and their respective thresholds (in this issue thread) to indicate bad samples. For now these will be flat comparisons, later we can decide on more sophisticated criteria for flagging.

These values will be subject to change, as many final sequencing specs aren't even decided on yet. But I could see flagging might have 2 levels where if any of the following are seen, these are either serious signs of bad sequencing ("danger") or less serious, but still concerning issues over data quality ("warning").

I think these are the stats where any of these issues should flag one of these two levels of issues (doesn't have to actually be named "danger" or "warning").

The below is just an example mapping of what the category is that would raise one of the two levels of severity and what a potential message should show in a tooltip, but I am imagining the UI element would end up on one of three options ("green light", "red light"/Danger, "yellow light"/Warning). If there's a mixture, the UI element should defer to the most serious level.

Danger
{
    "Tumor Coverage < 80X": "Tumor bam has low coverage: 23X",
    "Normal Coverage < 20X": "Normal bam has low coverage: 10X",
    "Mapped Reads / Total Reads <99%": "High percentage of reads not mapping to hg19: X%",
    "Mapped HQ (>=Q20 Reads) / Total Reads <80%": "High percentage of reads not mapping to hg19: X%",
    "Genome Coverage at least 30X (PCT_30X, Picard) is less than 95%":  "Insufficient coverage of genome at 30X: X%",
    "AMBER Contamination > 1%": "Potential sample contamination detected!"
}
Warning
{
  "Median Insert Size <= 300": "Short median insert sizes detected: sensitivity to structural variants may be reduced.",
  "Optical/PCR Dups >= 20%": "High duplication rate X%",
  "JaBbA: Number of Loose Ends >= 100": "Sample potentially hypersegmented",
  "JaBbA: Converged == FALSE": "Graph model did not converge, PGV CN solution must be interpreted with caution"
}

So one issue with this is most of the danger metrics you've mentioned here are included in a different tooltip (the coverage tooltip).

Should we move them to its own tooltip or leave them where they are? I would prefer the latter since they do concern coverage (n.b iirc we are now getting these from the output of estimatelibrarycomplexity, whereas the ones shown below were from qualimap). One solution is to have an indicator next to each tooltip which flags when one or more of its attributes crosses its respective threshold. Then, on hover, the tooltip highlights the offending attributes (e.g in a different color).

In fact, @xanthopoulakis this means you don't have to wait for me to update the metadata.json with a new field. You can use the existing fields below:

    "tumor_median_coverage": "112.0X",
    "normal_median_coverage": "33.0X",
    "coverage_qc": {
      "%_gc": "44%",
      "≥_30x": "91.7%",
      "≥_50x": "89.0%",
      "%_aligned": "99.9%",
      "insert_size": "261 bp",
      "m_reads": 2933.7,
      "m_reads_mapped": 2930.7,
      "%_reads_mapped": "99.9%",
      "read_pairs_examined": 84809241,
      "read_pair_duplicates": 17180775,
      "read_pair_optical_duplicates": 13246875,
      "percent_duplication": 0.2026,
      "percent_optical_duplication": 0.1562,
      "percent_optical_dups_of_dups": 0.771,
      "coverage_variance": 78.8458
    },

And here are the thresholds with names corresponding to the above:

Danger

"tumor_median_coverage < 80X": "Tumor bam has low coverage: 23X",
"normal_median_coverage < 20X": "Normal bam has low coverage: 10X",
"%_reads_mapped < 99%": "High percentage of reads not mapping to reference: X%",
"≥_30x < 95%": "Insufficient coverage of genome at 30X: X%"

Warning

"insert_size <= 300": "Short median insert sizes detected: sensitivity to structural variants may be reduced.",
"percent_duplication >= 20%": "High duplication rate X%"

One way to implement this is to have a separate thresholds object that defines the thresholds for each attribute. There are some type issues here as some of the values are strings that will need to be converted into numbers/floats. To flag the attributes I think a change in color will be sufficient, I'm not sure we need a separate message for it (@kevinmhadi what do you think?).

For one's not included in the above, I'll add them to the metadata.json. For now you can get started on adding the indicator to the coverage tooltip.

kevinmhadi commented 1 month ago

I think keeping the info in their respective tooltips is good.

What I'm thinking though is like a single UI element that's added somewhere (maybe the leftmost header element, scooting the other metrics down) that visually marks whether the sample is severely problematic, passes with a warning, or passes everything. I'm thinking it should gather the offending metrics inside a tooltip that says what the problem is with some thresholds.

That's just one option. Basically my thought is just to unify any problematic flags in one place that's obvious to the user.

Of course if the team has a better way to represent this we can brainstorm other ideas.

I realize there'll be some redundancy here but these issues will probably come from heterogeneous sources (picard qc, amber, jabba output, and potentially other pipeline outputs eventually).

Lmk what you all think

On Oct 15, 2024 at 9:42 AM, <Shihab Dider @.***)> wrote:

Threshold Specification: @kevinmhadi (https://github.com/kevinmhadi) will provide the specification of the QC metrics object's attributes and their respective thresholds (in this issue thread) to indicate bad samples. For now these will be flat comparisons, later we can decide on more sophisticated criteria for flagging.

These values will be subject to change, as many final sequencing specs aren't even decided on yet. But I could see flagging might have 2 levels where if any of the following are seen, these are either serious signs of bad sequencing ("danger") or less serious, but still concerning issues over data quality ("warning").

I think these are the stats where any of these issues should flag one of these two levels of issues (doesn't have to actually be named "danger" or "warning").

The below is just an example mapping of what the category is that would raise one of the two levels of severity and what a potential message should show in a tooltip, but I am imagining the UI element would end up on one of three options ("green light", "red light"/Danger, "yellow light"/Warning). If there's a mixture, the UI element should defer to the most serious level.

Danger

{ "Tumor Coverage < 80X": "Tumor bam has low coverage: 23X", "Normal Coverage < 20X": "Normal bam has low coverage: 10X", "Mapped Reads / Total Reads <99%": "High percentage of reads not mapping to hg19: X%", "Mapped HQ (>=Q20 Reads) / Total Reads <80%": "High percentage of reads not mapping to hg19: X%", "Genome Coverage at least 30X (PCT_30X, Picard) is less than 95%": "Insufficient coverage of genome at 30X: X%", "AMBER Contamination > 1%": "Potential sample contamination detected!" }

Warning

{ "Median Insert Size <= 300": "Short median insert sizes detected: sensitivity to structural variants may be reduced.", "Optical/PCR Dups >= 20%": "High duplication rate X%", "JaBbA: Number of Loose Ends >= 100": "Sample potentially hypersegmented", "JaBbA: Converged == FALSE": "Graph model did not converge, PGV CN solution must be interpreted with caution" }

So one issue with this is most of the danger metrics you've mentioned here are included in a different tooltip (the coverage tooltip).

image.png (view on web) (https://github.com/user-attachments/assets/8e3cf626-bf0c-471d-8326-877c8672ec14)

Should we move them to its own tooltip or leave them where they are? I would prefer the latter since they do concern coverage (n.b iirc we are now getting these from the output of estimatelibrarycomplexity, whereas the ones shown below were from qualimap). One solution is to have an indicator next to each tooltip which flags when one or more of its attributes crosses its respective threshold. Then, on hover, the tooltip highlights the offending attributes (e.g in a different color).

In fact, @xanthopoulakis (https://github.com/xanthopoulakis) this means you don't have to wait for me to update the metadata.json with a new field. You can use the existing fields below:

"tumor_median_coverage": "112.0X", "normal_median_coverage": "33.0X", "coverage_qc": { "%_gc": "44%", "≥_30x": "91.7%", "≥_50x": "89.0%", "%_aligned": "99.9%", "insert_size": "261 bp", "m_reads": 2933.7, "m_reads_mapped": 2930.7, "%_reads_mapped": "99.9%", "read_pairs_examined": 84809241, "read_pair_duplicates": 17180775, "read_pair_optical_duplicates": 13246875, "percent_duplication": 0.2026, "percent_optical_duplication": 0.1562, "percent_optical_dups_of_dups": 0.771, "coverage_variance": 78.8458 },

And here are the thresholds with names corresponding to the above:

Danger

"tumor_median_coverage < 80X": "Tumor bam has low coverage: 23X", "normal_median_coverage < 20X": "Normal bam has low coverage: 10X", "%_reads_mapped < 99%": "High percentage of reads not mapping to reference: X%", "Mapped HQ (>=Q20 Reads) / Total Reads <80%": "High percentage of reads not mapping to hg19: X%", "≥_30x < 95%": "Insufficient coverage of genome at 30X: X%"

Warning

"insert_size <= 300": "Short median insert sizes detected: sensitivity to structural variants may be reduced.", "percent_duplication >= 20%": "High duplication rate X%"

One way to implement this is to have a separate thresholds object that defines the thresholds for each attribute. There are some type issues here as some of the values are strings that will need to be converted into numbers/floats. To flag the attributes I think a change in color will be sufficient, I'm not sure we need a separate message for it @.*** (https://github.com/kevinmhadi) what do you think?).

For one's not included in the above, I'll add them to the metadata.json. For now you can get started on adding the indicator to the coverage tooltip.

— Reply to this email directly, view it on GitHub (https://github.com/mskilab-org/case-report/issues/2#issuecomment-2413956082), or unsubscribe (https://github.com/notifications/unsubscribe-auth/AEKVY764QSX7X7ZR7IYLFTTZ3ULUBAVCNFSM6AAAAABP5F6ON6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJTHE2TMMBYGI). You are receiving this because you were mentioned.Message ID: @.***>

shihabdider commented 1 month ago

I think we can have an overall indicator which is just an icon (and this can also appear on the main dashboard):

And then smaller icons next to each tooltip field guiding the user to the offending values. I think this is easier to implement (over collecting all the offending values into a single tooltip -- of course @xanthopoulakis is the final arbiter of that). It really depends on the user's mental model/workflow regarding a failed sample. Will they immediately want to know exactly which specific attributes failed, thereby seeing at a glance the specific failures, or is it more about categories (coverage was bad, there was contamination, the JaBbA fit is bad -- let's hover on the appropriate tooltip for more information).

Also, the distribution graphs will show some of this as well, since the offending values will likely be outliers (?). And we already have quartile based coloring of the main values (but not of the tooltip values) both of which argues for a single tooltip.

I think for this sprint, @xanthopoulakis can decide which way to go based on whatever is the easier/feasible to implement in the allotted time. But I do think the single tooltip/indicator better fits the mental model of the user.

kevinmhadi commented 1 month ago

Sounds good I like that approach, and yes! @xanthopoulakis (https://github.com/xanthopoulakis) please go with what is most feasible for you to implement during the next few days

On Oct 15, 2024 at 10:55 AM, <Shihab Dider @.***)> wrote:

I think we can have an overall indicator which is just an icon (and this can also appear on the main dashboard):

image.png (view on web) (https://github.com/user-attachments/assets/9a501087-96be-4fb0-a963-0529be73a096)

And then smaller icons next to each tooltip field guiding the user to the offending values. I think this is easier to implement (over collecting all the offending values into a single tooltip -- of course @xanthopoulakis (https://github.com/xanthopoulakis) is the final arbiter of that). It really depends on the user's mental model/workflow regarding a failed sample. Will they immediately want to know exactly which specific attributes failed, thereby seeing at a glance the specific failures, or is it more about categories (coverage was bad, there was contamination, the JaBbA fit is bad -- let's hover on the appropriate tooltip for more information).

Also, the distribution graphs will show some of this as well, since the offending values will likely be outliers (?). And we already have quartile based coloring of the main values (but not of the tooltip values) both of which argues for a single tooltip.

I think for this sprint, @xanthopoulakis (https://github.com/xanthopoulakis) can decide which way to go based on whatever is the easier/feasible to implement in the allotted time. But I do think the single tooltip/indicator better fits the mental model of the user.

— Reply to this email directly, view it on GitHub (https://github.com/mskilab-org/case-report/issues/2#issuecomment-2414167964), or unsubscribe (https://github.com/notifications/unsubscribe-auth/AEKVY7ZH6FS3DQBDX4JZUHDZ3UUFFAVCNFSM6AAAAABP5F6ON6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJUGE3DOOJWGQ). You are receiving this because you were mentioned.Message ID: @.***>

xanthopoulakis commented 1 month ago

@kevinmhadi , @shihabdider , @jrafailov , I am writing a parser for the qc attributes you are offering me, but for the sake of compliance, can we perhaps agree on the following two: 1) The pair attribute should always be a String value, so instead of "pair": 397089 it should be "pair": "397089", 2) The values for the attributes listed in metadata.json should be numeric rather than strings, so no blank characters, % or units. The formatting of the values is the responsiblity of the frontend, the JSON file is merely the data contract. As an example, instead of having "insert_size": "247 bp", we should have "insert_size_in_bp": 247, 3) Lets abstain from using special characters in our attribute names and stick to descriptive language using words connected with underscores instead. Rather than typing "≥_30x": "89.4%", we should have "percentage_value_greater_or_equal_than_30x": 89.4,

For the rest, I am processing the input and will come back to you shortly

xanthopoulakis commented 1 month ago

@Shihab Dider, so these special characters in the metadata.json is kind of messing with my code on the frontend; I realise now that it may be a complete overkills for you to change all your R scripts to convert everything to camel case and replace the special characters like %. I am adding a function in my code that will “normalise” the metadata.json upon arrival, so this will help keep me code protected from funky attribute names. So, don’t change something in your R scripts, I will take care of it in the frontend, though do know this will keep me busy in applying the formatted attribute names everywhere.

shihabdider commented 1 month ago

@xanthopoulakis Just to clarify the standard going forward:

field names should be camelCase
values should be numbers where applicable instead of numerical strings
how do you want us to pass percentages: XX.XX or as a decimal 0.XXXX? (e.g coverage variance is a percentage in the first format, whereas the rest of the percentages are in the second format -- I'm assuming the decimal format for now.)

xanthopoulakis commented 1 month ago

Percentages should be in decimal format between 0 and 1

Charalampos Xanthopoulakis PDEng Data Visualizations Architect

E-mail: @.*** Tel: +30 697 400 7202

On Wed, 16 Oct 2024 at 17:56, Shihab Dider @.***> wrote:

@xanthopoulakis https://github.com/xanthopoulakis Just to clarify the standard going forward:

field names should be camelCase

values should be numbers where applicable instead of numerical strings

how do you want us to pass percentages: XX.XX or as a decimal 0.XXXX? (e.g coverage variance is a percentage in the first format, whereas the rest of the percentages are in the second format)

— Reply to this email directly, view it on GitHub https://github.com/mskilab-org/case-report/issues/2#issuecomment-2417085543, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJT2KL7HFGVHNTSYWMWX53Z3Z5DVAVCNFSM6AAAAABP5F6ON6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJXGA4DKNJUGM . You are receiving this because you were mentioned.Message ID: @.***>

shihabdider commented 1 month ago

Can we keep the attribute names snake_case? I'm looking over the other json files as well, and they are overwhelmingly snake_case over camelCase, so consistency would actually favor the former. And I don't believe there is any functional difference between to the two as far as javascript JSON/object parsing goes.

xanthopoulakis commented 1 month ago

Sure, snake case is also fine

Charalampos Xanthopoulakis PDEng Data Visualizations Architect

E-mail: @.*** Tel: +30 697 400 7202

On Wed, 16 Oct 2024 at 19:33, Shihab Dider @.***> wrote:

Can we keep the attribute names snake_case? I'm looking over the other json files as well, and they are overwhelmingly snake_case over camelCase, so consistency would actually favor the former. And I don't believe there is any functional difference between to the two as far as javascript JSON/object parsing goes.

— Reply to this email directly, view it on GitHub https://github.com/mskilab-org/case-report/issues/2#issuecomment-2417340665, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJT2KIZOUJAXFKWZRPTY3LZ32INHAVCNFSM6AAAAABP5F6ON6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJXGM2DANRWGU . You are receiving this because you were mentioned.Message ID: @.***>

shihabdider commented 1 month ago

OK, sorry for the delay. I have updated the metadata.json for all clinical samples to conform with the standard specified above:

e.g

 {
    "pair": "397089",
    "tumor_type": "PAAD",
    "disease": "Pancreatic Adenocarcinoma",
    "primary_site": "Pancreas",
    "inferred_sex": "male",
    "tumor_median_coverage": 82,
    "normal_median_coverage": 117,
    "coverage_qc": {
      "percent_gc": 0.44,
      "greater_than_or_equal_to_30x": 0.894,
      "greater_than_or_equal_to_50x": 0.782,
      "percent_aligned": 0.997,
      "insert_size": 247,
      "m_reads": 2159,
      "m_reads_mapped": 2153.5,
      "percent_reads_mapped": 0.998,
      "read_pairs_examined": 98311872,
      "read_pair_duplicates": 15639672,
      "read_pair_optical_duplicates": 11623186,
      "percent_duplication": 0.1591,
      "percent_optical_duplication": 0.1182,
      "percent_optical_dups_of_dups": 0.7432,
      "coverage_variance": 0.7852
    },

Note that this is a breaking change as far as the TCGA instance goes. We will need to update all the metadata.jsons for that dataset. I recommend we do this before updating the source of that instance.

@xanthopoulakis I suggest you move your development to NYU if you haven't already. The dataset to use is located here: /gpfs/home/diders01/public_html/case-reports-data. I don't know what the policy is for downloading this data to your local machine (since it is real patient data), but since you're an employee maybe it's OK? Otherwise, you can build your instance on public_html and symlink the data directory to the one above. The instance I am using is located here: /gpfs/home/diders01/public_html/case-reports-clinical-backup you can grab the other necessary files (datafiles.json) etc from the build.

xanthopoulakis commented 1 month ago

Hey, Shihab, I have indeed already moved to the NYU instance, no worries

Charalampos Xanthopoulakis PDEng Data Visualizations Architect

E-mail: @.*** Tel: +30 697 400 7202

On Wed, 16 Oct 2024 at 21:05, Shihab Dider @.***> wrote:

OK, sorry for the delay. I have updated the metadata.json for all clinical samples to conform with the standard specified above:

e.g

{ "pair": "397089", "tumor_type": "PAAD", "disease": "Pancreatic Adenocarcinoma", "primary_site": "Pancreas", "inferred_sex": "male", "tumor_median_coverage": 82, "normal_median_coverage": 117, "coverage_qc": { "percent_gc": 0.44, "greater_than_or_equal_to_30x": 0.894, "greater_than_or_equal_to_50x": 0.782, "percent_aligned": 0.997, "insert_size": 247, "m_reads": 2159, "m_reads_mapped": 2153.5, "percent_reads_mapped": 0.998, "read_pairs_examined": 98311872, "read_pair_duplicates": 15639672, "read_pair_optical_duplicates": 11623186, "percent_duplication": 0.1591, "percent_optical_duplication": 0.1182, "percent_optical_dups_of_dups": 0.7432, "coverage_variance": 0.7852 },

Note that this is a breaking change as far as the TCGA instance goes. We will need to update all the metadata.jsons for that dataset. I recommend we do this before updating the source of that instance.

@xanthopoulakis https://github.com/xanthopoulakis I suggest you move your development to NYU if you haven't already. The dataset to use is located here: /gpfs/home/diders01/public_html/case-reports-data. I don't know what the policy is for downloading this data to your local machine (since it is real patient data), but since you're an employee maybe it's OK? Otherwise, you can build your instance on public_html and symlink the data directory to the one above. The instance I am using is located here: /gpfs/home/diders01/public_html/case-reports-clinical-backup you can grab the other necessary files (datafiles.json) etc from the build.

— Reply to this email directly, view it on GitHub https://github.com/mskilab-org/case-report/issues/2#issuecomment-2417544140, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJT2KKSS5CQ2ZJ3UPEVPLLZ32TFZAVCNFSM6AAAAABP5F6ON6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJXGU2DIMJUGA . You are receiving this because you were mentioned.Message ID: @.***>

xanthopoulakis commented 1 month ago

First draft version, everyone, perhaps you may wish to give it a spin on your local instance

shihabdider commented 1 month ago

@xanthopoulakis I see that the coverage Tumor/Normal is missing the trailing X at the end. We still need to display this X. See: https://mski-lab.slack.com/archives/C07RKFYJU1J/p1729090201488469

shihabdider commented 1 month ago

@xanthopoulakis Also it seems like your numeric type checker/enforcer is breaking some of the samples: See: https://genome.med.nyu.edu/external/imielinskilab/mskiweb/diders01/case-reports-clinical-backup/build/?report=16578660

And insert size is being interpreted as a percent (it should be an int):

xanthopoulakis commented 1 month ago

let me check, @shihabdider

xanthopoulakis commented 1 month ago

Maybe now, @shihabdider ? Feel free to give it a try:

I am also busy fixing the issue with the missing X character as you noted

shihabdider commented 1 month ago

@xanthopoulakis Some samples are still breaking as above (due to missing data):

https://genome.med.nyu.edu/external/imielinskilab/mskiweb/diders01/case-reports-clinical-backup/build/?report=16323211 https://genome.med.nyu.edu/external/imielinskilab/mskiweb/diders01/case-reports-clinical-backup/build/?report=1160696

Also, it seems like the TCGA instance is now broken as well (as I had suspected it would): http://mskiweb/cxanthopoulakis/case-report/?report=0124

It would be good if the front-end was robust to any missing metadata attributes (with the exception of required attributes like pair)

xanthopoulakis commented 1 month ago

@shihabdider , can you try now? I have pushed a change to capture in a try-catch the situations that an variable might be missing while calculating the quality status

shihabdider commented 1 month ago

@shihabdider , can you try now? I have pushed a change to capture in a try-catch the situations that an variable might be missing while calculating the quality status

Yes, it's working now!

As an aside, should we auto-flag cases that are missing qc values as good (as is done above) or should there by a "missing qc" category as well? @kevinmhadi

kevinmhadi commented 1 month ago

If the data is unavailable I think "missing qc" makes sense as an explicit "NA" categor.

On Thu, Oct 17, 2024 at 4:57 PM Shihab Dider @.***> wrote:

@shihabdider https://github.com/shihabdider , can you try now? I have pushed a change to capture in a try-catch the situations that an variable might be missing while calculating the quality status

Yes, it's working now!

image.png (view on web) https://github.com/user-attachments/assets/1feb3e2e-1651-401f-bdd7-bf7172c84bdd

As an aside, should we auto-flag cases that are missing qc values as good (as is done above) or should there by a "missing qc" category as well? @kevinmhadi https://github.com/kevinmhadi

— Reply to this email directly, view it on GitHub https://github.com/mskilab-org/case-report/issues/2#issuecomment-2420558306, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEKVY76J4KNMHAXTAX5ZZXTZ4AQEJAVCNFSM6AAAAABP5F6ON6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRQGU2TQMZQGY . You are receiving this because you were mentioned.Message ID: @.***>

xanthopoulakis commented 1 month ago

@shihabdider , @kevinmhadi , I have just pushed a change so that if the metadata.quality_qc attribute is missing, then the Quality Status is marked as Not Available, as indicated below. Also, @shihabdider , I am not showing the "Good Quality" tag anymore, I am only showing the tag for cases that are "Medium" or "Poor" quality status.

shihabdider commented 3 weeks ago

Closing this and opening a new issue for moving the thresholding logic to the backend.