Issue with Accessing "cnv_gistic2" Data-type in UCSCXenaShiny v2.0.0 based on UCSCXenaTools v1.4.8

quiquemedina commented 7 months ago

Dear Support Team,

I am writing about a problem with the recent update of UCSCXenaShiny (v2.0.0, based on UCSCXenaTools v1.4.8). The data-type "cnv_gistic2" no longer appears to be accessible in R in this version. In previous versions of UCSCXenaShiny (e.g., v1.1.10, based on UCSCXenaTools v1.4.8), the following R snippet ran without issues:

p <- vis_gene_stemness_cor(
  Gene = "TP53",
  cor_method = "spearman",
  data_type = "cnv_gistic2",
  Plot = "TRUE"
)
p

![image](https://github.com/openbiox/UCSCXenaShiny/assets/39810494/5be920e5-acf2-46af-aae0-f2b7bff0f19e)

However, executing the same code in the latest version fails to plot and produces this error message:

Error in match.arg(data_type) : 
  'arg' should be one of “mRNA”, “transcript”, “protein”, “mutation”, “cnv”, “methylation”, “miRNA”, “fusion”, “promoter”, “APOBEC”

This suggests that "cnv_gistic2" as a data_type option is absent from the updated version.
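
For readers unfamiliar with this kind of error, here is a minimal sketch of how `match.arg()` rejects a value that is not among a function's declared choices. The function `check_data_type` is a hypothetical stand-in, not part of UCSCXenaShiny; its choices are copied from the error message above.

```r
# Hypothetical stand-in for a function that validates `data_type` with
# match.arg(); the choices are copied from the error message above.
check_data_type <- function(data_type = c("mRNA", "transcript", "protein",
                                          "mutation", "cnv", "methylation",
                                          "miRNA", "fusion", "promoter",
                                          "APOBEC")) {
  match.arg(data_type)
}

# "cnv" is one of the declared choices, so it is accepted and returned.
ok <- check_data_type("cnv")

# "cnv_gistic2" is not among the choices, so match.arg() raises an error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(check_data_type("cnv_gistic2"),
                error = function(e) conditionMessage(e))
print(res)
```

Any function that declares its choices this way will fail in the same manner once a value is dropped from the default vector.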

I kindly request your assistance in investigating this issue. The availability of "cnv_gistic2" data-type is crucial for our analyses, and its absence significantly impacts our research workflow.

Thank you for your attention to this matter. I look forward to your prompt response.

Sincerely,

Enrique

ShixiangWang commented 7 months ago

Thanks, I will take a look

ShixiangWang commented 7 months ago

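@lishensuo You still need to improve how .opt_pancan is set, called, and updated.

The current code still contains several references to cnv_gistic2, but your corresponding query_value function no longer handles it.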
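
One way to locate such leftover references is to scan the R sources for the retired label. The sketch below is only illustrative: `find_leftover_refs` is a hypothetical helper, and the demo runs on a throwaway directory rather than on the real repository.

```r
# Hypothetical helper: list source lines that still mention a retired option.
find_leftover_refs <- function(dir, pattern = "cnv_gistic2") {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE,
                      recursive = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(pattern, lines, fixed = TRUE)  # literal match, not a regex
    if (length(idx) == 0) return(NULL)
    data.frame(file = basename(f), line = idx, text = trimws(lines[idx]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)  # NULL when nothing matches
}

# Self-contained demo on a temporary directory with two small files.
demo_dir <- file.path(tempdir(), "leftover_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c('opts <- c("cnv", "cnv_gistic2")', "x <- 1"),
           file.path(demo_dir, "a.R"))
writeLines('y <- "mRNA"', file.path(demo_dir, "b.R"))

refs <- find_leftover_refs(demo_dir)
print(refs)
```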

ShixiangWang commented 7 months ago

@quiquemedina Setting data_type = "cnv" should work. We only use the GISTIC2 data in the pan-cancer analysis (including previous versions). By default (in the current version), the thresholded GISTIC2 results are used.

https://github.com/openbiox/UCSCXenaShiny/blob/a7f05ef9e39f656753eef883e2b5bd07f88f6373/R/get_pancan_value.R#L301C13-L315
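
For scripts that must run on both old and new versions, a small shim can translate the retired label before the call. This is a sketch under assumptions: `normalize_data_type` is a hypothetical helper, not a UCSCXenaShiny function, and it simply applies the mapping suggested above.

```r
# Hypothetical shim: translate the retired "cnv_gistic2" label to "cnv",
# as suggested above, and warn so the substitution is not silent.
normalize_data_type <- function(data_type) {
  if (identical(data_type, "cnv_gistic2")) {
    warning('"cnv_gistic2" is not an accepted data_type; using "cnv" instead',
            call. = FALSE)
    return("cnv")
  }
  data_type
}

dt <- suppressWarnings(normalize_data_type("cnv_gistic2"))
print(dt)

# With UCSCXenaShiny loaded, the call from the original report would become:
# p <- vis_gene_stemness_cor(Gene = "TP53", cor_method = "spearman",
#                            data_type = dt, Plot = "TRUE")
```

Note that this only keeps old scripts running; which underlying dataset is used still depends on the package defaults described above.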

lishensuo commented 7 months ago

I will inspect and remove "cnv_gistic2" properly. In addition, it might be better to set use_thresholded_data = FALSE as the default, which I will change in the next PR.

ShixiangWang commented 7 months ago

Got it.

quiquemedina commented 7 months ago

Dear @ShixiangWang and @lishensuo,

I appreciate your clarification regarding the "cnv" feature now encompassing the "cnv_gistic2" data in the latest version of UCSCXenaShiny. However, I would like to emphasize a critical aspect of our research that necessitates distinguishing between these two data types.

In our previous analyses, we observed that exploring gene correlations with "cnv" and "cnv_gistic2" separately in R yields distinct gene signatures, implying unique biological insights. This is because the underlying datasets for the two types are inherently different. Keeping them as separate options in the new version, as in previous versions, would therefore be highly beneficial for detailed genomic analysis.

To illustrate, using an algorithm to find genes that pass a high correlation cutoff (e.g., rho > 0.4 or rho < -0.4) in the UCS cancer type for mRNA expression vs. stemness attributes, we identified:

With the "cnv" data type, the signature was (CDH10 + GUSBP1 + PMCHL1 + PRDM9 + RN7SL572P).
Using "cnv_gistic2", the signature differed, comprising (AHRR + BRD9 + C5orf55 + CCDC127 + CDH10 + CEP72 + COMTD1 + DUSP13 + EXOC3 + LRRC14B + MIR4456).

Clearly, the gene signatures vary significantly between these data types.
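
The overlap between the two signatures listed above can be quantified with base R set operations. This sketch uses only the gene names quoted in this comment; the Jaccard index is added here purely as an illustrative summary.

```r
# Gene signatures exactly as listed above.
sig_cnv <- c("CDH10", "GUSBP1", "PMCHL1", "PRDM9", "RN7SL572P")
sig_gistic2 <- c("AHRR", "BRD9", "C5orf55", "CCDC127", "CDH10", "CEP72",
                 "COMTD1", "DUSP13", "EXOC3", "LRRC14B", "MIR4456")

shared       <- intersect(sig_cnv, sig_gistic2)  # genes in both signatures
only_cnv     <- setdiff(sig_cnv, sig_gistic2)    # genes unique to "cnv"
only_gistic2 <- setdiff(sig_gistic2, sig_cnv)    # genes unique to "cnv_gistic2"

# Jaccard index: shared genes divided by all distinct genes.
jaccard <- length(shared) / length(union(sig_cnv, sig_gistic2))

print(shared)
print(round(jaccard, 3))
```

Only one gene is shared between the two lists, which is consistent with the point that the two data types lead to different results.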

In previous versions, the data_type argument allowed for a range of gene profile types, including "mRNA", "transcript", "protein", "mutation", "cnv" (-2, -1, 0, 1, 2), "cnv_gistic2", "methylation", "miRNA". The ability to select either "cnv" or "cnv_gistic2" explicitly was invaluable for our analyses.

Therefore, I kindly request that you consider reinstating these as separate data_type options in the new version. This change would greatly enhance the tool's utility and accuracy for genomic research.

Furthermore, radar plotting with the "cnv" argument fails in the new version: Error: 'arg' should be one of “mRNA”, “transcript”, “protein”, “mutation”, “cnv”, “methylation”, “miRNA”, “fusion”, “promoter”, “APOBEC”

Thank you for considering this request. Your support is crucial to the advancement of our research.

Best regards,

Enrique

quiquemedina commented 7 months ago

To support my case above, here are the pros and cons of using CNV vs. CNV GISTIC2:

CNV (Copy Number Variation)

Pros:

Broad Overview: Provides a general overview of genomic variations, including both amplifications and deletions across the genome.
Versatility: Applicable to a wide range of genetic studies, not just cancer.
Basic Genomic Insights: Useful for initial explorations of genomic alterations and their potential implications.

Cons:

Lack of Specificity: Does not distinguish between variations that are statistically significant or relevant to a particular disease, like cancer.
Limited Clinical Relevance: Basic CNV data may not directly indicate which variations are important for disease progression or treatment.
No Significance Analysis: Does not inherently provide statistical analysis to identify crucial genomic regions.

CNV GISTIC2 (Genomic Identification of Significant Targets in Cancer 2)

Pros:

Cancer-Specific Analysis: Tailored for cancer research, identifying CNVs that are significant in the context of cancer.
Statistical Relevance: Highlights CNVs that occur more frequently than expected by chance, indicating potential key target genes for cancer.
Focused Insights: Offers refined analysis, facilitating the identification of clinically relevant genetic alterations for cancer diagnostics or therapeutics.

Cons:

Cancer-Centric: Primarily useful for cancer studies, may not be as applicable for other types of genetic research.
Complexity: The analysis is more complex and may require more sophisticated understanding and interpretation of genomic data.
Data Intensive: Requires comprehensive datasets and is dependent on the quality of cancer genomic data available.

In summary, while CNV provides a broad perspective on genomic variations, CNV GISTIC2 offers a more targeted and statistically relevant approach, especially valuable in the context of cancer research. The choice between them should be guided by the specific research objectives, the disease of interest, and the level of detail required.

lishensuo commented 7 months ago

Thank you for your question. We have discussed your request. Please allow one or two days for the optimization.

ShixiangWang commented 7 months ago

@quiquemedina Hi, thanks for your comments and insights :). Previously, I only included the gistic2 copy number data. Based on your suggestions, we would like to reunify the data options.

i.e., for the following datasets:

refer to cnv_gistic
refer to cnv_gistic (thresholded)
refer to cnv

Hi, @lishensuo Please make sure the three options are available to the users (in all exported functions and Shiny UI). Should we consider that option 2 be merged into option 1 in the internal code, and support option 2 with your designed .opt_pancan? Find the easier way to implement the feature. Please discuss with me if you have any problems.

lishensuo commented 7 months ago

OK. I will open the pull request ASAP.

lishensuo commented 7 months ago

From my perspective, it is better to show one choice for each molecular profile. Within a profile, we can provide further settings, as we do for DNA methylation, which supports two arrays and user-defined restrictions.

Based on the three types of CNV data above, I plan to use the thresholded gistic2 as the default choice and provide two further settings:

whether to use GISTIC2 data (default TRUE);
whether to use the thresholded data, which is only valid when using GISTIC2 (default FALSE)
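
A rough sketch of how these two settings could resolve to one of the three CNV datasets is shown below. Everything here is an assumption for illustration: `resolve_cnv_dataset`, the argument name `use_gistic2`, and the returned labels are hypothetical, and the defaults simply mirror the two listed above. Only `use_thresholded_data` is a name mentioned earlier in this thread.

```r
# Hypothetical resolver for the two proposed CNV settings.
# The returned labels are illustrative, not actual dataset identifiers.
resolve_cnv_dataset <- function(use_gistic2 = TRUE,
                                use_thresholded_data = FALSE) {
  stopifnot(is.logical(use_gistic2), length(use_gistic2) == 1,
            is.logical(use_thresholded_data),
            length(use_thresholded_data) == 1)
  if (!use_gistic2) {
    # Thresholding only applies to GISTIC2 data, so the flag is ignored here.
    if (use_thresholded_data) {
      warning("use_thresholded_data is ignored when use_gistic2 = FALSE",
              call. = FALSE)
    }
    return("cnv")
  }
  if (use_thresholded_data) "cnv_gistic2_thresholded" else "cnv_gistic2"
}

print(resolve_cnv_dataset())                             # both defaults
print(resolve_cnv_dataset(use_thresholded_data = TRUE))  # thresholded GISTIC2
print(resolve_cnv_dataset(use_gistic2 = FALSE))          # non-GISTIC2 CNV
```

Keeping the dependency between the two flags in one place makes it explicit that the thresholded option has no effect unless GISTIC2 data is selected.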

quiquemedina commented 7 months ago

@lishensuo and @ShixiangWang,

Absolutely, I wholeheartedly agree with your approach! Focusing on one choice per molecular profile, with the added flexibility of further settings like the dual-array support in DNA methylation, is a strategic and user-friendly way to present these options. Your plan to use the thresholded GISTIC2 as the default for CNV data is particularly insightful. Providing users with the option to choose GISTIC2 data (defaulting to TRUE) and the additional choice to use thresholded data (defaulting to FALSE) when GISTIC2 is enabled adds a valuable layer of customization. This approach not only enhances the utility of your tool but also caters to diverse user needs in a comprehensive and efficient manner. Great development!

ShixiangWang commented 7 months ago

@quiquemedina Thanks. Let's wait for a new PR from shensuo, I will review and merge it to the master branch.

ShixiangWang commented 6 months ago

Thanks all.

openbiox / UCSCXenaShiny

Issue with Accessing "cnv_gistic2" Data-type in UCSCXenaShiny v2.0.0 based on UCSCXenaTools v1.4.8 #286