waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

Unable to import: ... missing value where TRUE/FALSE needed #60

Closed BiotechPedro closed 2 years ago

BiotechPedro commented 2 years ago

Hello everyone!

I was trying to import some data as I did in the past, but got a warning when creating the MultiAssayExperiment for some of the experiments, specifically the mRNA-seqv2-RSEM data for the TCGA PanCancer Atlas. It is caused by the row names including NAs (see https://github.com/waldronlab/cBioPortalData/issues/57 and https://github.com/Bioconductor/SummarizedExperiment/issues/64). Also, when I pass names.field = c("Entrez_Gene_Id", "Gene") to cBioDataPack(), that warning goes away, but other experiments (the CNAs) now trigger it and are not imported. I do not understand why it appears now when it did not a few months ago. Will it be fixed soon?

Thank you (for the time and the package)!!

Pedro

LiNk-NY commented 2 years ago

Hi Pedro, @BiotechPedro Can you help me by providing a minimal reproducible example? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Thank you!

-Marcel

BiotechPedro commented 2 years ago

For sure! I run this code:

laml_TCGA_PanCancer <- cBioPortalData::cBioDataPack("laml_tcga_pan_can_atlas_2018", ask = FALSE)

and get the following warning messages:

1: Unable to import: mrna_seq_v2_rsem_zscores_ref_all_samples Reason: missing value where TRUE/FALSE needed
2: Unable to import: mrna_seq_v2_rsem Reason: missing value where TRUE/FALSE needed

However, when I run:

laml_TCGA_PanCancer <- cBioPortalData::cBioDataPack("laml_tcga_pan_can_atlas_2018", names.field = c("Entrez_Gene_Id", "Gene"), ask = FALSE)

the warning messages change:

1: Unable to import: cna Reason: missing value where TRUE/FALSE needed
2: Unable to import: log2_cna Reason: missing value where TRUE/FALSE needed

In the first case, mrna_seq_v2_rsem (for example) is missing from the ExperimentList slot, while in the second case it is present. However, the first case did not fail for me a few months ago, and it now happens for more than one data set (I have only tried the TCGA PanCancer Atlas ones). What could have changed? Do you think it is a bug that can be fixed?

Maybe the above-mentioned issues can help!

Pedro

LiNk-NY commented 2 years ago

Hi Pedro, @BiotechPedro Thanks for reporting. I suspect this is caused by changes to the SummarizedExperiment constructor, but I can't confirm it. I've pushed an update that works around it, following the suggestion to use "" instead of missing values.

https://github.com/mksamur/RTCGAToolbox/commit/6e1cc6ae8566df8c8e84a14ede67e8f45c29d16f
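For anyone reading along, here is a minimal base R sketch of why an NA identifier can produce this error and how substituting "" avoids it. It is only an illustration under assumptions: the gene_ids vector is made up, and this is not the code from the linked commit.

```r
# Hypothetical gene identifiers with missing entries, standing in for
# the row names of an expression table (made up for illustration).
gene_ids <- c("TP53", NA, "FLT3", NA)

# Comparing NA to anything yields NA, and `if (NA)` is an error.
# tryCatch captures the error message instead of stopping the script.
res <- tryCatch(
    if (gene_ids[2] == "TP53") "match" else "no match",
    error = function(e) conditionMessage(e)
)
print(res)

# Workaround in the spirit of the suggestion: use "" instead of NA.
clean_ids <- gene_ids
clean_ids[is.na(clean_ids)] <- ""

# The same comparison now evaluates to FALSE rather than NA.
res_clean <- if (clean_ids[2] == "TP53") "match" else "no match"
print(res_clean)
```

Note that replacing NA with "" can leave several rows sharing the same empty name, so downstream code that expects unique row names may still need attention.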

You should see this change in RTCGAToolbox version 2.27.1. It will take 24 - 48 hours to be reflected in the devel version of Bioconductor.

Note. It's best to report missing values at @cbioportal/cbioportal which is where the data is curated.

Best, Marcel