waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/

Sample Type #130

Closed (aedin closed this 6 years ago)

aedin commented 8 years ago

Hi, can we keep the TCGA barcode info with the pData? For example, if I subset on sample type (01 = TP, or 11 = Normal) but then move to the 12-character names, I lose this info. I think it's called sample_type in the clinical data, but I can check. Aedin

aedin commented 8 years ago

Using the AWS OV bucket, it seems the sample types are being treated as duplicates (which they are not). In the OV set, there are 01 (n = 4316), 02 (n = 59), and 11 (n = 4), which are primary tumor, recurrent tumor and normal tissue, respectively.

My use case is to filter the OV dataset to just primary tumors, so I wish to extract the 4316 cases.

I tried to use subsetByColumns but it works on pData not on sampleMap.

So either the pData is insufficient and should capture sample_type (and the other barcode features), or we store these data in sampleMap and create a method to filter using sampleMap.
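For reference, the sample type sits in characters 14-15 of a full TCGA barcode, so truncating to the 12-character patient ID is what makes distinct samples from one patient look like duplicates. A minimal base-R sketch with made-up barcodes (the IDs below are illustrative only, not taken from the OV data):

```r
# Hypothetical full TCGA barcodes: two samples from one patient, one from another
barcodes <- c(
  "TCGA-AA-0001-01A-01R-0001-01",  # primary tumor (01)
  "TCGA-AA-0001-11A-01R-0001-01",  # normal tissue (11)
  "TCGA-BB-0002-02A-01R-0001-01"   # recurrent tumor (02)
)

patient_id  <- substr(barcodes, 1, 12)   # 12-character patient-level ID
sample_type <- substr(barcodes, 14, 15)  # two-digit sample type code

# Truncation collapses the first two samples onto the same patient ID
anyDuplicated(patient_id) > 0

# Counts per sample type, analogous to the 01 / 02 / 11 tally above
table(sample_type)
```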

lwaldron commented 8 years ago

pData is patient-level, so it's correct that it does not provide sample type info. The assays in the Elist do contain the full barcode, which you can use to filter to primary tumors. Pseudocode:

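for (i in 1:length(Elist(MAE))){
   # rule for selecting primary tumors; for example, assuming the column
   # names are full TCGA barcodes, sample type "01" in characters 14-15:
   primary.index <- substr(colnames(Elist(MAE)[[i]]), 14, 15) == "01"
   Elist(MAE)[[i]] <- Elist(MAE)[[i]][, primary.index]
}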

I think we should stay with how the pData is constructed, but selecting the primary tumors is the first step that most users of the TCGA MAE will want to do, so we should make sure it's well documented and easy to do.
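One possible way to package the selection rule, sketched in base R on a plain named list of matrices standing in for the assays. The `assays` list, the barcodes, and the helper names `keep_sample_types` and `make_assay` are all hypothetical; with a real object the same column indexing would be applied to each element of `Elist(MAE)`.

```r
# Hypothetical helper: keep only columns whose TCGA barcode carries one of
# the requested sample type codes (characters 14-15 of the barcode).
keep_sample_types <- function(assays, codes = "01") {
  lapply(assays, function(x) {
    keep <- substr(colnames(x), 14, 15) %in% codes
    x[, keep, drop = FALSE]  # drop = FALSE keeps a matrix even for one column
  })
}

# Toy stand-in for the assay list; barcodes are made up for illustration
make_assay <- function(barcodes) {
  matrix(seq_len(2 * length(barcodes)), nrow = 2,
         dimnames = list(c("gene1", "gene2"), barcodes))
}
assays <- list(
  rnaseq = make_assay(c("TCGA-AA-0001-01A-01R", "TCGA-AA-0001-11A-01R")),
  cnv    = make_assay(c("TCGA-AA-0001-01A-01D", "TCGA-BB-0002-02A-01D"))
)

# Keep primary tumors only, then count the remaining columns per assay
primary <- keep_sample_types(assays, "01")
sapply(primary, ncol)
```

Passing a vector such as `c("01", "11")` keeps primary tumors and normals together, since the rule uses `%in%` rather than `==`.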

aedin commented 8 years ago

Thanks Levi

aedin commented 8 years ago

mmmh. should be

for (i in seq_along(Elist(MAE)))

LiNk-NY commented 6 years ago

Hi Aedin, @aedin

There's now a convenience function in TCGAutils that allows you to filter and separate by sample type, such as:

separateSamples(MAE, "01")
separateSamples(MAE, c("01", "11"))

Note: it is still in review and the name might change.

Regards, Marcel