Closed aedin closed 6 years ago
Using the AWS OV bucket, it seems like the sample types are being treated as duplicates (which they are not). In the OV set, there are 01 (n = 4316), 02 (n = 59), and 11 (n = 4), which are primary tumor, recurrent tumor, and normal tissue, respectively.
My use case is to filter the OV dataset to just primary tumors, so I wish to extract the 4316 cases.
I tried to use subsetByColumns, but it works on pData, not on sampleMap.
So either the pData is insufficient and should capture sample_type (and the other barcode features), or we store these data in sampleMap and create a method to filter using sampleMap.
pData is patient-level, so it's correct that it does not provide sample type info. The assays in the Elist do contain the full barcode, which you can use to filter to primary tumors. Pseudocode:
for (i in 1:length(Elist(MAE))){
  primary.index <- **rule for selecting primary tumors**
  Elist(MAE)[[i]] <- Elist(MAE)[[i]][, primary.index]
}
I think we should stay with how the pData is constructed, but selecting the primary tumors is the first step that most users of the TCGA MAE will want to do, so we should make sure it's well documented and easy to do.
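One way to fill in the "rule for selecting primary tumors" is to read the sample type code from characters 14-15 of the full TCGA barcode. The following is only a minimal sketch under stated assumptions: `elist` is a plain named list of matrices standing in for `Elist(MAE)`, the barcodes are made up for illustration, and `is_primary` is a hypothetical helper, not part of MultiAssayExperiment.

```r
# Toy stand-in for Elist(MAE): a named list of matrices whose column names
# are full TCGA barcodes (made-up values, not taken from the OV bucket).
barcodes <- c("TCGA-XX-0001-01A-01R-0000-00",  # 01 = primary tumor
              "TCGA-XX-0001-02A-01R-0000-00",  # 02 = recurrent tumor
              "TCGA-XX-0002-11A-01R-0000-00")  # 11 = normal tissue
elist <- list(
  rnaseq = matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), barcodes)),
  mirna  = matrix(1:3, nrow = 1, dimnames = list("m1", barcodes))
)

# Hypothetical helper: characters 14-15 of a TCGA barcode hold the sample type code.
is_primary <- function(barcode) substr(barcode, 14, 15) == "01"

for (i in seq_along(elist)) {
  primary.index <- is_primary(colnames(elist[[i]]))
  # drop = FALSE keeps a matrix even when only one column survives
  elist[[i]] <- elist[[i]][, primary.index, drop = FALSE]
}

colnames(elist$rnaseq)
```

With a real MAE, the same logical index would presumably be applied to each element of `Elist(MAE)`, provided the assay class supports `colnames()` and column subsetting.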
Thanks Levi
On 6/2/16 13:31, Levi Waldron wrote:
> pData is patient-level, so it's correct that it does not provide sample type info. The assays in the Elist do contain the full barcode, which you can use to filter to primary tumors. Pseudocode:
>
> for (i in 1:length(Elist(MAE))){
>   primary.index <- rule for selecting primary tumors
>   Elist(MAE)[[i]] <- Elist(MAE)[[i]][, primary.index]
> }
>
> I think we should stay with how the pData is constructed, but selecting the primary tumors is the first step that most users of the TCGA MAE will want to do, so we should make sure it's well documented and easy to do.
Mmmh, the loop header should be
for (i in seq_along(Elist(MAE)))
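The reason for preferring `seq_along()` shows up with an empty list: `1:length(x)` counts down from 1 to 0, so the loop body runs on indices that do not exist. A small base R illustration, where `empty` is a stand-in for an experiment list with no assays:

```r
empty <- list()

1:length(empty)    # 1 0 : a loop over this would run twice
seq_along(empty)   # integer(0) : a loop over this is skipped

visits <- 0
for (i in seq_along(empty)) visits <- visits + 1
visits             # 0
```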
Hi, can we keep the TCGA barcode info with the pData? For example, if I subset on sample type (01 = TP, or 11 = Normal) but then move to the 12-character names, I lose this info. I think it's called sample_type in the clinical data, but I can check. Aedin
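As a rough sketch of what keeping the barcode info could look like, the sample type can be parsed once into a sample-level table that sits next to the 12-character patient ID. This is an illustration only: the barcodes are made up, and `sample_info`, `type_labels`, and the column names are hypothetical, not an existing pData or sampleMap layout.

```r
# Made-up full barcodes for illustration
barcodes <- c("TCGA-XX-0001-01A-01R-0000-00",
              "TCGA-XX-0001-11A-01R-0000-00",
              "TCGA-XX-0002-02A-01R-0000-00")

# Only the codes mentioned in this thread; any other code maps to NA here
type_labels <- c("01" = "primary tumor",
                 "02" = "recurrent tumor",
                 "11" = "normal tissue")

sample_info <- data.frame(
  barcode     = barcodes,
  patient_id  = substr(barcodes, 1, 12),   # 12-character patient-level name
  sample_type = substr(barcodes, 14, 15),  # two-digit sample type code
  stringsAsFactors = FALSE
)
sample_info$sample_label <- unname(type_labels[sample_info$sample_type])

# The sample type survives the move to 12-character names
primary_patients <- sample_info$patient_id[sample_info$sample_type == "01"]
primary_patients
```

Because one patient can contribute several sample types, a table like this is sample-level rather than patient-level, which is the tension with pData raised earlier in the thread.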