waldronlab / TCGAutils

Toolbox package for organizing and working with TCGA data
https://bioconductor.org/packages/TCGAutils

UUIDtoBarcode #29

Closed SerifatAdebola closed 3 years ago

SerifatAdebola commented 3 years ago

When I run UUIDtoBarcode:

ISSUE 1: with file_id, I end up with twice the data frame size, i.e. two barcodes per file ID.

ISSUE 2: with case_id, the column for case_id return

LiNk-NY commented 3 years ago

Hi @SerifatAdebola

Can you provide a minimally reproducible example? Can you explain what you mean with issue 2? Thanks.

Best regards, Marcel

SerifatAdebola commented 3 years ago

Hi, attached are the text files that have the necessary information. Sorry, I had a typo in Issue 2: when I run UUIDtoBarcode with case_id, the column for case_id returns .

Attachments: fileUUIDresult.txt, caseID.txt, fileID.txt, barcodesUUIDresult.txt

SerifatAdebola commented 3 years ago

fileUUIDresult.txt - UUIDtoBarcode with file ID result
caseID.txt - Case ID
fileID.txt - File ID
barcodesUUIDresult.txt - UUIDtoBarcode with case ID result

LiNk-NY commented 3 years ago

Please provide the R code with a minimally reproducible example.

https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Best, Marcel

SerifatAdebola commented 3 years ago

Hi Marcel, here is a minimally reproducible example.

R version 4.0.2 (2020-06-22)

Code:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("TCGAutils")

library(TCGAutils)
file = read.table("fileidtest.txt", sep = "\t")
data2 = UUIDtoBarcode(file, from_type = "file_id")

Attachment: fileidtest.txt

lwaldron commented 3 years ago

Here's a more minimal reproducible example:

library(TCGAutils)
UUIDtoBarcode("01ef8a08-1de5-4ceb-be51-979418465f1a",from_type = "file_id")
#>                                file_id associated_entities.entity_submitter_id
#> 1 01ef8a08-1de5-4ceb-be51-979418465f1a            TCGA-EL-A4JX-11A-11D-A259-01
#> 2 01ef8a08-1de5-4ceb-be51-979418465f1a            TCGA-EL-A4JX-01A-12D-A256-01

Created on 2021-07-26 by the reprex package (v2.0.0)

In this example it looks like the UUID is associated with a patient (TCGA-EL-A4JX) for which there are two types of specimens (01A and 11A). See https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/.
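To make the difference between the two rows concrete, here is a minimal base R sketch that splits the two barcodes from the output above into their dash-separated fields. It assumes the standard TCGA barcode layout described on the GDC page linked above (project, tissue source site, participant, sample plus vial, and so on). The `sample_labels` lookup is an illustrative helper covering only the two codes seen here, not part of TCGAutils.

```r
# The two barcodes returned for the single file_id in the reprex above.
barcodes <- c(
  "TCGA-EL-A4JX-11A-11D-A259-01",
  "TCGA-EL-A4JX-01A-12D-A256-01"
)

# Split each barcode on "-" and stack the fields into a character matrix.
parts <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))

parsed <- data.frame(
  # The first three fields identify the participant (patient).
  participant = apply(parts[, 1:3, drop = FALSE], 1, paste, collapse = "-"),
  # The fourth field holds a two-digit sample type code followed by a vial letter.
  sample_type = substr(parts[, 4], 1, 2),
  vial = substr(parts[, 4], 3, 3),
  stringsAsFactors = FALSE
)

# Illustrative lookup for the two sample type codes that appear here.
sample_labels <- c("01" = "Primary Solid Tumor", "11" = "Solid Tissue Normal")
parsed$sample_label <- unname(sample_labels[parsed$sample_type])

print(parsed)
```

Both rows share the participant `TCGA-EL-A4JX` and differ only in the sample type code, which is consistent with one file being linked to a tumor and a normal specimen from the same patient.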

UUIDtoBarcode just calls the GDC API (https://docs.gdc.cancer.gov/API/Users_Guide/Search_and_Retrieval/), so the GDC help would be better able to answer questions about how TCGA assigned UUIDs to aliquots, specimens, patients, etc (it seems complicated and I don't totally understand it myself!)
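If one row per file ID is needed downstream, the result can be collapsed after the call. The following is a sketch only: the data frame `res` is rebuilt by hand from the output shown earlier in this thread rather than fetched from the GDC, and the `";"` separator and the `barcodes` column name are arbitrary choices made for illustration.

```r
# Hand-built copy of the UUIDtoBarcode() output shown in this thread
# (no network call is made here).
res <- data.frame(
  file_id = rep("01ef8a08-1de5-4ceb-be51-979418465f1a", 2),
  associated_entities.entity_submitter_id = c(
    "TCGA-EL-A4JX-11A-11D-A259-01",
    "TCGA-EL-A4JX-01A-12D-A256-01"
  ),
  stringsAsFactors = FALSE
)

# Group the barcodes by file_id, then paste each group into one string.
by_file <- split(res$associated_entities.entity_submitter_id, res$file_id)
collapsed <- data.frame(
  file_id = names(by_file),
  barcodes = vapply(by_file, paste, character(1), collapse = ";"),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(collapsed)
```

Whether collapsing is appropriate depends on the analysis; keeping both rows preserves the fact that the file is associated with two aliquots.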

LiNk-NY commented 3 years ago

Thanks, Levi (@lwaldron)! I've also fixed a bug where the file_ids were not in the right order. 66f15d5
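The contents of that commit are not shown in this thread, so the following is not the actual fix. It is only a sketch of how a caller could align a result table with the order of the input IDs in base R. The IDs and barcodes below are made-up placeholders, and `res` stands in for a hypothetical result whose rows came back in a different order.

```r
# Placeholder input IDs, in the order the caller supplied them.
ids <- c("uuid-b", "uuid-a", "uuid-c")

# Hypothetical result with rows out of input order; one ID maps to two barcodes.
res <- data.frame(
  file_id = c("uuid-a", "uuid-a", "uuid-c", "uuid-b"),
  barcode = c("BC-A1", "BC-A2", "BC-C1", "BC-B1"),
  stringsAsFactors = FALSE
)

# match() gives each row's position in the input vector; order() sorts rows
# by that position, and method = "radix" guarantees ties keep their order.
ord <- order(match(res$file_id, ids), method = "radix")
aligned <- res[ord, ]
rownames(aligned) <- NULL

print(aligned)
```

Because one ID can map to several rows, the aligned table can be longer than the input vector, so positions in `ids` and row numbers in `aligned` do not correspond one to one.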

SerifatAdebola commented 3 years ago

Thank you @LiNk-NY @lwaldron