TCGA Phenotype Data from API Path

HAKSOAT commented 5 years ago

@yunhailuo While reading the project description here, I noticed the part of the document that talks about wrangling the TCGA phenotype data using only the API path instead of two sources.

I'd appreciate more information on this, as I intend to contribute to Xena for GSoC this year.

yunhailuo commented 5 years ago

Sorry for the late reply. I'm officially on vacation now.

Phenotype data previously came mainly from data files: https://github.com/yunhailuo/xena-GDC-ETL#transform-phenotype. Some, if not all, of it is available through the GDC API: https://docs.gdc.cancer.gov/API/Users_Guide/Appendix_A_Available_Fields/#case-fields
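
For anyone who wants to see what the API path might look like, here is a rough sketch of querying the GDC `cases` endpoint for a few phenotype-like fields. It is not how xena-GDC-ETL does it. The field selection, the helper names (`build_cases_url`, `flatten_case`, `fetch_phenotype`), and the choice to keep only the first diagnosis are all assumptions made for illustration; check the case-fields appendix linked above for what is actually available.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"

# Illustrative selection only; see the GDC "case fields" appendix for the full list.
PHENOTYPE_FIELDS = [
    "submitter_id",
    "demographic.gender",
    "demographic.vital_status",
    "diagnoses.age_at_diagnosis",
]


def build_cases_url(project_id, fields=PHENOTYPE_FIELDS, size=100):
    """Build a GET URL that asks for the given fields of all cases in one project."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return GDC_CASES_ENDPOINT + "?" + urlencode(params)


def flatten_case(case):
    """Flatten one nested case record into a single-level dict (one table row)."""
    row = {"submitter_id": case.get("submitter_id")}
    for key, value in (case.get("demographic") or {}).items():
        row["demographic." + key] = value
    # "diagnoses" is a list; this sketch assumes the first entry is enough.
    diagnoses = case.get("diagnoses") or [{}]
    for key, value in diagnoses[0].items():
        row["diagnoses." + key] = value
    return row


def fetch_phenotype(project_id, size=100):
    """Query the API (network access required) and return flattened rows."""
    with urlopen(build_cases_url(project_id, size=size)) as response:
        hits = json.load(response)["data"]["hits"]
    return [flatten_case(case) for case in hits]
```

Calling `fetch_phenotype("TCGA-BRCA", size=5)` should return a handful of rows, which can then be compared against the phenotype data derived from the data files.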

I'm not formally a Xena person, just a collaborator, so I don't know the current decision. I would suggest contacting @maryjgoldman or @jingchunzhu about what phenotype data is needed for Xena.

yunhailuo commented 5 years ago

GSoC 2019 has started. Thanks for your interest. In general, we decided to keep two sources, since phenotype data from the GDC API is not good enough for now. Feel free to open a new issue if you are still interested in this or have more ideas.

ucscXena / xena-GDC-ETL

TCGA Phenotype Data from API Path #19