HAKSOAT closed this 5 years ago.
Sorry for the late reply. I'm officially on vacation now.
Phenotype data previously came mainly from data files: https://github.com/yunhailuo/xena-GDC-ETL#transform-phenotype. Some, if not all, of those fields are available through the GDC API: https://docs.gdc.cancer.gov/API/Users_Guide/Appendix_A_Available_Fields/#case-fields
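For anyone exploring the API-only route, here is a minimal sketch of pulling case-level phenotype fields from the GDC `/cases` endpoint with only the Python standard library. It is not how xena-GDC-ETL does it. The chosen field names (`demographic.gender`, `demographic.vital_status`, `diagnoses.age_at_diagnosis`) and the helper names `build_cases_query` / `fetch_case_hits` are illustrative assumptions, so check them against the case-fields appendix linked above.

```python
import json
import urllib.parse
import urllib.request

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"

# Hypothetical example selection of phenotype fields; verify each name against
# the GDC "Available Fields" appendix (case fields) before relying on it.
EXAMPLE_FIELDS = (
    "submitter_id",
    "demographic.gender",
    "demographic.vital_status",
    "diagnoses.age_at_diagnosis",
)


def build_cases_query(project_id, fields, size=100):
    """Build query parameters for a GDC /cases request limited to one project."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    return {
        "filters": json.dumps(filters),  # GDC expects filters as a JSON string
        "fields": ",".join(fields),      # comma-separated list of fields to return
        "format": "JSON",
        "size": str(size),               # number of cases per page
    }


def fetch_case_hits(project_id, fields=EXAMPLE_FIELDS, size=100):
    """Fetch one page of case records; requires network access to the GDC API."""
    query = urllib.parse.urlencode(build_cases_query(project_id, fields, size))
    with urllib.request.urlopen(f"{GDC_CASES_ENDPOINT}?{query}") as resp:
        payload = json.load(resp)
    # Case records are returned under data.hits; paging details are in data.pagination.
    return payload["data"]["hits"]
```

Calling `fetch_case_hits("TCGA-BRCA", size=5)` would return up to five case records as dicts. A full ETL would also page through results using `data.pagination`, and nested fields such as `diagnoses` come back as lists that need flattening into a per-sample phenotype matrix.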
I'm not formally on the Xena team, just a collaborator, so I don't know what the current decision is. I'd suggest contacting @maryjgoldman or @jingchunzhu about which phenotype data Xena needs.
GSoC 2019 has started. Thanks for your interest. In general, we have decided to keep both sources, since the phenotype data from the GDC API is not yet good enough. Feel free to open a new issue if you are still interested in this or have more ideas.
@yunhailuo On reading the project description here, I noticed the part of the document that talks about wrangling the TCGA phenotype data using only the API path instead of two sources.
I'd appreciate more information on this, as I intend to contribute to Xena for GSoC this year.