microsoft / genomicsnotebook

Jupyter Notebooks on Azure for Genomics Data Analysis
MIT License

Description of simulated clinical and phenotypic datasets referred to in `/sample-notebooks/genomicsML.ipynb` #11

Closed by PubuduSaneth 8 months ago

PubuduSaneth commented 8 months ago

Thank you very much for sharing an informative set of Jupyter Notebooks.

I've been reviewing the Train Machine Learning Models with Genomics + Clinical Data notebook, which uses simulated clinical and phenotypic datasets. However, I couldn't find details on how these datasets are generated.

Could you provide insight into how this data is generated, or direct me to any resources or documentation on this matter?

Thank you very much in advance.

erdalcosgun commented 8 months ago

Hi @PubuduSaneth, thanks for checking out our notebooks. This notebook is designed to showcase ML model training with combined clinical and genomics data. The data we shared was simulated with a private library; we only provided the expected data format for the ML model training. For a structured data simulation, I recommend checking https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129313/.
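
Since the private simulation library is not available, here is a minimal sketch of one way to generate a toy clinical + genotype table for experimenting with the same kind of workflow. It is not the method used for the notebook's data. The function name `simulate_cohort`, the column names (`sample_id`, `age`, `sex`, `snp_*`, `phenotype`), the allele frequencies, and the effect sizes are all hypothetical assumptions chosen for illustration, and the layout may differ from the format the notebook expects.

```python
import math
import random


def simulate_cohort(n_samples=200, n_variants=5, seed=0):
    """Return a list of dict rows: clinical covariates, genotypes, phenotype.

    All distributions and coefficients here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Hypothetical minor allele frequencies and per-variant effect sizes.
    mafs = [rng.uniform(0.05, 0.5) for _ in range(n_variants)]
    effects = [rng.gauss(0.0, 0.5) for _ in range(n_variants)]

    rows = []
    for i in range(n_samples):
        age = rng.randint(20, 80)
        sex = rng.choice(["F", "M"])
        # Genotype = count of minor alleles (0, 1, or 2) from two independent draws.
        genotypes = [(rng.random() < p) + (rng.random() < p) for p in mafs]
        # Logistic model linking age and genotypes to a binary phenotype.
        score = -1.0 + 0.02 * (age - 50) + sum(
            b * g for b, g in zip(effects, genotypes)
        )
        prob = 1.0 / (1.0 + math.exp(-score))

        row = {"sample_id": f"S{i:04d}", "age": age, "sex": sex}
        row.update({f"snp_{j}": g for j, g in enumerate(genotypes)})
        row["phenotype"] = int(rng.random() < prob)
        rows.append(row)
    return rows


rows = simulate_cohort()
print(rows[0])
```

Fixing `seed` makes the simulated cohort reproducible. The resulting rows can be written out with `csv.DictWriter` or loaded into a DataFrame, then reshaped to match whatever input format the notebook requires.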