svm-ai / svm-hackathon

Database Card #15

Open merdivane opened 1 year ago

merdivane commented 1 year ago

Let's collect databases here. To avoid confusion: a dataset is curated data drawn from a database that we can use directly in ML and AI models. A database is a source such as ClinVar, which publishes raw data online that we still have to retrieve and convert into a dataset.

Database Card

- Description of database
- Access (API or download)
- File format
- Information in database
- Paper (where the database is used and steps to curate it)
- Snippet of data
- Tasks (which tasks can benefit)

andreiarog commented 1 year ago

I'm looking into ClinVar at the moment. What's a 'card'? Is it a new issue containing the info you describe above?

SSU02 commented 1 year ago

For the COSMIC database, I tried downloading a sample dataset from the website: https://cancer.sanger.ac.uk/cosmic/about (for GRCh37). Of the many files, CancerMutationCensus_AllData_v98_GRCh37.tsv has information about the pathogenicity and type of mutations, which I think is what we are looking for.
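A quick way to check what a file like this contains is to read only its header and first few rows. The sketch below is a minimal example using the Python standard library; the helper name `preview_tsv` is hypothetical, and it assumes nothing about the column names, which should be read from the downloaded file itself.

```python
import csv
from itertools import islice


def preview_tsv(path, n_rows=5):
    """Return (column_names, first n_rows rows as dicts) from a tab-separated file.

    Hypothetical helper for illustration; it reads only the top of the file,
    so it stays cheap even when the file is large.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(islice(reader, n_rows))
        # fieldnames is taken from the first line of the file
        return reader.fieldnames, rows


# Example call, assuming the file has been downloaded to the working directory:
# columns, rows = preview_tsv("CancerMutationCensus_AllData_v98_GRCh37.tsv")
# print(columns)
```

The returned column list can be pasted into the "Information in database" part of a card, and the rows into "Snippet of data".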

andreiarog commented 1 year ago

Exploring ClinVar here: https://colab.research.google.com/drive/1jrOcgn07_ZMkNgqHfzEMQ8OLCFzp9kU1?usp=sharing while AWS isn't working.

merdivane commented 1 year ago

I was thinking of creating a ClinVar card, for example: you open a new issue that follows this structure:

- Description of database
- Access (API or download)
- File format
- Information in database
- Paper (where the database is used and steps to curate it)
- Snippet of data
- Tasks (which tasks can benefit)
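To keep cards consistent across issues, the structure above could be captured as a small template. This is only a sketch: the class name `DatabaseCard`, its field names, the title format, and the `TODO` placeholder are assumptions made for illustration, not something agreed in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCard:
    """Hypothetical container mirroring the card sections listed above."""
    name: str
    description: str = ""
    access: str = ""
    file_format: str = ""
    information: str = ""
    paper: str = ""
    snippet: str = ""
    tasks: list = field(default_factory=list)

    def issue_title(self):
        # Assumed title format; adjust to whatever the repo settles on.
        return f"Database Card: {self.name}"

    def to_issue_body(self):
        """Render the sections in card order, marking empty ones as TODO."""
        sections = [
            ("Description of database", self.description),
            ("Access (API or download)", self.access),
            ("File format", self.file_format),
            ("Information in database", self.information),
            ("Paper (where the database is used and steps to curate it)", self.paper),
            ("Snippet of data", self.snippet),
            ("Tasks (which tasks can benefit)", ", ".join(self.tasks)),
        ]
        return "\n\n".join(f"{title}\n{text or 'TODO'}" for title, text in sections)


# Example: a partly filled card; unfilled sections show up as TODO.
card = DatabaseCard(name="ClinVar", access="download")
print(card.issue_title())
print(card.to_issue_body())
```

Printing the body gives text that can be pasted into a new issue, with unfilled sections left visible so they are not forgotten.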