merdivane opened this issue 1 year ago (status: Open)
I'm looking into ClinVar at the moment. What's a 'card'? Is it a new issue with the info you describe above?
For the COSMIC database, I tried downloading a sample dataset from the website: https://cancer.sanger.ac.uk/cosmic/about (for GRCh37). Of the many files, CancerMutationCensus_AllData_v98_GRCh37.tsv has information about the pathogenicity and type of mutations, which I think is what we are looking for.
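To check which columns that file actually carries before building anything on it, a quick look at the header and the first few rows is probably enough. This is only a rough sketch using the standard library: `peek_tsv` is a helper name I made up, and I'm assuming the file is plain tab-separated UTF-8 text with a header row, which I haven't verified for every COSMIC export.

```python
import csv
from itertools import islice


def peek_tsv(path, n_rows=5):
    """Return (column_names, first n_rows rows as dicts) from a tab-separated file.

    Hypothetical helper: assumes the first line of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Read only the first few rows so large files are not loaded in full.
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows
```

Calling `peek_tsv("CancerMutationCensus_AllData_v98_GRCh37.tsv")` on the downloaded file should then show which column names relate to pathogenicity and mutation type, without loading the whole table.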
I'm exploring ClinVar in this Colab notebook while AWS isn't working: https://colab.research.google.com/drive/1jrOcgn07_ZMkNgqHfzEMQ8OLCFzp9kU1?usp=sharing
I was thinking of creating a ClinVar card, for example, where you create a new issue following this structure:
- Description of database
- Access (API or download)
- File format
- Information in database
- Paper (where the database is used and steps to curate it)
- Snippet of data
- Tasks (which tasks can benefit)
Let's put databases here. To avoid confusion: a dataset is curated data from a database that we can use in ML and AI models. A database is a source such as ClinVar, which hosts raw data online; we need to do some work to retrieve that data and convert it into a dataset.
Database Card
- Description of database
- Access (API or download)
- File format
- Information in database
- Paper (where the database is used and steps to curate it)
- Snippet of data
- Tasks (which tasks can benefit)
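If it helps keep the cards consistent across issues, the structure above could be captured in a small template. This is just a sketch of one possible approach: the `DatabaseCard` class, its field names, and the `to_issue_body` method are all hypothetical and not part of any existing tooling in this project, and unfilled sections default to "TODO".

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCard:
    """Hypothetical template mirroring the Database Card sections."""
    name: str
    description: str = "TODO"
    access: str = "TODO"        # API or download
    file_format: str = "TODO"
    information: str = "TODO"
    paper: str = "TODO"         # where the database is used, curation steps
    snippet: str = "TODO"
    tasks: list = field(default_factory=list)

    def to_issue_body(self) -> str:
        """Render the card as plain text suitable for pasting into a new issue."""
        sections = [
            ("Description of database", self.description),
            ("Access (API or download)", self.access),
            ("File format", self.file_format),
            ("Information in database", self.information),
            ("Paper (where the database is used and steps to curate it)", self.paper),
            ("Snippet of data", self.snippet),
        ]
        lines = [f"{self.name} Database Card", ""]
        for title, value in sections:
            lines.append(f"- {title}: {value}")
        tasks = ", ".join(self.tasks) if self.tasks else "TODO"
        lines.append(f"- Tasks (which tasks can benefit): {tasks}")
        return "\n".join(lines)
```

For example, `print(DatabaseCard(name="ClinVar", access="download").to_issue_body())` gives a skeleton with the access line filled in and the remaining sections left as TODO for whoever writes the card.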