svm-ai / svm-hackathon


UniProt Card #21

Open SSU02 opened 1 year ago

SSU02 commented 1 year ago

Description of database

UniProt is a freely accessible database of protein sequences and functional information; many of its entries are derived from genome sequencing projects. It provides cross-references to DNA sequence entries in DDBJ/GenBank/EMBL and to the 3D structure database PDB, links to OMIM for disease phenotype descriptions, and links to dbSNP for variant information.

Access (API or download)

UniProt provides several APIs for programmatic access to its data: https://www.uniprot.org/help/programmatic_access. It also has an FTP server from which you can download the entire database or specific subsets of it: ftp://ftp.uniprot.org (see also https://www.uniprot.org/help/downloads). The data in the Variant Viewer include both manually curated data and data imported automatically from external sources such as dbSNP and ClinVar. To access and download variant data, use the Proteins API.
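As a rough sketch of the Proteins API route, the Python below fetches the variation record for one accession from the EBI endpoint `https://www.ebi.ac.uk/proteins/api/variation/{accession}` and flattens its variant features into simple tuples. The helper names are hypothetical, and the JSON field names (`features`, `type`, `begin`, `wildType`, `alternativeSequence`, `clinicalSignificances`) are my assumption of what the API returns, so check them against a live response before relying on them.

```python
import json
import urllib.request

# EBI Proteins API variation endpoint; JSON is requested via the Accept header.
PROTEINS_API = "https://www.ebi.ac.uk/proteins/api/variation/{accession}"


def variation_url(accession: str) -> str:
    """Build the Proteins API variation URL for one UniProt accession."""
    return PROTEINS_API.format(accession=accession)


def fetch_variation(accession: str, timeout: float = 30.0) -> dict:
    """Download the variation record for `accession` and parse it as JSON."""
    req = urllib.request.Request(
        variation_url(accession), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize_variants(record: dict) -> list:
    """Return (position, wild type, alternative, clinical significances) per variant.

    ASSUMPTION: field names below mirror the API's JSON output as I understand
    it; verify them on a real response first.
    """
    rows = []
    for feat in record.get("features", []):
        if feat.get("type") != "VARIANT":
            continue  # keep only variant features
        significances = [c.get("type") for c in feat.get("clinicalSignificances", [])]
        rows.append((
            int(feat["begin"]),
            feat.get("wildType"),
            feat.get("alternativeSequence"),
            significances,
        ))
    return rows
```

For example, `summarize_variants(fetch_variation("P05067"))` would list the variants for human APP. Fetching needs network access, so parsing is kept in a separate function that can also be run offline on a saved response.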

Data type

Sequence; function; disease and variants associated with the protein; role of the gene/protein in disease pathogenesis (e.g. causative, susceptibility, or modifier gene); 3D structure (linked to the PDB database); annotations; publications. The Feature Viewer shows domains and sites, PTMs, molecule processing, structural features, topology, PDBe 3D structure coverage, mutagenesis, proteomics, antigenic sequences, and variants.
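Several of these data types can also be pulled as table columns from the UniProtKB REST search endpoint (`https://rest.uniprot.org/uniprotkb/search`). The sketch below only builds the query URL and parses a TSV reply. The return-field names in `DEFAULT_FIELDS` are my assumption of the closest matches (function, disease, natural variants, PDB cross-references) and should be checked against UniProt's list of return fields; the helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"

# ASSUMPTION: return-field names chosen to mirror the data types above;
# verify them against UniProt's documented return fields.
DEFAULT_FIELDS = ("accession", "gene_names", "cc_function", "cc_disease", "ft_variant", "xref_pdb")


def search_url(query: str, fields=DEFAULT_FIELDS, fmt: str = "tsv", size: int = 25) -> str:
    """Build a UniProtKB search URL that returns the chosen columns (first page only)."""
    params = {"query": query, "fields": ",".join(fields), "format": fmt, "size": size}
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_tsv(text: str) -> list:
    """Turn a TSV reply (first row = column titles) into a list of dicts.

    QUOTE_NONE keeps quote characters inside free-text fields from being
    treated as CSV quoting.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)
```

The reply can be fetched with `urllib.request.urlopen(search_url("gene:BRCA1 AND organism_id:9606")).read().decode()` and passed to `parse_tsv`. Larger result sets are paginated, so this only covers the first `size` entries.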

Target metric

(e.g. if variant effect: pathogenicity, binding affinity change, other)

The Variant Viewer shows variant consequences in four categories: (1) likely disease, (2) predicted consequence, (3) likely benign, and (4) uncertain. These annotations come either from manual curation or from large-scale sources (imported from ClinVar, gnomAD, COSMIC, 1000 Genomes, Ensembl, and the Exome Sequencing Project).

Dataset investigation

Variant files are available at https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/variants/. I tried a small dataset, the meleagris_gallopavo_variation.txt file, and was able to get a proper table listing the information. (I also tried downloading the human file on my PC, but it was too large to download.) Attached is a screenshot of the data.

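Since the human file was too large to download in one go, one option is to stream it and parse rows on the fly instead of saving it to disk first. This is a minimal sketch assuming that (1) the column header line starts with `Gene Name`, (2) columns are tab-separated, and (3) the large files are gzipped (`.gz`); all three are assumptions worth confirming on the small meleagris_gallopavo_variation.txt file first. The human file name is not given above, so take it from the directory listing. Helper names and the `Consequence Type` column in the usage note are hypothetical.

```python
import gzip
import io
import urllib.request
from collections import Counter
from typing import BinaryIO, Iterable, Iterator

# Directory from the issue text; pick the actual file name from its listing.
VARIANTS_DIR = "https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/variants/"


def decode_lines(raw: BinaryIO, gzipped: bool) -> io.TextIOWrapper:
    """Wrap a binary stream as text lines, decompressing on the fly if gzipped."""
    if gzipped:
        raw = gzip.GzipFile(fileobj=raw)
    return io.TextIOWrapper(raw, encoding="utf-8")


def stream_remote(url: str) -> Iterator[str]:
    """Yield lines of a remote file without writing the whole file to disk."""
    with urllib.request.urlopen(url) as resp:
        yield from decode_lines(resp, gzipped=url.endswith(".gz"))


def iter_variant_rows(lines: Iterable[str], header_prefix: str = "Gene Name") -> Iterator[dict]:
    """Yield one dict per data row, skipping the preamble before the header.

    ASSUMPTIONS: the header line starts with `header_prefix` and columns are
    tab-separated; adjust after inspecting a small file.
    """
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if header is None:
            if line.startswith(header_prefix):
                header = [h.strip() for h in line.split("\t")]
            continue
        if not line.strip() or set(line) <= set("_- \t"):
            continue  # skip blank lines and underscore/dash separator rows
        yield dict(zip(header, (f.strip() for f in line.split("\t"))))


def count_by(rows: Iterable[dict], column: str) -> Counter:
    """Tally rows by the value in one column, e.g. a consequence column."""
    return Counter(row.get(column, "") for row in rows)
```

Usage would look like `count_by(iter_variant_rows(stream_remote(VARIANTS_DIR + "<file name from the listing>")), "Consequence Type")`. Because every step is a generator, memory use stays roughly constant no matter how large the file is.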