ihodes opened 8 years ago
It comes from the Broad: https://github.com/hammerlab/biokepi/blob/master/src/lib/reference_genome.ml#L81
It does, but it's an older version. From their help page:
For COSMIC however, it's more problematic as COSMIC doesn't release a VCF version (that I'm aware of, but please correct me if that's not right). Maintaining a converter for an external data source is something that we can't support right now so it doesn't get upgraded that frequently. However, the main purpose of the COSMIC VCF is rather slight.
Sanger now maintains a VCF for release 76, with 3,222,429 coding mutations, at http://cancer.sanger.ac.uk/cosmic/download; however, you need to register to download it.
Where is the source, and what format is it in? We could do the transformation ourselves.
Download instructions from http://cancer.sanger.ac.uk/cosmic/download:
(I'm getting a TSV now, so I can tell you what the columns are.)
SFTP Download: /files/grch38/cosmic/v76/CosmicCompleteExport.tsv.gz
If you haven't done so already, you will need to register before you can download this file.
You will then need to select one of the two download methods listed below.
GUI client
The most user-friendly method is to use a GUI client such as WinSCP, FileZilla or Cyberduck to connect to our SFTP server. You will need to download the software, install it and consult the documentation before trying to download the file.
The following credentials will be required to login.
Host name: sftp-cancer.sanger.ac.uk
Protocol: sftp
Port: 22
Username: Your email address
Password: Your password
Once logged in, you will need to download the file from this location: /files/grch38/cosmic/v76/CosmicCompleteExport.tsv.gz (depending on your web browser settings, clicking the link above should open your GUI client).
SFTP from the command line
This method is only recommended for those familiar with using the command line.
To log in, you will need to open a terminal window and use the following command (and enter your password when prompted). Note that the email address must be quoted:
sftp "your_email_address"@sftp-cancer.sanger.ac.uk
To download the file, use the following command:
sftp> get /files/grch38/cosmic/v76/CosmicCompleteExport.tsv.gz
For more help using SFTP on the command line, type either the word 'help' or '?' in your terminal.
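If the download has to be scripted (say, from a pipeline step rather than an interactive session), the same login-then-`get` steps can be sketched in Python with the third-party paramiko library. This is a minimal sketch, assuming paramiko is installed and that plain password authentication with your registered email as the username works non-interactively; `download_cosmic` is a hypothetical helper name, and it does not verify the server's host key, which you'd want to add before relying on it.

```python
import posixpath

# Values copied from the Sanger download instructions (GRCh38, release v76).
HOST = "sftp-cancer.sanger.ac.uk"
PORT = 22
REMOTE_PATH = "/files/grch38/cosmic/v76/CosmicCompleteExport.tsv.gz"


def download_cosmic(email, password, local_path=None):
    """Hypothetical helper: fetch the COSMIC export over SFTP, return the local path.

    Assumes password auth with the registered email address as the username.
    NOTE: no host-key verification is done here; add it before real use.
    """
    import paramiko  # third-party; imported lazily so this module loads without it

    if local_path is None:
        local_path = posixpath.basename(REMOTE_PATH)
    transport = paramiko.Transport((HOST, PORT))
    try:
        transport.connect(username=email, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.get(REMOTE_PATH, local_path)  # equivalent of `get` at the sftp> prompt
        sftp.close()
    finally:
        transport.close()  # also tears down any channel left open by an error
    return local_path
```

Reading the password from an environment variable (a hypothetical `COSMIC_PASSWORD`, for instance) rather than hard-coding it keeps credentials out of the script.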
The mutation TSV files look like this:
Gene name Accession Number Gene CDS length HGNC ID Sample name ID_sample ID_tumour Primary site Site subtype 1 Site subtype 2 Site subtype 3 Primary histology Histology subtype 1 Histology subtype 2 Histology subtype 3 Genome-wide screen Mutation ID Mutation CDS Mutation AA Mutation Description Mutation zygosity LOH GRCh Mutation genome position Mutation strand SNP FATHMM prediction FATHMM score Mutation somatic status Pubmed_PMID ID_STUDY Sample source Tumour origin Age
PTPN11 ENST00000351677 1782 9644 910428 910428 827913 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm acute_myeloid_leukaemia NS NS n COSM13101 c.? p.R289G Substitution - Missense u Variant of unknown origin 15604238 blood-bone marrow NS
JAK2 ENST00000381652 3399 6192 1104054 1104054 1018290 haematopoietic_and_lymphoid_tissue NS NS NS other splanchnic_vein_thrombosis NS NS n COSM12600 c.1849G>T p.V617F Substitution - Missense u 38 9:5073770-5073770 + n PATHOGENIC .94485 Reported in another cancer sample as somatic 18250227 blood-bone marrow NS
PIK3CA NM_006218.1 3207 8975 1124990 1124990 1038033 large_intestine NS NS NS carcinoma adenocarcinoma NS NS n COSM774 c.3139C>T p.H1047Y Substitution - Missense u 38 3:179234296-179234296 + n PATHOGENIC .94824 Reported in another cancer sample as somatic 18516290 surgery-fixed primary
JAK2 ENST00000381652 3399 6192 1251736 1251736 1163188 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm myeloproliferative_neoplasm NS NS n COSM12600 c.1849G>T p.V617F Substitution - Missense u 38 9:5073770-5073770 + n PATHOGENIC .94485 Reported in another cancer sample as somatic 19074595 blood-bone marrow NS
JAK2 ENST00000381652 3399 6192 1275775 1275775 1187073 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm myelodysplastic-myeloproliferative_neoplasm-unclassifiable NS NS n COSM12600 c.1849G>T p.V617F Substitution - Missense hom u 38 9:5073770-5073770 + n PATHOGENIC .94485 Reported in another cancer sample as somatic 17443220 blood-bone marrow NS
IDH1 ENST00000345146 1245 5382 1333161 1333161 1243635 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm acute_myeloid_leukaemia NS NS n COSM28746 c.395G>A p.R132H Substitution - Missense het u 38 2:208248388-208248388 - n PATHOGENIC .94085 Reported in another cancer sample as somatic 20368538 blood-bone marrow NS
TP53 ENST00000269305 1182 11998 1362583 1362583 1272624 biliary_tract gallbladder NS NS carcinoma adenocarcinoma NS NS n COSM43617 c.? p.? Unknown u Reported in another cancer sample as somatic 16177659 NS NS
JAK2 ENST00000381652 3399 6192 1443351 1443351 1367596 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm polycythaemia_vera NS NS n COSM12600 c.1849G>T p.V617F Substitution - Missense hom u 38 9:5073770-5073770 + n PATHOGENIC .94485 Reported in another cancer sample as somatic 20422415 blood-bone marrow NS
KRAS ENST00000311936 567 6407 1504861 1504861 1427739 large_intestine NS NS NS carcinoma adenocarcinoma NS NS n COSM520 c.35G>T p.G12V Substitution - Missense u 38 12:25245350-25245350 - n PATHOGENIC .98367 Reported in another cancer sample as somatic 21305640 surgery-fixed NS
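Now that the format is known, here's a minimal sketch of the core transformation for the simplest case: turning one TSV row into a VCF-style (CHROM, POS, ID, REF, ALT) tuple, using only the standard library. It assumes the file is tab-separated with the header above, that `Mutation genome position` is `chrom:start-end` on GRCh38 (the `GRCh` column is 38 in these rows), and that `Mutation CDS` gives coding-strand alleles, so rows on the `-` strand need complementing (KRAS `c.35G>T` becomes genomic C>A). Only single-base substitutions are handled; rows without coordinates (like the PTPN11 `c.?` row) and multi-base events are skipped, since those would need the reference sequence. The function names are hypothetical.

```python
import csv
import re

# Base complements, used to flip coding-strand alleles onto the genomic + strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 'c.1849G>T' -> ('G', 'T'); also tolerates positions like 'c.-14C>T' or 'c.123+1G>A'.
SNV_CDS = re.compile(r"c\.\S*?([ACGT])>([ACGT])")


def row_to_vcf(row):
    """Return (chrom, pos, cosmic_id, ref, alt) for a single-base substitution, else None."""
    position = row.get("Mutation genome position") or ""
    strand = row.get("Mutation strand") or ""
    match = SNV_CDS.fullmatch(row.get("Mutation CDS") or "")
    if not position or strand not in ("+", "-") or match is None:
        return None  # no coordinates, unknown strand, or not a simple substitution
    chrom, span = position.split(":")
    start, end = span.split("-")
    if start != end:
        return None  # multi-base event: would need the reference sequence
    ref, alt = match.group(1), match.group(2)
    if strand == "-":
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return chrom, int(start), row.get("Mutation ID") or ".", ref, alt


def iter_vcf_records(lines):
    """Yield VCF-style tuples from an iterable of TSV lines (header line first)."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        record = row_to_vcf(row)
        if record is not None:
            yield record
```

On the real file, opening it with `gzip.open("CosmicCompleteExport.tsv.gz", "rt", newline="")` and passing the handle to `iter_vcf_records` should stream it without decompressing to disk first.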
So, a total mess.
Should have 2M variants /cc @iskandr
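The excerpt also shows why the TSV row count isn't the variant count: there is one row per sample, so COSM12600 (JAK2 p.V617F) appears four times above. A converter would therefore have to collapse rows to one record per site before writing the VCF. A minimal sketch, assuming the rows have already been reduced to (chrom, pos, cosmic_id, ref, alt) tuples with bare chromosome names like "9"; `to_vcf_lines` and `chrom_sort_key` are hypothetical names, and the output carries only the minimal VCF header (no contig or INFO definitions).

```python
def chrom_sort_key(chrom):
    """Order 1..22 numerically, then X, Y, MT etc. alphabetically; tolerates a 'chr' prefix."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def to_vcf_lines(records):
    """Collapse per-sample (chrom, pos, cosmic_id, ref, alt) tuples into sorted VCF lines."""
    ids_by_site = {}
    for chrom, pos, cosmic_id, ref, alt in records:
        ids = ids_by_site.setdefault((chrom, pos, ref, alt), [])
        if cosmic_id not in ids:
            ids.append(cosmic_id)  # keep each COSMIC ID once, in first-seen order

    lines = ["##fileformat=VCFv4.1", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for (chrom, pos, ref, alt), ids in sorted(
        ids_by_site.items(), key=lambda item: (chrom_sort_key(item[0][0]),) + item[0][1:]
    ):
        # Multiple IDs are ';'-separated per the VCF spec; QUAL/FILTER/INFO left missing.
        lines.append(f"{chrom}\t{pos}\t{';'.join(ids)}\t{ref}\t{alt}\t.\t.\t.")
    return lines
```

The number of data lines (`len(lines) - 2`) is then the count of unique sites, which is the figure to check against the 2M expected above.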