pcingola / SnpSift


dbNSFP not annotating entries #70

Open ji575 opened 2 years ago

ji575 commented 2 years ago

I am trying to annotate variants with dbNSFP using per-chromosome VCFs that have already been annotated with SnpEff and, using SnpSift, with gnomAD exome frequencies. This works for one chromosome (chr17) but not for the others (for example, chr22). The headers of the two files are identical except for the filename lines.

I run the following commands:

java -Xmx8g -jar SnpSift.jar dbnsfp -db dbNSFP.txt.gz -f 'REVEL_score,clinvar_clnsig,clinvar_trait,Ensembl_transcriptid' -v chr17_hg38_with_gnomad_freq_and_snpEff.vcf > chr17_hg38_with_gnomad_freq_and_snpEff_dbNSFP_4_3a_REV_clnsig_trait_transcriptid.vcf

java -Xmx8g -jar SnpSift.jar dbnsfp -db dbNSFP.txt.gz -f 'REVEL_score,clinvar_clnsig,clinvar_trait,Ensembl_transcriptid' -v chr22_hg38_with_gnomad_freq_and_snpEff.vcf > chr22_hg38_with_gnomad_freq_and_snpEff_dbNSFP_4_3a_REV_clnsig_trait_transcriptid.vcf

The samples are human (hg38 genome), and I am using SnpSift version 5.0e.

When I run chr17, the following appears:

Total annotated entries : 10371
Total entries : 1219365
Percent : 0.85%

When I run chr22, the following appears:

Total annotated entries : 0
Total entries : 685039
Percent : 0.00%

The same happens for the other chromosomes; only chr17 gets annotated. The files in question are several GB (I assume too big to attach here).

I hope to obtain a new VCF that contains annotations from dbNSFP for the following fields: REVEL_score, clinvar_clnsig, clinvar_trait, and Ensembl_transcriptid.

Thank you,
Jackie

ji575 commented 2 years ago

I should also mention that I built the dbNSFP database by downloading files from dbNSFP and using these directions: https://pcingola.github.io/SnpEff/ss_dbnsfp/

I am using dbNSFP version 4.3a (newest version for academia), and running on a virtual machine.
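
One check that might help narrow this down is to compare the chromosome names in each VCF against the names actually present in the built database file. The following is only a minimal sketch, not part of SnpSift: the helper names (`chromosome_counts`, `normalize`, `names_missing_from_db`) are hypothetical, and it assumes both files are tab-delimited with the chromosome in the first column and header lines starting with `#`. It does not establish whether a naming difference is the cause here; it only shows which chromosomes the database file contains.

```python
import gzip
from collections import Counter


def chromosome_counts(path, max_lines=None):
    """Count data lines per value of the first tab-separated column.

    Lines starting with '#' (VCF headers, the dbNSFP column header) and
    blank lines are skipped. Files ending in '.gz' are read with gzip,
    which also handles bgzip-compressed files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if max_lines is not None and i >= max_lines:
                break
            if line.startswith("#") or not line.strip():
                continue
            counts[line.split("\t", 1)[0]] += 1
    return counts


def normalize(name):
    """Drop a leading 'chr' so that 'chr22' and '22' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def names_missing_from_db(vcf_path, db_path, strip_prefix=False):
    """Return chromosome names found in the VCF but not in the database."""
    vcf_names = set(chromosome_counts(vcf_path))
    db_names = set(chromosome_counts(db_path))
    if strip_prefix:
        vcf_names = {normalize(n) for n in vcf_names}
        db_names = {normalize(n) for n in db_names}
    return vcf_names - db_names


# Example call, using the file names from the commands above:
# names_missing_from_db("chr22_hg38_with_gnomad_freq_and_snpEff.vcf",
#                       "dbNSFP.txt.gz", strip_prefix=True)
```

If the call with `strip_prefix=True` still reports chr22 as missing, the database file itself would appear to lack that chromosome. Scanning the full database this way is slow for a file of this size; if a tabix index exists, `tabix -l dbNSFP.txt.gz` lists the indexed sequence names much faster.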