lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Modifying string length requirements in vcf2sql #72

Closed gilhornung closed 7 years ago

gilhornung commented 7 years ago

Subject of the issue

Hi Pierre,

Thank you very much for your help in my previous issue.

I am still struggling to generate SQL from the ExAC VCF files. I am getting two error messages related to predefined string lengths:

Error 1:

java.lang.RuntimeException: string length(GATGAGGCAGGTTATAGGAAGGATTTGGGGGCTCCTGAGAGAATAGGTTCAGGAAGTAAGGCAGGTTTTAGGGATGGTTTAGGGAGTTCTGTAGAAATGGGGTCAGTGAATGAGGCAGGTTATAGGAA GGATTTAGGGGCTCCTAAGGGAATGGGTTCAGGGAGTAAGACAGGTTTCAGGGATGGTTTAGGGGGTTCTGAAGAAATGGAGTCAATGGATGAGGCAGGTTATAGGAAGGATTTGGGGGCTCCTGAGGGAATAGGTTCAGGAAGTAAGGCAGGTTTTAGGGATGGTTTAGGGAGTTCTACAGAAATGGGGTCAGTGA) greater than 250 L=325 . Update source code forbasesallele

Error 2:

java.lang.RuntimeException: string length(STAG3L5P-PVRIG2P-PILRB) greater than 20 L=22 . Update source code forgeneSymbolvepPrediction

Is there an easier way to change the predefined string lengths than modifying the source? If there isn't, I think I found the place to change the REF/ALT allele string length: private int MAX_ALLELE_LENGTH=250; in the VcfToSql.java file. However, I can't find the predefined string length for geneSymbol.
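To make the two errors easier to read, here is a minimal sketch of the kind of length guard that would produce a message in this format. It is a reconstruction from the error text only, not the jvarkit source: the class name LengthGuard, the method checkLength and its parameters are all hypothetical.

```java
// Hypothetical reconstruction from the error message; NOT the jvarkit source.
public class LengthGuard {
    /**
     * Returns the value unchanged when it fits in a column of maxLength
     * characters, otherwise throws with a message shaped like the one above.
     */
    public static String checkLength(String value, int maxLength, String column, String table) {
        if (value != null && value.length() > maxLength) {
            throw new RuntimeException("string length(" + value + ") greater than "
                    + maxLength + " L=" + value.length()
                    + " . Update source code for " + column + " in " + table);
        }
        return value;
    }

    public static void main(String[] args) {
        String symbol = "STAG3L5P-PVRIG2P-PILRB"; // 22 characters

        // With a larger limit the symbol is accepted as-is.
        System.out.println(checkLength(symbol, 50, "geneSymbol", "vepPrediction"));

        // With a limit of 20 the guard rejects the 22-character symbol.
        try {
            checkLength(symbol, 20, "geneSymbol", "vepPrediction");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, "greater than 20 L=22" means the column is declared with a length of 20 and the offending value is 22 characters long, so the limit has to be raised where the column is declared.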

Your environment

I am using the latest version of vcf2sql (was just installed)

java version:

java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

${JAVA_HOME}
-bash: /usr/local/src/SysTools/Java/jdk1.8.0_91: is a directory

OS is Red Hat Enterprise Linux Server release 6.4 (Santiago)

Steps to reproduce

Here is the line in the vcf that caused the geneSymbol error: 7 99954185 . C G 114.11 AC_Adj0_Filter AC=1;AC_AFR=0;AC_AMR=0;AC_Adj=0;AC_EAS=0;AC_FIN=0;AC_Het=0;AC_Hom=0;AC_NFE=0;AC_OTH=0;AC_SAS=0;AF=2.993e-05;AN=33414;AN_AFR=88;AN_AMR=40;AN_Adj=3702;AN_EAS=88;AN_FIN=6;AN_NFE=842;AN_OTH=42;AN_SAS=2596;BaseQRankSum=-7.420e-01;ClippingRankSum=-7.420e-01;DP=74689;FS=0.000;GQ_MEAN=14.26;GQ_STDDEV=18.89;Het_AFR=0;Het_AMR=0;Het_EAS=0;Het_FIN=0;Het_NFE=0;Het_OTH=0;Het_SAS=0;Hom_AFR=0;Hom_AMR=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom_SAS=0;InbreedingCoeff=-0.0798;MQ=60.00;MQ0=0;MQRankSum=0.742;NCC=66370;QD=16.30;ReadPosRankSum=0.742;VQSLOD=-3.110e-01;culprit=MQ;DP_HIST=11198|3582|1524|281|103|13|4|0|1|1|0|0|0|0|0|0|0|0|0|0,0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;GQ_HIST=2933|8026|657|544|2792|560|586|325|92|70|12|2|91|6|5|1|2|0|0|3,0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;DOUBLETON_DIST=.;AC_MALE=.;AC_FEMALE=.;AN_MALE=2664;AN_FEMALE=1038;AC_CONSANGUINEOUS=.;AN_CONSANGUINEOUS=488;Hom_CONSANGUINEOUS=.;CSQ=G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000444073|protein_coding||2/6|ENST00000444073.1:c.-217-188C>G||||||||1||1|SNV|1|HGNC|18297|||CCDS43622.1|ENSP00000410764|PILRB_HUMAN|D6W5V2_HUMAN&C9JNA4_HUMAN&C9J8P3_HUMAN|UPI000006EEBC||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000419749|protein_coding||8/10|ENST00000419749.1:c.-217-188C>G||||||||1||1|SNV|1|HGNC|18297||||ENSP00000404321||C9JNA4_HUMAN&C9J8P3_HUMAN|UPI000198CF11||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000610247|protein_coding||13/17|ENST00000610247.1:c.-217-188C>G||||||||1||1|SNV|1|HGNC|18297|YES||CCDS43622.1|ENSP00000477415||D6W5V2_HUMAN&C9JNA4_HUMAN&C9J8P3_HUMAN|UPI000006EEBC||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000452089|protein_coding||4/8|ENST00000452089.1:c.-217-188C>G||||||||1||1|SNV|1|HGNC|18297|||CCDS43622.1|ENSP0000039
1748|PILRB_HUMAN|D6W5V2_HUMAN&C9JNA4_HUMAN&C9J8P3_HUMAN|UPI000006EEBC||||||||||||||||||||||||||CCC|C,G|intron_variant&non_coding_transcript_variant|MODIFIER|STAG3L5P-PVRIG2P-PILRB|ENSG00000272752|Transcript|ENST00000444874|processed_transcript||12/16|ENST00000444874.1:n.1684-188C>G||||||||1||1|SNV|1|HGNC|48898|||||||||||||||||||||||||||||||||CCC|C,G|downstream_gene_variant|MODIFIER|STAG3L5P-PVRIG2P-PILRB|ENSG00000272752|Transcript|ENST00000483329|processed_transcript|||||||||||1|4351|1|SNV|1|HGNC|48898|||||||||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000457519|protein_coding||6/8|ENST00000457519.1:c.-217-188C>G||||||||1||1|SNV|1|HGNC|18297||||ENSP00000411261||C9JNA4_HUMAN&C9J8P3_HUMAN|UPI000198CF11||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000438028|protein_coding||6/8|ENST00000438028.1:c.-272-188C>G||||||||1||1|SNV|1|HGNC|18297||||ENSP00000409411||C9JTD2_HUMAN|UPI000198CF12||||||||||||||||||||||||||CCC|C,G|intron_variant&non_coding_transcript_variant|MODIFIER|STAG3L5P-PVRIG2P-PILRB|ENSG00000272752|Transcript|ENST00000472646|processed_transcript||3/4|ENST00000472646.1:n.347-188C>G||||||||1||1|SNV|1|HGNC|48898|||||||||||||||||||||||||||||||||CCC|C,G|intron_variant&non_coding_transcript_variant|MODIFIER|STAG3L5P-PVRIG2P-PILRB|ENSG00000272752|Transcript|ENST00000310771|processed_transcript||13/17|ENST00000310771.4:n.2280-188C>G||||||||1||1|SNV|1|HGNC|48898|YES||||||||||||||||||||||||||||||||CCC|C,G|upstream_gene_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000609309|protein_coding|||||||||||1|1453|1|SNV|1|HGNC|18297|||CCDS43622.1|ENSP00000477365||D6W5V2_HUMAN&C9JNA4_HUMAN&C9J8P3_HUMAN|UPI000006EEBC||||||||||||||||||||||||||CCC|C,G|downstream_gene_variant|MODIFIER|PVRIG2P|ENSG00000235333|Transcript|ENST00000435460|transcribed_unprocessed_pseudogene|||||||||||1|2870|1|SNV|1|HGNC|48897|YES||||||||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PIL
RB|ENSG00000121716|Transcript|ENST00000422808|protein_coding||7/9|ENST00000422808.1:c.-217-188C>G||||||||1||1|SNV|1|HGNC|18297||||ENSP00000389856||C9JNA4_HUMAN&C9J8P3_HUMAN|UPI000198CF11||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000431140|protein_coding||2/4|ENST00000431140.1:c.-217-188C>G||||||||1||1|SNV|1|HGNC|18297||||ENSP00000416342||C9JGQ4_HUMAN|UPI000198CF14||||||||||||||||||||||||||CCC|C,G|downstream_gene_variant|MODIFIER|STAG3L5P|ENSG00000242294|Transcript|ENST00000473757|retained_intron|||||||||||1|4662|1|SNV|1|HGNC|48896|YES||||||||||||||||||||||||||||||||CCC|C,G|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000493091|retained_intron|5/5||ENST00000493091.1:n.2892C>G||2892||||||1||1|SNV|1|HGNC|18297|||||||||||||||||||||||||||||||||CCC|C,G|5_prime_UTR_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000448382|protein_coding|5/9||ENST00000448382.1:c.-39C>G||1079||||||1||1|SNV|1|HGNC|18297||||ENSP00000415775|PILRB_HUMAN|C9JPU0_HUMAN&C9JGQ4_HUMAN|UPI000006E5EB||||||||||||||||||||||||||CCC|C,G|5_prime_UTR_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000455145|protein_coding|5/6||ENST00000455145.1:c.-39C>G||541||||||1||1|SNV|1|HGNC|18297||||ENSP00000402073||C9JPU0_HUMAN|UPI000198CF13||||||||||||||||||||||||||CCC|C,G|upstream_gene_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000608825|protein_coding|||||||||||1|188|1|SNV|1|HGNC|18297||||ENSP00000476607||C9JNA4_HUMAN&C9J8P3_HUMAN|UPI0003B928BB||||||||||||||||||||||||||CCC|C,G|downstream_gene_variant|MODIFIER|STAG3L5P-PVRIG2P-PILRB|ENSG00000272752|Transcript|ENST00000470714|processed_transcript|||||||||||1|4073|1|SNV|1|HGNC|48898|||||||||||||||||||||||||||||||||CCC|C,G|intron_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000443526|protein_coding||4/6|ENST00000443526.1:c.-272-188C>G||||||||1||1|SNV|1|HGNC|18297||||ENSP00000403757||C9JNA4_HUMAN&C9J8P3_HUMAN|UPI
000198CF11||||||||||||||||||||||||||CCC|C,G|upstream_gene_variant|MODIFIER|PILRB|ENSG00000121716|Transcript|ENST00000438231|protein_coding|||||||||||1|818|1|SNV|1|HGNC|18297||||ENSP00000408425||C9JNA4_HUMAN|UPI000198CF15||||||||||||||||||||||||||CCC|C;AC_POPMAX=NA;AN_POPMAX=NA;POPMAX=NA;K1_RUN=C:1;K2_RUN=CC:0;K3_RUN=CCA:0;ESP_AF_POPMAX=0;ESP_AF_GLOBAL=0;ESP_AC=0;KG_AF_POPMAX=0;KG_AF_GLOBAL=0;KG_AC=0

gilhornung commented 7 years ago

I was able to find how to modify both limits:

Allele length: private int MAX_ALLELE_LENGTH=250
Gene name length: new ColumnBuilder().name("geneSymbol").nilleable().length(20).make()

Both are in the VcfToSql.java file.
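Before editing those two values and rebuilding, it may help to know how large they need to be for a given file. The following is a minimal sketch, not part of jvarkit, that scans an uncompressed VCF on standard input and reports the longest REF/ALT allele and the longest gene symbol in the CSQ annotation. The class name VcfLengthScan is hypothetical, and the position of the symbol inside each CSQ entry (fourth field, index 3, as in the record quoted above) is an assumption that should be checked against the CSQ "Format:" description in the VCF header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of jvarkit.
public class VcfLengthScan {
    /** Assumed index of the gene symbol in each CSQ entry; verify in the VCF header. */
    static final int SYMBOL_INDEX = 3;

    /** Returns {longest REF/ALT allele length, longest CSQ gene symbol length}. */
    public static int[] scan(Reader in) {
        int maxAllele = 0;
        int maxSymbol = 0;
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("#")) continue; // skip header lines
                String[] cols = line.split("\t", 9);
                if (cols.length < 8) continue; // not a complete VCF record
                maxAllele = Math.max(maxAllele, cols[3].length()); // REF
                for (String alt : cols[4].split(",")) {             // ALT alleles
                    maxAllele = Math.max(maxAllele, alt.length());
                }
                for (String kv : cols[7].split(";")) {              // INFO key=value pairs
                    if (!kv.startsWith("CSQ=")) continue;
                    for (String entry : kv.substring(4).split(",")) {
                        String[] fields = entry.split("\\|", -1);   // keep empty fields
                        if (fields.length > SYMBOL_INDEX) {
                            maxSymbol = Math.max(maxSymbol, fields[SYMBOL_INDEX].length());
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { maxAllele, maxSymbol };
    }

    public static void main(String[] args) {
        int[] result = scan(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        System.out.println("longest allele: " + result[0]);
        System.out.println("longest gene symbol: " + result[1]);
    }
}
```

It could be run as `zcat input.vcf.gz | java VcfLengthScan`, where input.vcf.gz is a placeholder file name. The two numbers give lower bounds for MAX_ALLELE_LENGTH and for the length(...) of the geneSymbol column. Other columns may have their own limits that this sketch does not check.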