zhanxw / rvtests

Rare variant test software for next generation sequencing data

rare variant analysis #80

Open sbaheti opened 5 years ago

sbaheti commented 5 years ago

Hi,

I am having difficulty running the tool and getting any output. The analysis skips the gene, reporting 0 variants for it, and produces no results. Do you know what is wrong with the input?

Thanks !

log file:

[INFO] Program version: 20190205
[INFO] Git Version: c86e589efef15382603300dc7f4c3394c82d69b8
[INFO] Parameters BEGIN
ParameterList created by m078940 on mforgehn2 at Thu Mar 7 13:05:43 2019
--inVcf "variants.vcf.gz" --out "out" --pheno "../../../covariate.file.tsv" --burden "cmc" --geneFile "Homo_sapiens.GRCh38.78.mod.GGPS.refFlat" --gene "WASH7P"
[INFO] Parameters END
[INFO] Analysis started at: Thu Mar 7 13:05:43 2019
[INFO] Loaded [ 26 ] samples from genotype files
[INFO] Loaded [ 26 ] sample phenotypes
[INFO] Loaded 0 male, 0 female and 26 sex-unknown samples from ../../../covariate.file.tsv
[INFO] Loaded 18 cases, 8 controls, and 0 missing phenotypes
[WARN] -- Enabling binary phenotype mode --
[INFO] Analysis begins with [ 26 ] samples...
[INFO] Loaded [ 1 ] genes.
[INFO] Impute missing genotype to mean (by default)
[INFO] Analysis started
[INFO] Gene WASH7P has 0 variants, skipping
[INFO] Analyzed [ 0 ] variants from [ 1 ] genes/regions
[INFO] Analysis ends at: Thu Mar 7 13:05:44 2019
[INFO] Analysis took 1 seconds

zhanxw commented 5 years ago

Can you verify that the variants of the gene exist in the vcf file?
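
One way to run this check is sketched below in Python. It is only an illustration, not part of rvtests: the function names (`open_text`, `gene_spans`, `count_variants`) are made up for this example, and it assumes a tab- or space-delimited refFlat file whose third, fifth and sixth columns hold chromosome, transcript start and transcript end. It counts VCF records that fall inside the gene twice: once requiring the chromosome names to match exactly, and once ignoring a leading "chr". If the second count is larger than the first, the two files disagree on chromosome naming.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def strip_chr(name):
    """Drop a leading 'chr' so that 'chr3' and '3' compare equal."""
    return name[3:] if name.startswith("chr") else name


def gene_spans(refflat_lines, gene):
    """Yield (chrom, tx_start, tx_end) for every refFlat transcript of `gene`."""
    for line in refflat_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[0] == gene:
            yield fields[2], int(fields[4]), int(fields[5])


def count_variants(vcf_lines, spans):
    """Return (exact, loose): records inside a span with an identical
    chromosome name, and records inside a span when 'chr' is ignored."""
    spans = list(spans)
    exact = loose = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split(None, 2)[:2]
        pos = int(pos)
        # refFlat starts are 0-based and ends are 1-based; VCF POS is 1-based.
        hits = [c for c, start, end in spans
                if start < pos <= end and strip_chr(c) == strip_chr(chrom)]
        if hits:
            loose += 1
            if chrom in hits:
                exact += 1
    return exact, loose
```

With the file names used later in this thread, a call would look like `count_variants(open_text("test.vcf.gz"), gene_spans(open_text("refFlat.txt"), "SLMAP"))`. A result where both numbers are zero suggests the variants really are outside the gene; a result such as `(0, 2)` points to a naming mismatch instead.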


sbaheti commented 5 years ago

Yes, the variants exist. I also tried running it without specifying a gene, and the log file reports that every gene is skipped because it has no variants. I am not sure what the issue is.

sbaheti commented 5 years ago

Do you have any other suggestions I can try to make it work?

sbaheti commented 5 years ago

I tried it again with a different statistic and without specifying any gene, but I get the same result: every gene is skipped for having 0 variants. Am I missing a parameter?

[INFO] Program version: 20190205
[INFO] Git Version: c86e589efef15382603300dc7f4c3394c82d69b8
[INFO] Parameters BEGIN
ParameterList Mon Mar 25 09:45:20 2019
--inVcf "variants.vcf.gz" --out "out9" --pheno "covariate.file.tsv" --burden "zeggini" --geneFile "Homo_sapiens.GRCh38.78.mod.GGPS.refFlat"
[INFO] Parameters END
[INFO] Analysis started at: Mon Mar 25 09:45:20 2019
[INFO] Loaded [ 26 ] samples from genotype files
[INFO] Loaded [ 26 ] sample phenotypes
[INFO] Loaded 0 male, 0 female and 26 sex-unknown samples from covariate.file.tsv
[INFO] Loaded 18 cases, 8 controls, and 0 missing phenotypes
[WARN] -- Enabling binary phenotype mode --
[INFO] Analysis begins with [ 26 ] samples...
[INFO] Loaded [ 58157 ] genes.
[INFO] Impute missing genotype to mean (by default)
[INFO] Analysis started
[INFO] Gene DDX11L1 has 0 variants, skipping
[INFO] Gene WASH7P has 0 variants, skipping
......

matmu commented 5 years ago

Hello @zhanxw, I am getting exactly the same error as @sbaheti. Here is an example to reproduce the error:

refFlat.txt

SLMAP   NM_007159   chr3    +   57757255    57930016    57757651    57927388    21  57757255,57831382,57841298,57847196,57849753,57857732,57858087,57860698,57861948,57864547,57871635,57890040,57896510,57896872,57907883,57909075,57912380,57913157,57916905,57922888,57927295,   57757849,57831530,57841371,57847233,57849816,57857828,57858159,57860839,57862086,57864716,57871698,57890100,57896591,57896932,57908006,57909150,57912701,57913275,57917077,57923023,57930016,
SLMAP   NM_001304422    chr3    +   57889913    57930016    57896880    57927388    10  57889913,57896510,57896872,57907883,57909075,57912380,57913157,57916905,57922888,57927295,  57890100,57896591,57896932,57908006,57909150,57912701,57913275,57917077,57923023,57930016,
SLMAP   NM_001304421    chr3    +   57757255    57930016    57757651    57927388    20  57757255,57831382,57841298,57847196,57849753,57857732,57858087,57860698,57861948,57864547,57890040,57896510,57896872,57907883,57909075,57912380,57913157,57916905,57922888,57927295,    57757849,57831530,57841371,57847233,57849816,57857828,57858159,57860839,57862086,57864716,57890100,57896591,57896932,57908006,57909150,57912701,57913275,57917077,57923023,57930016,
SLMAP   NM_001304423    chr3    +   57889913    57930016    57896880    57927388    8   57889913,57896510,57896872,57912380,57913157,57916905,57922888,57927295,    57890100,57896591,57896932,57912701,57913275,57917077,57923023,57930016,
SLMAP   NM_001304420    chr3    +   57757255    57930016    57757651    57927388    22  57757255,57831382,57841298,57847196,57849753,57857732,57858087,57860698,57861948,57864547,57864806,57871635,57890040,57896510,57896872,57907883,57909075,57912380,57913157,57916905,57922888,57927295,  57757849,57831530,57841371,57847233,57849816,57857828,57858159,57860839,57862086,57864716,57864857,57871698,57890100,57896591,57896932,57908006,57909150,57912701,57913275,57917077,57923023,57930016,
SLMAP   NM_001311178    chr3    +   57889913    57930016    57896880    57925928    11  57889913,57896510,57896872,57907883,57909075,57912380,57913157,57916905,57922888,57925844,57927295, 57890100,57896591,57896932,57908006,57909150,57912701,57913275,57917077,57923023,57925934,57930016,
SLMAP   NM_001311179    chr3    +   57889913    57918482    57896880    57917155    7   57889913,57896510,57896872,57909075,57912380,57913157,57916905, 57890100,57896591,57896932,57909150,57912701,57913275,57918482,

test.vcf.gz

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3  S4  S5  S6  S7  S8  S9  S10 S11 S12
chr3    57757839    .   A   G   367.99  PASS    AC=1;AF=0.042;AN=24;BaseQRankSum=-2.812;ClippingRankSum=0;DP=412;ExcessHet=3.0103;FS=4.01;InbreedingCoeff=-0.0435;MLEAC=1;MLEAF=0.042;MQ=60;MQRankSum=0;QD=17.52;ReadPosRankSum=-1.469  GT:AD:DP:GQ:PL  0/1:7,14:21:99:403,0,187    0/0:41,1:42:99:0,114,1465   0/0:38,0:38:99:0,102,1530   0/0:33,0:33:90:0,90,1350    0/0:33,0:33:81:0,81,1215    0/0:30,0:30:78:0,78,1170    0/0:42,0:42:99:0,120,1800   0/0:43,0:43:99:0,117,1755   0/0:29,0:29:81:0,81,1215    0/0:30,0:30:87:0,87,1305    0/0:37,0:37:99:0,105,1575   0/0:34,0:34:87:0,87,1305
chr3    57896989    rs762608342 A   G   304.99  PASS    AC=1;AF=0.042;AN=24;BaseQRankSum=-3.955;ClippingRankSum=0;DB;DP=433;ExcessHet=3.0103;FS=0;InbreedingCoeff=-0.0435;MLEAC=1;MLEAF=0.042;MQ=60;MQRankSum=0;QD=10.52;ReadPosRankSum=0.321   GT:AD:DP:GQ:PL  0/0:36,0:36:99:0,105,1575   0/0:42,0:42:99:0,117,1755   0/0:32,0:32:81:0,81,1215    0/0:43,0:43:99:0,120,1800   0/0:37,0:37:96:0,96,1440    0/0:37,0:37:99:0,108,1620   0/0:45,0:45:99:0,120,1800   0/0:24,0:24:66:0,66,990 0/1:19,14:33:99:332,0,594   0/0:52,0:52:99:0,120,1800   0/0:34,1:35:89:0,89,1237    0/0:41,0:41:99:0,111,1665

test.ped

fid     iid     father_id       mother_id       sex     pheno
A       S1      NA              NA              1       2
A       S2      NA              NA              2       1
A       S3      NA              NA              1       1
A       S4      NA              NA              1       1
B       S5      NA              NA              1       1
B       S6      NA              NA              1       2
B       S7      NA              NA              2       1
C       S8      NA              NA              1       2
C       S9      NA              NA              2       1
C       S10     NA              NA              2       1
C       S11     NA              NA              2       2
C       S12     NA              NA              2       2

Command

/path/to/rvtests_v2.1.0/executable/rvtest --inVcf test.vcf.gz \
        --pheno test.ped --pheno-name pheno \
        --out results --geneFile refFlat.txt --burden cmc,cmcWald,zeggini,zegginiWald --vt price --kernel skat,kbac --gene SLMAP

RVTEST log

Retrieve remote version failed, use '--noweb' to skip.
[INFO]  Program version: 20190205
[INFO]  Analysis started at: Thu Apr 11 15:44:48 2019
[INFO]  Loaded [ 12 ] samples from genotype files
[INFO]  Loaded [ 12 ] sample phenotypes
[INFO]  Loaded 6 male, 6 female and 0 sex-unknown samples from test.ped
[INFO]  Loaded 5 cases, 7 controls, and 0 missing phenotypes
[WARN]  -- Enabling binary phenotype mode -- 
[INFO]  Analysis begins with [ 12 ] samples...
[INFO]  Price's VT test significance will be evaluated using 10000 permutations at alpha = 0.05
[INFO]  SKAT test significance will be evaluated using 10000 permutations at alpha = 0.05 weight = Beta[beta1 = 1.00, beta2 = 25.00]
[INFO]  KBAC test significance will be evaluated using 10000 permutations at alpha = 0.05
[INFO]  Loaded [ 1 ] genes.
[INFO]  Impute missing genotype to mean (by default)
[INFO]  Analysis started
[INFO]  Gene SLMAP has 0 variants, skipping
[INFO]  Analyzed [ 0 ] variants from [ 1 ] genes/regions
[INFO]  Analysis ends at: Thu Apr 11 15:44:48 2019
[INFO]  Analysis took 0 seconds
RVTESTS finished successfully

erampersaud commented 5 years ago

Hello @zhanxw, are there any updates on this error? I have the same issue and wasn't sure whether it was ever resolved. Thank you.

ayub1985 commented 4 years ago

I am facing exactly the same issue. Does anyone have any advice?

ayub1985 commented 4 years ago

Re: https://github.com/zhanxw/rvtests/issues/80#issuecomment-482121514

@matmu (https://github.com/matmu), did you manage to resolve this?

matmu commented 4 years ago

@ayub1985 I found out that the program cannot handle VCF files properly if the variants' chromosome names carry a "chr" prefix (for example "chr3" instead of "3").
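
A minimal Python sketch of that workaround follows. It is an illustration under assumptions, not an rvtests feature: `strip_chr_prefix` and `rewrite_vcf` are hypothetical helper names, and the sketch assumes the only places the prefix matters are the CHROM column of data lines and the `##contig=<ID=...>` header lines.

```python
import gzip


def strip_chr_prefix(lines):
    """Yield VCF lines with a leading 'chr' removed from CHROM values
    and from ##contig header IDs; all other lines pass through unchanged."""
    for line in lines:
        if line.startswith("##contig=<ID=chr"):
            yield line.replace("##contig=<ID=chr", "##contig=<ID=", 1)
        elif not line.startswith("#") and line.startswith("chr"):
            yield line[3:]  # data line: CHROM is the first column
        else:
            yield line


def rewrite_vcf(src_gz, dst_plain):
    """Read a gzip/bgzip-compressed VCF and write an uncompressed copy
    without the 'chr' prefix."""
    with gzip.open(src_gz, "rt") as src, open(dst_plain, "w") as dst:
        dst.writelines(strip_chr_prefix(src))
```

For example, `rewrite_vcf("test.vcf.gz", "test.nochr.vcf")` writes a plain-text copy. Python's `gzip` module does not produce bgzip output, so the result would still need to be compressed with `bgzip` and indexed again with `tabix -p vcf` before being passed to `--inVcf`; an index built for the original file does not apply to the rewritten one.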

anna-555 commented 3 years ago

Hi there,

I am having the same problem, and I have already deleted the "chr" part. Do you have any other suggestions?

Thank you in advance

lindenb commented 2 years ago

Hi, I'm having the same problem.

Cross-posted on biostars: https://www.biostars.org/p/9504551/

edit: removing the chr prefix fixed the problem.

jfertaj commented 2 years ago

I have removed "chr" from both the refFlat file and the VCF, but I am still getting 0 variants for all genes.
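
When the prefix has supposedly been removed and genes are still skipped, it may help to list which chromosome names each file actually contains. The Python sketch below does that; the helper names (`vcf_chroms`, `refflat_chroms`, `with_chr_prefix`) are invented for this example, and it assumes the refFlat chromosome sits in the third column.

```python
def vcf_chroms(lines):
    """Return the distinct CHROM values of VCF data lines, in first-seen order."""
    names = (line.split(None, 1)[0] for line in lines
             if line.strip() and not line.startswith("#"))
    return list(dict.fromkeys(names))


def refflat_chroms(lines):
    """Return the distinct chromosome names (third refFlat column),
    in first-seen order."""
    rows = (line.split() for line in lines)
    return list(dict.fromkeys(row[2] for row in rows if len(row) >= 3))


def with_chr_prefix(names):
    """Return the names that still start with 'chr'."""
    return [name for name in names if name.startswith("chr")]
```

Running `with_chr_prefix(vcf_chroms(...))` on the lines of the VCF that is actually passed to `--inVcf` shows whether any prefixed names survived the edit. If the list is empty and genes are still skipped, other causes are worth ruling out, such as a tabix index that was not rebuilt after the file changed or a gene file built on a different genome assembly than the VCF.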

Zekexie commented 1 year ago

You should remove the 'chr' from the VCF files, not from the refFlat file.

Zekexie commented 1 year ago

> Hi there,
>
> I am having the same problem and I already deleted the "chr" part. Do you have any other suggestions?
>
> Thank you in advance

You should remove the 'chr' from the VCF files, not from the refFlat file.