zhanxw / rvtests

Rare variant test software for next generation sequencing data

Segmentation fault while running --meta cov #151

Open samreenzafer opened 5 months ago

samreenzafer commented 5 months ago

Hi, I have run rvtests several times for single-variant and grouped tests, and I am now trying to use --meta score,cov for three populations of my cohort separately (AFR, HISP and EUR) so that I can meta-analyze them using RAREMETAL.

All my VCFs are split by chromosome and are in the format rvtests expects, and I have already run the gene-based tests on them successfully. For example: --inVcf "vcfs/cohort.forRVtest.19.reID.vcf.gz" --out "HISP/lofHC_REVELdamaging_clinvar/maf.0.01//rv.HISP.maf.0.01.kernel.skato.19" --pheno "pheno.HISP.txt" --peopleIncludeFile "HISP/keep.samples.HISP" --peopleExcludeFile "vcfs/exclude_samples.txt" --siteFile "lofHC_REVELdamaging_clinvar.txt.snps" --kernel "skato" --geneFile "refGene_hg38.txt" --freqUpper 0.01 --noweb, which reported: [INFO] Analyzed [ 2009 ] variants from [ 26969] genes/regions

In the same way, I am now running --meta score,cov, subsetting the samples and variants in the same fashion, but I get a segmentation fault. I think this happens only while building the covariance matrix, not the score file: when I ran --meta score and --meta cov separately, --meta score ran to completion, but --meta cov did not and ended in a segmentation fault.

Could this be due to the small number of variants remaining after the variant filters (namely --freqUpper 0.01 --siteFile "lofHC_REVELdamaging_clinvar.txt.snps"), causing a numerical problem such as NA or infinite values in the matrix? Should I instead run --meta score,cov on variants filtered only by --freqUpper and then, when I meta-analyze with RAREMETAL, restrict the variant list to the damaging variants I'm interested in?

Also, a handful of chromosomes did run successfully. For HISP: chr 11, 14, 16, 22 and 8, analyzing 2146, 954, 1650, 771 and 1227 variants respectively. For AFR: chr 13, 15 and 22, analyzing 632, 1241 and 745 variants respectively. For EUR: all chromosomes ran successfully, with chr 21 having the fewest variants analyzed (125) and chr 1 the most (1568). Looking at this, I'm not sure that a small number of variants is what causes the problem.

Here is my successful --meta score command, which shows that 324 variants were analyzed.
# ParameterList created by zafers02 on li03c03.chimera.hpc.mssm.edu at Fri Feb 16 12:30:44 2024
--inVcf "/vcfs/cohort.forRVtest.21.reID.vcf.gz" --out "rvtest.HISP.nogene.CategB.21" --pheno "pheno.HISP.txt" --peopleIncludeFile "HISP/keep.samples.HISP" --peopleExcludeFile "vcfs/exclude_samples.txt" --siteFile "lofHC_REVELdamaging_clinvar.txt.snps" --meta "score" --freqUpper 0.01 --noweb
[INFO] Parameters END
[INFO] Analysis started at: Fri Feb 16 12:30:44 2024
[INFO] Restrict analysis based on specified site file [lofHC_REVELdamaging_clinvar.txt.snps ]
[INFO] Loaded [ 184 ] samples from genotype files
[INFO] Loaded [ 185 ] sample pheontypes
[INFO] Discard [ 1 ] samples as they do not have genotypes
[INFO] Loaded 184 male, 0 female and 0 sex-unknonw samples from pheno.HISP.txt
[INFO] Loaded 15 cases, 169 controls, and 0 missing phenotypes
[WARN] -- Enabling binary phenotype mode --
[INFO] Analysis begins with [ 184 ] samples...
[INFO] Impute missing genotype to mean (by default)
[INFO] Set upper minor allele frequency limit to 0.01
[INFO] Analysis started
**[INFO] Analyzed [ 324 ] variants**
[INFO] Analysis ends at: Fri Feb 16 12:33:16 2024

And the header of the output MetaScore.Assoc.gz is:

ProgramName=Rvtests

Version=20171009

Samples=184

AnalyzedSamples=184

Families=184

AnalyzedFamilies=184

Founders=184

AnalyzedFounders=184

InverseNormal=OFF

TraitSummary min 25th median 75th max mean variance

Trait 1 1 1 1 2 1.08152 0.0752851

AnalyzedTrait 0 0 0 0 1 0.0815217 0.0752851

NullModelEstimates

Name Beta SD

Intercept -2.42185 0.0725838

Sigma2 NA NA
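As a sanity check on the header above, the trait summary and the intercept look consistent with the 15 cases and 169 controls reported in the log. Below is a minimal Python sketch. It assumes cases are recoded to 1 and controls to 0, that the reported variance is the sample variance (n - 1 denominator), and that the intercept is the log-odds of being a case in an intercept-only logistic null model. These are my assumptions, not something I have confirmed in the rvtests source.

```python
import math

cases, controls = 15, 169   # counts taken from the log above
n = cases + controls        # 184 analyzed samples

# The mean of a 0/1 trait is simply the case fraction.
mean = cases / n

# Sample variance (n - 1 denominator) of a 0/1 trait.
variance = (cases * (1 - mean) ** 2 + controls * (0 - mean) ** 2) / (n - 1)

# Intercept of an intercept-only logistic model: log-odds of being a case.
intercept = math.log(cases / controls)

print(f"mean={mean:.7f} variance={variance:.7f} intercept={intercept:.5f}")
```

The three values agree with the AnalyzedTrait mean and variance and with the Intercept estimate in the header, so the null model itself does not look degenerate for this chromosome.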

And here is my --meta cov command, which shows the segmentation fault.
` Effective Options
--inVcf vcfs/cohort.forRVtest.21.reID.vcf.gz --out rvtest.HISP.nogene.CategB.cov.21 --pheno pheno.HISP.txt --peopleIncludeFile HISP/keep.samples.HISP --peopleExcludeFile vcfs/exclude_samples.txt --siteFile lofHC_REVELdamaging_clinvar.txt.snps --meta cov --freqUpper 0.01 --noweb

[INFO] Program version: 20171009
[INFO] Analysis started at: Fri Feb 16 12:33:50 2024
Include sample [ BKP000684 ].
Include sample [ BKR003225 ].
. . and so on
[INFO] Restrict analysis based on specified site file [ lofHC_REVELdamaging_clinvar.txt.snps ]
[INFO] Loaded [ 184 ] samples from genotype files
[INFO] Loaded [ 185 ] sample pheontypes
[INFO] Discard [ 1 ] samples as they do not have genotypes
[INFO] Loaded 184 male, 0 female and 0 sex-unknonw samples from pheno.HISP.txt
[INFO] Loaded 15 cases, 169 controls, and 0 missing phenotypes
[WARN] -- Enabling binary phenotype mode --
[INFO] Analysis begins with [ 184 ] samples...
[INFO] Meta analysis uses window size 1,000,000 to produce covariance statistics under additive model
[INFO] Impute missing genotype to mean (by default)
[INFO] Set upper minor allele frequency limit to 0.01
[INFO] Analysis started

Segmentation fault `
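The only difference I can see between the two logs is that the successful one ends with an "Analysis ends at" line, while the crashed one stops after "Analysis started". A small sketch like the following could flag which per-chromosome runs did not finish. The function name and the sample strings are hypothetical, and reading the actual log files from disk is left out.

```python
def run_finished(log_text: str) -> bool:
    """Return True if the log contains the line printed at normal exit."""
    return any("Analysis ends at" in line for line in log_text.splitlines())


# Shortened stand-ins for the two logs shown above (hypothetical test data).
ok_log = (
    "[INFO] Analysis started\n"
    "[INFO] Analyzed [ 324 ] variants\n"
    "[INFO] Analysis ends at: Fri Feb 16 12:33:16 2024"
)
crashed_log = "[INFO] Analysis started\nSegmentation fault"

for name, text in [("score", ok_log), ("cov", crashed_log)]:
    status = "finished" if run_finished(text) else "did NOT finish"
    print(name, status)
```

Running this over the captured output for each population and chromosome would give a quick table of which combinations crash, which may help narrow down what the failing runs have in common.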

I would appreciate any insights into this. Thanks.