hewm2008 / VCF2PCACluster

A new simple and efficient software for PCA and clustering of population VCF files
MIT License

Assertion error in DenseCoeffsBase.h #5

Closed: YaoLei-Leo closed this issue 10 months ago

YaoLei-Leo commented 10 months ago

Hi Weiming,

I am trying to run a PCA analysis on a VCF with 2,659 samples and 17,110,876 variants. However, I got the error below after running with default parameters. Could you help me check it? Thanks for your time!

Leo

Running log:

warning skip Indel site, there are total skip Indel sites number is : 3252724

Warning skip non bi-allelic(Singleton/ThreeMulti allelic) site, and total skip allelic sites number is :284

Warning skip high missing site, and total skip allelic sites number is :1040298

Warning skip low Minor Allele Frequency site, and total skip allelic sites number is :370246

Warning skip Sex chr Site, and total skip allelic sites number is :632171

After Filtering, total Number of 11815153 SNPs were taken for the PCA calculation

Start To Create Normalized_IBS/Yang/BaldingNicolsKinship ...

Eigenvalues of 2659 individuals have been saved in [ PCAanalysis.eigenval ].

EM init K by auto of K-mean, Best cluster K = 1

VCF2PCACluster: ./include/Eigen/src/Core/DenseCoeffsBase.h:427: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<int, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = int; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.

/var/spool/pbs/mom_priv/jobs/1410896.omics.SC: line 11: 164918 Aborted (core dumped) VCF2PCACluster -InVCF Merged.hg38.GnomADcommonVariants.norm.rmDup.rmNonAlt.vcf.gz -OutPut PCAanalysis

hewm2008 commented 10 months ago

I guess there may be NaN values in the kinship matrix. This can happen when a sample has too many missing genotypes. I suggest you filter out the samples with a high missing rate and run it again.
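To find which samples to filter, one option is to compute a per-sample missing rate directly from the VCF. The following is only a minimal sketch, not part of VCF2PCACluster: the function names and the cutoff you apply afterwards are illustrative assumptions, and it assumes GT is the first FORMAT field, as the VCF specification requires when GT is present. Pure Python will be slow on a file with millions of sites, so treat it as an illustration of the calculation.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def sample_missing_rates(lines):
    """Return {sample: fraction of sites whose genotype is missing}.

    A genotype counts as missing if its GT field contains '.',
    e.g. './.', '.|.' or '.'.
    """
    samples = []
    missing = []
    total_sites = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start at column 10
            missing = [0] * len(samples)
            continue
        if not samples:
            continue  # no header seen yet; skip defensively
        total_sites += 1
        for i, call in enumerate(fields[9:]):
            gt = call.split(":", 1)[0]  # GT is assumed to be the first sub-field
            if "." in gt:
                missing[i] += 1
    if total_sites == 0:
        return {}
    return {s: m / total_sites for s, m in zip(samples, missing)}


# Tiny in-memory example with two sites and two samples.
demo_vcf = [
    "##fileformat=VCFv4.2\n",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1", "S2"]) + "\n",
    "\t".join(["chr1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/1", "0/0"]) + "\n",
    "\t".join(["chr1", "200", ".", "C", "T", ".", "PASS", ".", "GT:DP", "./.:0", "1/1:12"]) + "\n",
]
demo_rates = sample_missing_rates(demo_vcf)
print(demo_rates)
```

For a real file, something like `with open_vcf("input.vcf.gz") as fh: rates = sample_missing_rates(fh)` would give the rates; samples above whatever threshold you choose can then be dropped before rerunning.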

YaoLei-Leo commented 10 months ago

Hi Weiming,

Thanks for your suggestion. I will rerun it after filtering out the samples with a high missing rate.

JasonforMn commented 4 months ago

What could cause "Segmentation fault (core dumped)" when running VCF2PCACluster?

hewm2008 commented 4 months ago

This is most likely due to a sample with a very high rate of missing genotypes. Maybe the sample is too divergent from the reference, or its sequencing depth is too low. I recommend checking the kinship matrix to find the samples whose entries are nan, NA, or inf.
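A quick way to do that check is to scan the matrix file for non-finite entries. This is a rough sketch under a stated assumption: the thread does not show the kinship file layout, so the code assumes a whitespace-delimited text matrix with the sample ID in the first column and numeric values after it. The function names are hypothetical, and the parsing should be adapted to the actual output file.

```python
import math


def is_bad_value(token):
    """True for nan / NA / inf style entries; False for normal numbers and labels."""
    if token.lower() == "na":
        return True
    try:
        return not math.isfinite(float(token))  # catches nan, inf, -inf
    except ValueError:
        return False  # not a number (e.g. a header label); ignore it


def count_bad_entries(rows):
    """Return [(sample, bad_count)] for rows with at least one bad entry, worst first.

    Assumed layout: first token is the sample ID, the rest are matrix values.
    """
    counts = {}
    for row in rows:
        parts = row.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        sample, values = parts[0], parts[1:]
        bad = sum(1 for v in values if is_bad_value(v))
        if bad:
            counts[sample] = bad
    # sorted() is stable, so ties keep their order of appearance in the file
    return sorted(counts.items(), key=lambda kv: -kv[1])


# Small made-up matrix: sample C has no usable genotypes, so its whole
# row and column are nan.
demo_matrix = [
    "A 1.0 0.1 nan",
    "B 0.1 1.0 nan",
    "C nan nan nan",
]
demo_flagged = count_bad_entries(demo_matrix)
print(demo_flagged)
```

Because the matrix is symmetric, one problematic sample also puts a bad value into every other sample's row. Ranking by count helps here: the sample whose row is bad throughout is the likely culprit, while rows with a single bad entry are usually just paired with it.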