haplotype / kindred

infer realized kinship via latent IBD states
MIT License

segmentation fault and empty log file #2

Open jpjoines opened 1 month ago

jpjoines commented 1 month ago

I just tried running the Linux binary of Kindred from https://www.haplotype.org/software.html.  It processes a few loci, then segfaults.  Nothing is written to the log file.

$ ls
kindred  populations.snps.vcf.gz  populations.snps.vcf.gz.tbi

$ /kindred -i populations.snps.vcf.gz -o tvi -t 28

Kindred 0.81 by Yongtao Guan at Framingham Heart Study
  National Heart, Lung, and Blood Institute (C) 2023
##number of samples: 486
Segmentation fault (core dumped)

$ ls
kindred  populations.snps.vcf.gz  populations.snps.vcf.gz.tbi  tvi.log

$ wc -l tvi.log
0 tvi.log

I'm using Rocky Linux 8.6.

haplotype commented 1 month ago

Can you run it successfully with the example input file?

jpjoines commented 1 month ago

Oh, I didn't realize the in.vcf.gz file from the examples in README.md actually existed. I tested with that and it works fine, so the problem must be something in my VCF file, which looks like this:

##fileformat=VCFv4.2
##fileDate=20240529
##source="Stacks v2.64"
##INFO=<ID=AD,Number=R,Type=Integer,Description="Total Depth for Each Allele">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allele Depth">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=loc_strand,Number=1,Type=Character,Description="Genomic strand the corresponding Stacks locus aligns on">
#CHROM  POS  ID       REF  ALT  QUAL  FILTER  INFO             FORMAT          sample18a01
13      108  13:107   A    G    .     PASS    NS=404;AF=0.057  GT:DP:AD:GQ:GL  0/0:7:7,0:36:-0.00,-2.97,-38.62
49      414  49:413   G    A    .     PASS    NS=320;AF=0.064  GT:DP:AD:GQ:GL  0/0:4:4,0:25:-0.01,-1.90,-18.20
73      116  73:115   G    T    .     PASS    NS=300;AF=0.052  GT:DP:AD:GQ:GL  0/0:23:23,0:40:-0.00,-8.12,-111.79
102     52   102:51   A    C    .     PASS    NS=443;AF=0.412  GT:DP:AD:GQ:GL  1/1:19:0,19:40:-100.39,-5.81,-0.00
176     77   176:76   G    A    .     PASS    NS=289;AF=0.273  GT:DP:AD:GQ:GL  0/0:25:25,0:40:-0.00,-7.78,-132.83
202     90   202:89   A    G    .     PASS    NS=286;AF=0.080  GT:DP:AD:GQ:GL  ./.
210     39   210:38   A    G    .     PASS    NS=263;AF=0.101  GT:DP:AD:GQ:GL  0/1:55:34,21:40:-52.18,0.00,-96.09
257     109  257:108  G    C    .     PASS    NS=315;AF=0.079  GT:DP:AD:GQ:GL  0/0:13:13,0:40:-0.00,-4.63,-64.84
269     131  269:130  C    T    .     PASS    NS=454;AF=0.097  GT:DP:AD:GQ:GL  0/0:39:39,0:40:-0.00,-12.37,-210.82
328     142  328:141  A    G    .     PASS    NS=298;AF=0.460  GT:DP:AD:GQ:GL  ./.
369     52   369:51   T    C    .     PASS    NS=380;AF=0.222  GT:DP:AD:GQ:GL  0/0:41:41,0:40:-0.00,-12.50,-224.43
376     205  376:204  C    T    .     PASS    NS=304;AF=0.066  GT:DP:AD:GQ:GL  0/0:9:9,0:40:-0.00,-3.57,-32.03
423     144  423:143  T    G    .     PASS    NS=300;AF=0.052  GT:DP:AD:GQ:GL  0/0:14:14,0:40:-0.00,-5.26,-70.64

haplotype commented 1 month ago

Can you use bcftools to select common SNPs from chr22 and run Kindred on them? README.md has examples of how to do this.
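
For readers without bcftools at hand, the "select common SNPs" step can be approximated in a few lines of Python. This is only a sketch, not the README's method: it assumes every record carries an `AF` tag in the INFO column (as the Stacks VCF above does), and the names `minor_allele_freq`, `filter_common`, and `filter_vcf` are hypothetical helpers invented for illustration.

```python
import gzip


def minor_allele_freq(info):
    """Return the MAF implied by the AF tag in an INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("AF="):
            # Use the first value only; the records here are biallelic.
            af = float(field[3:].split(",")[0])
            return min(af, 1.0 - af)
    return None


def filter_common(lines, min_maf=0.01):
    """Yield header lines unchanged and records whose MAF is at least min_maf."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # skip blank or malformed lines
        maf = minor_allele_freq(cols[7])
        if maf is not None and maf >= min_maf:
            yield line


def filter_vcf(in_path, out_path, min_maf=0.01):
    """Filter a gzip-compressed VCF by MAF (output is plain gzip, not BGZF)."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(filter_common(src, min_maf))
```

Note that Python's `gzip` module writes ordinary gzip rather than BGZF, so the output would need to be recompressed with `bgzip` before it can be indexed with `tabix`.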

jpjoines commented 1 month ago

The chromosomes in this VCF are not actually chromosomes. They are RAD loci. This particular VCF was generated with only one random SNP per RAD locus.

So, chr22 is actually RAD locus 22, with only one SNP, and that SNP has a MAF of 0.005.
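
A quick way to confirm this layout is to count the records under each distinct CHROM value. The sketch below assumes a tab-delimited VCF; `records_per_contig` and `summarize` are hypothetical names used only for illustration.

```python
import gzip
from collections import Counter


def records_per_contig(lines):
    """Count data records for each distinct value in the CHROM column."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


def summarize(path):
    """Return (number of distinct CHROM values, largest record count per CHROM)."""
    with gzip.open(path, "rt") as handle:
        counts = records_per_contig(handle)
    return len(counts), max(counts.values(), default=0)
```

For a VCF with one random SNP per RAD locus, `summarize` would be expected to return as many distinct CHROM values as there are records, with a maximum of 1 record per value.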

I did subset the VCF to RAD loci / SNPs with MAF >= 0.01 and reran kindred. This time there was a core dump:

$ kindred -i maf01.vcf.gz -o maf01 -t 14

Kindred 0.81 by Yongtao Guan at Framingham Heart Study
  National Heart, Lung, and Blood Institute (C) 2023
##number of samples: 486
##number of biallelic SNPs: 2609
##number of biallelic SNPs without tag: AF = 0
##processed variants = 2836
##init thread pool n = 14
sumd 0.979438 deviates from 1, kindred renomralize
. use -k and examine .kin.gz file for un-renomorlized x.
##processed pairs: 118341 or 100%
##drain thread pool.
##wrote maf01.rkm.gz.
##read vcf < 1 seconds
##compute kinship < 2 seconds

*** stack smashing detected ***: terminated
Aborted (core dumped)

$ cat maf01.log
../kindred -i maf01.vcf.gz -o maf01 -t 14
##m_nb = 100
##m_se = 0.040
##m_maf = 0.010
##m_nth = 14
##m_jstates = 9
##m_bychr = 0
##m_yesk = 0
##m_gsl = 0
##number of samples: 486
##number of biallelic SNPs: 2609
##number of biallelic SNPs without tag: AF = 0
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
$

haplotype commented 1 month ago

Do the numbers in the output maf01.rkm.gz make sense? Are the diagonal values around 1? Could you share your VCF file with me? A subset of SNPs and samples will do.

jpjoines commented 1 month ago

The diagonals are around 1. The off-diagonal values are higher than I would expect.

I've attached a VCF with all SNPs with MAF >= 0.01, subset to 16 individuals: maf01.popa.popb.vcf.gz (https://github.com/user-attachments/files/15596495/maf01.popa.popb.vcf.gz)

haplotype commented 1 month ago

Remember that the entries in the matrix are twice the kinship. If that doesn't explain the higher values, is the AF tag in the VCF file a sensible allele frequency for the sample?
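
One way to check whether the AF tag is sensible for the samples actually present is to recompute the allele frequency from the genotype calls and compare. The following is a rough sketch under stated assumptions: AF is taken to be the frequency of the ALT allele, sites are biallelic, and GT is the first FORMAT field (as the VCF specification requires when GT is present). The function names and the 0.05 tolerance are hypothetical choices, not anything Kindred defines.

```python
def af_tag(info):
    """Return the first AF value from an INFO string, or None if there is no AF tag."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return float(field[3:].split(",")[0])
    return None


def observed_alt_freq(sample_fields):
    """Compute the ALT allele frequency from per-sample fields such as '0/1:55:34,21'."""
    alt = called = 0
    for field in sample_fields:
        genotype = field.split(":", 1)[0]
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None


def af_mismatches(lines, tolerance=0.05):
    """Yield (chrom, pos, tagged AF, observed AF) where the two differ by more than tolerance."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        tag = af_tag(cols[7])
        observed = observed_alt_freq(cols[9:])
        if tag is None or observed is None:
            continue
        if abs(tag - observed) > tolerance:
            yield cols[0], int(cols[1]), tag, observed
```

Many mismatches would suggest that the AF values were computed on a different set of individuals than the one in the file, for example if samples were subset without the tag being recalculated.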

Thanks for the VCF file. If you change the numbers in the CHROM column to 1, it will work. Here is why the crash happened: Kindred was designed for humans, and it builds a vector to count SNPs on each chromosome. Your VCF file contains too many chromosomes, which overflows the vector. I will fix this and add a warning, but it has low priority.
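
The suggested workaround, setting every CHROM value to 1, can be sketched as follows. This is an illustrative helper rather than part of Kindred; `set_single_chrom` and `rewrite_vcf` are hypothetical names, and the sketch assumes a tab-delimited VCF with no `##contig` header lines that would also need updating.

```python
import gzip


def set_single_chrom(lines, chrom="1"):
    """Yield VCF lines with the CHROM column of every data record replaced by chrom."""
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            yield line  # header, blank, or malformed line: pass through unchanged
            continue
        _, rest = line.split("\t", 1)
        yield f"{chrom}\t{rest}"


def rewrite_vcf(in_path, out_path, chrom="1"):
    """Rewrite a gzip-compressed VCF so all records sit on one chromosome (plain gzip output)."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(set_single_chrom(src, chrom))
```

After this rewrite the POS values from different RAD loci share one chromosome and are no longer sorted or unique, so the file may not index cleanly with `tabix`.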

Thank you.
