Open jpjoines opened 1 month ago
can you run successfully with the example input file?
On Tue, Jun 4, 2024 at 7:06 PM jpjoines @.***> wrote:
I just tried running the Linux binary of Kindred from https://www.haplotype.org/software.html. It processes a few loci, then segfaults. Nothing is written to the log file.
$ ls
kindred populations.snps.vcf.gz populations.snps.vcf.gz.tbi
$ ./kindred -i populations.snps.vcf.gz -o tvi -t 28
Kindred 0.81 by Yongtao Guan at Framingham Heart Study
National Heart, Lung, and Blood Institute (C) 2023
##number of samples: 486
Segmentation fault (core dumped)
$ ls
kindred populations.snps.vcf.gz populations.snps.vcf.gz.tbi tvi.log
$ wc -l tvi.log
0 tvi.log
I'm using Rocky Linux 8.6.
Oh, I didn't realize the in.vcf.gz file from the examples in README.md actually existed. I tested with that and it works fine, so the problem must be something in my VCF file, which looks like this:
##fileformat=VCFv4.2
##fileDate=20240529
##source="Stacks v2.64"
##INFO=<ID=AD,Number=R,Type=Integer,Description="Total Depth for Each Allele">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allele Depth">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=loc_strand,Number=1,Type=Character,Description="Genomic strand the corresponding Stacks locus aligns on">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample18a01
13 108 13:107 A G . PASS NS=404;AF=0.057 GT:DP:AD:GQ:GL 0/0:7:7,0:36:-0.00,-2.97,-38.62
49 414 49:413 G A . PASS NS=320;AF=0.064 GT:DP:AD:GQ:GL 0/0:4:4,0:25:-0.01,-1.90,-18.20
73 116 73:115 G T . PASS NS=300;AF=0.052 GT:DP:AD:GQ:GL 0/0:23:23,0:40:-0.00,-8.12,-111.79
102 52 102:51 A C . PASS NS=443;AF=0.412 GT:DP:AD:GQ:GL 1/1:19:0,19:40:-100.39,-5.81,-0.00
176 77 176:76 G A . PASS NS=289;AF=0.273 GT:DP:AD:GQ:GL 0/0:25:25,0:40:-0.00,-7.78,-132.83
202 90 202:89 A G . PASS NS=286;AF=0.080 GT:DP:AD:GQ:GL ./.
210 39 210:38 A G . PASS NS=263;AF=0.101 GT:DP:AD:GQ:GL 0/1:55:34,21:40:-52.18,0.00,-96.09
257 109 257:108 G C . PASS NS=315;AF=0.079 GT:DP:AD:GQ:GL 0/0:13:13,0:40:-0.00,-4.63,-64.84
269 131 269:130 C T . PASS NS=454;AF=0.097 GT:DP:AD:GQ:GL 0/0:39:39,0:40:-0.00,-12.37,-210.82
328 142 328:141 A G . PASS NS=298;AF=0.460 GT:DP:AD:GQ:GL ./.
369 52 369:51 T C . PASS NS=380;AF=0.222 GT:DP:AD:GQ:GL 0/0:41:41,0:40:-0.00,-12.50,-224.43
376 205 376:204 C T . PASS NS=304;AF=0.066 GT:DP:AD:GQ:GL 0/0:9:9,0:40:-0.00,-3.57,-32.03
423 144 423:143 T G . PASS NS=300;AF=0.052 GT:DP:AD:GQ:GL 0/0:14:14,0:40:-0.00,-5.26,-70.64
can you use bcftools to select common SNPs from chr22 and run kindred? there are examples on how to do this in README.md.
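For readers without the README at hand, here is a minimal Python sketch of the same idea: keep only the records on one CHROM value whose minor allele frequency, taken from the INFO AF tag, meets a threshold. This is not the bcftools recipe from the README. The helper names `info_af` and `select_common` are hypothetical, and the sketch assumes tab-separated VCF text with a single AF value per record.

```python
def info_af(info):
    """Return the first AF value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return float(field[3:].split(",")[0])
    return None


def select_common(lines, chrom=None, min_maf=0.01):
    """Yield header lines plus records on `chrom` (any CHROM if None)
    whose minor allele frequency is at least `min_maf`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if chrom is not None and cols[0] != chrom:
            continue
        af = info_af(cols[7])  # INFO is the 8th VCF column
        # MAF is the smaller of the ALT and REF allele frequencies.
        if af is not None and min(af, 1.0 - af) >= min_maf:
            yield line
```

With a threshold of 0.01, a record carrying AF=0.005 is dropped, so a CHROM value that holds only such a record yields nothing but the header.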
The chromosomes in this VCF are not actually chromosomes. They are RAD loci. This particular VCF was generated with only one random SNP per RAD locus.
So, chr22 is actually RAD locus 22, with only one SNP, and that SNP has MAF of 0.005.
I did subset the VCF to RAD loci / SNPs with MAF >= 0.01 and reran kindred. This time there was a core dump:
$ kindred -i maf01.vcf.gz -o maf01 -t 14
Kindred 0.81 by Yongtao Guan at Framingham Heart Study
National Heart, Lung, and Blood Institute (C) 2023
##number of samples: 486
##number of biallelic SNPs: 2609
##number of biallelic SNPs without tag: AF = 0
##processed variants = 2836
##init thread pool n = 14
sumd 0.979438 deviates from 1, kindred renomralize
. use -k and examine .kin.gz file for un-renomorlized x.
##processed pairs: 118341 or 100%
##drain thread pool.
##wrote maf01.rkm.gz.
##read vcf < 1 seconds
##compute kinship < 2 seconds
*** stack smashing detected ***: terminated
Aborted (core dumped)
$ cat maf01.log
../kindred -i maf01.vcf.gz -o maf01 -t 14
##m_nb = 100
##m_se = 0.040
##m_maf = 0.010
##m_nth = 14
##m_jstates = 9
##m_bychr = 0
##m_yesk = 0
##m_gsl = 0
##number of samples: 486
##number of biallelic SNPs: 2609
##number of biallelic SNPs without tag: AF = 0
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
$
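Because the CHROM column here holds RAD locus IDs with one SNP each, a file with 2609 biallelic SNPs can carry a very large number of distinct CHROM values. A minimal Python sketch for tallying records per CHROM value follows. `snps_per_chrom` is a hypothetical helper, not part of Kindred, and reading the trailing two-column lines of maf01.log as (CHROM, SNP count) pairs is an assumption.

```python
from collections import Counter


def snps_per_chrom(lines):
    """Count VCF records per CHROM value, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated column
        counts[chrom] += 1
    return counts


# Example use on a bgzipped VCF (file name taken from the thread):
#   import gzip
#   with gzip.open("maf01.vcf.gz", "rt") as fh:
#       counts = snps_per_chrom(fh)
#   print(len(counts), "distinct CHROM values")
```

A count of distinct CHROM values far above the number of human chromosomes would mark the file as unlike the human data in the README example.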
do the numbers in the output maf01.rkm.gz make sense? are the diagonal numbers around 1? is it possible for you to share your vcf file with me? a subset of SNPs and samples will do.
The diagonals are around 1. The off-diagonal values are higher than I would expect.
I've attached a VCF with all SNPs with MAF >= 0.01, subset to 16 individuals: maf01.popa.popb.vcf.gz https://github.com/user-attachments/files/15596495/maf01.popa.popb.vcf.gz
remember, the entries in the matrix are twice the kinship. if this doesn't explain the higher values, then is the AF tag in the vcf file a sensible allele frequency for the sample?
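As a quick sanity check on the doubling, the short Python sketch below lists textbook kinship coefficients for a few relationships in an outbred pedigree, next to the matrix entry implied by "twice the kinship". These are theoretical expectations, not Kindred output, and the names `KINSHIP` and `expected_entry` are hypothetical.

```python
# Textbook kinship coefficients, assuming no inbreeding.
KINSHIP = {
    "self (non-inbred)": 0.5,
    "parent-offspring": 0.25,
    "full siblings": 0.25,
    "half siblings": 0.125,
    "first cousins": 0.0625,
    "unrelated": 0.0,
}


def expected_entry(kinship):
    """Matrix entry implied by a kinship coefficient: twice the kinship."""
    return 2.0 * kinship


for relation, phi in KINSHIP.items():
    print(f"{relation:20s} kinship={phi:<7} entry={expected_entry(phi)}")
```

Under this reading a diagonal near 1 corresponds to a self-kinship of 0.5, and an off-diagonal entry near 0.5 corresponds to a first-degree relative.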
thanks for the vcf file. if you change the numbers in the CHROM column to 1, it will work. Here is why the crash happened: Kindred was designed for humans, and a vector is created to count SNPs on each chromosome. your vcf file contains too many chromosomes, which overruns the vector. i will fix this and generate a warning, but it has low priority.
thank you.
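A minimal Python sketch of the suggested workaround, setting every record's CHROM value to 1, is given below. `collapse_chrom` is a hypothetical helper and assumes tab-separated VCF text. It leaves POS untouched, so positions will no longer be sorted within the single CHROM and an existing .tbi index should not be reused. Whether Kindred needs sorted positions is not stated in the thread.

```python
def collapse_chrom(lines, new_chrom="1"):
    """Yield VCF lines with the CHROM column of every record set to `new_chrom`.

    Header lines (starting with '#') and lines without a tab pass through
    unchanged.
    """
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        _, rest = line.split("\t", 1)  # drop the old CHROM, keep the rest
        yield f"{new_chrom}\t{rest}"


# Example use (input name from the thread, output name is a placeholder):
#   import gzip
#   with gzip.open("maf01.vcf.gz", "rt") as src, open("maf01.chr1.vcf", "w") as dst:
#       dst.writelines(collapse_chrom(src))
```

The ID column (for example 13:107) still records the original RAD locus, so the rewritten file can be mapped back to the Stacks output.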