Closed renhamm closed 1 year ago
Hi Lauren,
Just to make sure I understand you correctly: you have a k-mers table which you converted to PLINK binary format using the kmers_table_to_bed functionality, then you converted the PLINK files to a VCF and got some k-mers in some accessions with the value "./.", which means a missing value. Is that correct? If so, this shouldn't have happened.
Yoav
/PATH/build_kmers_table -l kmers_list_paths.txt -k 31 -a kmers_to_use -o kmers_table
/PATH/kmers_table_to_bed -t kmers_table -k 31 -b 90000000000 -p phenotypes.pheno --maf 0.05 --mac 1 -b 10000000 -o PAVtable_PLINK\
plink --bfile /PATH/PAVtable_PLINK.0 --merge-list all_my_files.txt --make-bed --out PAVmerged
/PATH/plink --bfile /PATH/PAVmerged --recode vcf --out PAVmerged
Hi Lauren,
Thank you.
1. You pass -b twice in kmers_table_to_bed? It shouldn't matter, but I am surprised it didn't raise an error.
2. The kmers_table: I know it is probably a big file, but do you think you can share it with me?
3. Can you run the filter_kmers functionality? Maybe by comparing the two I can understand what happened.
4. Your CPU details (cat /proc/cpuinfo
output).- The pipeline outputs thousands of smaller vcf files, of which some look right and some do not. VCFs #000-099 look perfect, but all VCFs with a number greater than or equal to 100 include the weird ./. missing value option. From a good sub-vcf: 0 0 AAAAAAAAAAAAAAAAAGAAAAAAAAAGCTA N 1 . . PR GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 1/1 0/0 1/1 0/0 1/1 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 1/1 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 1/1 0/0 1/1 1/1 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 1/1 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 1/1 1/1 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/0 1/1 0/0 1/1 0/0 0/0 1/1 0/0 0/0 0/0 0/0 0/0 1/1 1/1 0/0 0/0 0/0 1/1 0/0 From a bad sub-vcf: 0 0 AAGACTGGACAATCACTTACTAGCACGCTCC 1 . . . PR GT ./. 0/0 0/0 0/0 ./. ./. 0/0 ./. 0/0 ./. ./. 0/0 ./. ./. 0/0 ./. 0/0 0/0 0/0 ./. 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/0 ./. 0/0 ./. ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/0 0/0 0/0 ./. 0/0 ./. 0/0 ./. ./. ./. 0/0 ./. 0/0 ./. 0/0 ./. ./. ./. 
./. 0/0 ./. 0/0 ./. ./. ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. 0/0 0/0 ./. ./. 0/0 ./. ./. ./. ./. 0/0 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./. 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./. ./. ./. ./. 0/0 ./. 0/0 0/0 0/0 0/0 ./. ./. 0/0 0/0 ./. ./. 0/0 ./. ./. ./. ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./. ./. ./. 0/0 ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. 0/0 0/0 ./. ./. ./. 0/0 ./. ./. ./. ./. ./. ./. ./. ./. 0/0 ./. 0/0 0/0 0/0 0/0 ./. ./. 0/0 0/0 ./. 0/0 0/0 0/0 0/0 ./. ./. ./. ./. 0/0 ./. ./. 0/0 ./. 0/0 0/0 ./. 0/0 ./. ./. ./. 0/0 0/0 0/0 ./. 0/0 0/0 0/0 ./. ./. ./. 0/0 0/0 ./. ./. 0/0 0/0 ./. 0/0 0/0 0/0 0/0 ./. ./. ./. 0/0 0/0 ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. 0/0 ./. 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 ./. ./. ./. 0/0 ./. 0/0 0/0
Dear Lauren,
Thank you. I also tried splitting my A. thaliana k-mers table into many PLINK binary files, and it didn't show your bug.
1. Can you run filter_kmers for the two k-mers you outputted above? I would expect that the "./." stands for "1/1", but there is some glitch in the binary format encoding for some reason. Also, the two fields after the k-mer change (N & 1 --> 1 & "."), which also indicates some bit glitch.
2. Do you see the "./." directly after kmers_table_to_bed? Or do you see this only after you merge?
3. In kmers_table_to_bed, you have "\" after the output directory, which shouldn't be there. I don't think it can cause an issue like this, but just to keep everything clean.
Yoav
I'm currently running the requested tasks above, but the kmers_table.table file, even when compressed, is much too large to upload (47799 MB). Is there an email address that I can send it to or share it with?
If you have a link to a drive or FTP, you can send it to me at yoav.voichek [at] gmi.oeaw.ac.at.
btw, I have thought about it since, and I am most curious about (2) from before. Did you see the "./." when you converted single sub-files, or only after merging? Because if you don't see it in single files, it is an issue with PLINK. I suspect this might be the case, as there are the extra fields which are different (point (1)) and I don't see how I could have changed them in my files.
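One way to answer the single-file versus merged question is to scan every sub-VCF for missing calls. The sketch below is only an illustration: the function name list_vcfs_with_missing is hypothetical, and it assumes tab-delimited VCFs with genotypes starting at column 10, as in the rows pasted above.

```shell
# Print the name of each VCF (given as arguments) that contains at least one "./." call.
# Assumes tab-delimited VCF records with sample genotypes in columns 10 onward.
list_vcfs_with_missing() {
    for f in "$@"; do
        if awk -F'\t' '
            !/^#/ { for (i = 10; i <= NF; i++) if ($i == "./.") { found = 1; exit } }
            END   { exit !found }   # exit status 0 only when a missing call was seen
        ' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling it on the individually converted files and then on the merged VCF shows at which step the "./." values first appear.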
Hi Lauren,
I am still very curious about what happened here. However, as I see no activity here, should we close it for now?
Best, Yoav
Don't close! I've been processing things slower for a combination of medical reasons and server failures on campus. I'm hoping to get back to you very very shortly.
— Reply to this email directly, view it on GitHub https://github.com/voichek/kmersGWAS/issues/128#issuecomment-1420373120, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO2GFKUJ6GKAMWVAJVP57V3WWIA3NANCNFSM6AAAAAATZYXXKQ . You are receiving this because you authored the thread.Message ID: @.***>
Dear Lauren,
No worries. Have a swift recovery.
I just want to make sure that there are no "orphan" issues left behind. In this case, take your time :)
Best, Yoav
I am currently rerunning the PLINK bed-to-vcf conversion individually to see if this fixes it. When I ran one of the bed files that had resulted in the aberrant "./." option, it ended up looking fine. This makes me think that PLINK hit some error while looping through all of the files. I can't guarantee that this will fix it, but it's worth a shot while I wait for our internal data transfer servers to come back online. Hopefully this helps!
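For anyone wanting to repeat this check, a per-file conversion could look roughly like the sketch below. The function name convert_each_bed and the PLINK_BIN variable are hypothetical. The PREFIX.N.bed naming is inferred from the PAVtable_PLINK.0 path in the commands above, and the --bfile, --recode vcf and --out flags are the same ones used there.

```shell
# Convert each PLINK binary fileset PREFIX.N.{bed,bim,fam} to its own VCF.
# PLINK_BIN is an assumed override for the plink executable (defaults to "plink").
convert_each_bed() {
    prefix=$1
    plink_bin=${PLINK_BIN:-plink}
    for bed in "$prefix".*.bed; do
        [ -e "$bed" ] || continue          # skip when the glob matched nothing
        base=${bed%.bed}                   # plink expects the prefix without ".bed"
        "$plink_bin" --bfile "$base" --recode vcf --out "$base"
    done
}
```

Usage would be something like `convert_each_bed /PATH/PAVtable_PLINK`, which writes one VCF next to each .bed file.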
It looks like that fixed it! I think there was a misinstallation of an older version of plink (1.07 instead of 1.9) that led to an inappropriate conversion of the "1/1" to "./."
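Since the cause turned out to be an unexpected PLINK version, a quick guard before converting may help. This is a sketch under assumptions: the helper name is_plink19_banner is hypothetical, and it relies on PLINK 1.9 printing a banner that begins with "PLINK v1.9" for `plink --version`. How version 1.07 responds to that flag is not covered in this thread, so anything else is simply treated as "not 1.9".

```shell
# Return success if a version banner looks like PLINK 1.9,
# e.g. "PLINK v1.90b6.21 64-bit (19 Oct 2020)".
is_plink19_banner() {
    case "$1" in
        "PLINK v1.9"*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

For example, `is_plink19_banner "$(plink --version 2>/dev/null)" || echo "unexpected plink: $(command -v plink)"` prints which executable is first on the PATH when the banner does not match.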
Thanks! We can close this issue now
When I convert my PAV table to VCF format, I end up with three options instead of two: "1/1", "0/0", and "./.". The "./." usually means we don't have coverage in this area when looking at normal SNPs, but as our loci are actually unmapped k-mers, I'm not sure how the "./." differs from "0/0". Is this indicative of an upstream error?
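To see how many calls of each kind a converted file contains, a small tally can be useful. The sketch below is illustrative only: the function name count_genotypes is hypothetical, and it assumes a tab-delimited VCF with genotypes from column 10 onward.

```shell
# Tally every genotype value found in the sample columns (10 onward) of a VCF.
# Prints one "GENOTYPE COUNT" pair per line, sorted by genotype string.
count_genotypes() {
    awk -F'\t' '
        !/^#/ { for (i = 10; i <= NF; i++) n[$i]++ }
        END   { for (g in n) print g, n[g] }
    ' "$1" | LC_ALL=C sort
}
```

For a presence/absence table, only "0/0" and "1/1" lines are expected in the output of `count_genotypes PAVmerged.vcf`, so a "./." line points to a problem in the conversion.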