Closed richelbilderbeek closed 3 years ago
Great! I'm just thinking it may be better to name the files HumanOrigins249_tiny{.bed/.bim/.fam} so it's clear what data set they actually contain (and that it's the same data set that is in EIGENSTRAT format).
> I'm just thinking it may be better to name the files HumanOrigins249_tiny{.bed/.bim/.fam}
@kausmees I understand that idea!
However, I am worried about naming conflicts between the two .fam files and just checked:
richel@N141CU:~/GitHubs/GenoCAE/example_tiny$ head HumanOrigins249_tiny.fam plink.fam
==> HumanOrigins249_tiny.fam <==
BantuKenya HGDP01405 0 0 0 1
BantuKenya HGDP01408 0 0 0 1
BantuKenya HGDP01414 0 0 0 1
BantuKenya HGDP01417 0 0 0 1
BantuKenya HGDP01418 0 0 0 1
Biaka HGDP00454 0 0 0 1
Biaka HGDP00455 0 0 0 1
Biaka HGDP00457 0 0 0 1
Biaka HGDP00458 0 0 0 1
Biaka HGDP00459 0 0 0 1
==> plink.fam <==
BantuKenya BantuKenya 0 0 HGDP01405 -9
BantuKenya BantuKenya 0 0 HGDP01408 -9
BantuKenya BantuKenya 0 0 HGDP01414 -9
BantuKenya BantuKenya 0 0 HGDP01417 -9
BantuKenya BantuKenya 0 0 HGDP01418 -9
Biaka Biaka 0 0 HGDP00454 -9
Biaka Biaka 0 0 HGDP00455 -9
Biaka Biaka 0 0 HGDP00457 -9
Biaka Biaka 0 0 HGDP00458 -9
Biaka Biaka 0 0 HGDP00459 -9
Apparently, the .fam files are different.
If you still think we should use the same HumanOrigins249_tiny prefix, I'll change it.
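For reference, a PLINK .fam line has six whitespace-separated columns: family ID, within-family ID, father ID, mother ID, sex code (1 = male, 2 = female, 0 = unknown) and phenotype (1 = control, 2 = case, -9 or 0 = missing). Read that way, plink.fam above has the sample ID sitting in the sex column. Below is a minimal Python sketch of a sanity check that would flag this; the names `FamRecord`, `parse_fam_line` and `check_fam` are made up for illustration and are not part of GenoCAE or PLINK.

```python
from typing import Iterable, List, NamedTuple


class FamRecord(NamedTuple):
    """One line of a PLINK .fam file (six whitespace-separated columns)."""
    family_id: str
    sample_id: str
    father_id: str
    mother_id: str
    sex: str        # 1 = male, 2 = female, 0 = unknown
    phenotype: str  # 1 = control, 2 = case, -9 or 0 = missing


VALID_SEX_CODES = {"0", "1", "2"}


def parse_fam_line(line: str) -> FamRecord:
    """Split one .fam line into its six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return FamRecord(*fields)


def check_fam(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in the given .fam lines."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = parse_fam_line(line)
        if record.sex not in VALID_SEX_CODES:
            problems.append(
                f"line {number}: sex code {record.sex!r} is not 0, 1 or 2"
            )
        key = (record.family_id, record.sample_id)
        if key in seen:
            problems.append(f"line {number}: duplicate sample {key}")
        seen.add(key)
    return problems


if __name__ == "__main__":
    # Two lines copied from the plink.fam listing above.
    suspicious = [
        "BantuKenya BantuKenya 0 0 HGDP01405 -9",
        "BantuKenya BantuKenya 0 0 HGDP01408 -9",
    ]
    for problem in check_fam(suspicious):
        print(problem)
```

On the HumanOrigins249_tiny.fam lines the same check reports nothing, since each (family ID, sample ID) pair is unique and the sex column holds 0.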
Uh, maybe the PLINK file conversion simply went sideways :confused:. There was a warning when I ran the R script. I'll investigate :monocle_face:
I was able to find some old conversion scripts (if you're interested, I used the convertf program from the makers of EIGENSTRAT: https://reich.hms.harvard.edu/software/InputFileFormats ) and did the conversion :) Added the files in commit c41065b
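For anyone wanting to repeat such a conversion: convertf is driven by a small parameter file and run as `convertf -p <parfile>`. The sketch below only generates such a parameter file in Python. It is not the script used for the commit above, and the input extensions (.eigenstratgeno, .snp, .ind) and the helper name `build_convertf_par` are assumptions made for illustration.

```python
def build_convertf_par(in_prefix: str, out_prefix: str) -> str:
    """Build a convertf parameter file for EIGENSTRAT -> binary PLINK.

    Assumes the EIGENSTRAT input is named <in_prefix>.eigenstratgeno,
    <in_prefix>.snp and <in_prefix>.ind (hypothetical naming).
    """
    settings = [
        ("genotypename", f"{in_prefix}.eigenstratgeno"),
        ("snpname", f"{in_prefix}.snp"),
        ("indivname", f"{in_prefix}.ind"),
        ("outputformat", "PACKEDPED"),  # binary PLINK: .bed/.bim/.fam
        ("genotypeoutname", f"{out_prefix}.bed"),
        ("snpoutname", f"{out_prefix}.bim"),
        ("indivoutname", f"{out_prefix}.fam"),
    ]
    return "".join(f"{key}: {value}\n" for key, value in settings)


if __name__ == "__main__":
    # Print the parameter file; redirect to a file and pass it to convertf -p.
    print(build_convertf_par("HumanOrigins249_tiny", "HumanOrigins249_tiny"), end="")
```

It is worth inspecting the resulting .fam afterwards to confirm that the sample IDs ended up in the second column.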
@kausmees thanks so much! Boy, I will enjoy using your work right away; a great start to the day :-)
Here I have converted the EIGENSTRAT example files to PLINK format to fix #11 and #13.
The R script I used is below and also included within the commit history.