Closed TrumanZYX closed 1 year ago
Based on the info here: https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=170
ukb23157 is in pVCF format (.vcf.gz, .vcf.gz.tbi)
ukb23158 is in plink format (.bed, .bim, .fam)
ukb23159 is in BGEN format (.bgen, .bgi, .sample)
Both the bgen and the plink format files were created directly from the pVCF files, so they should not differ materially from each other. The one exception is that the bgen and pVCF files retain data on read depth and QC that the plink files discarded.
Plink and REGENIE both prefer data files in plink format (either the older bed/bim/fam or the newer pgen/pvar/psam format). Both programs can also read the .bgen format, but this is slower because they have to convert it internally to plink format.
If you are doing this to filter the data on QC or read depth, then you should make an intermediate script where you filter the bgen or pVCF file and save it out to a plink file for use in the actual analysis.
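As a rough sketch of what such an intermediate step could look like, the Python snippet below only builds and prints a plink2 command line that reads a bgen file, applies one example QC filter, and writes bed/bim/fam files. It does not run anything. The file names are hypothetical, the `ref-first` allele order is an assumption you should check against your own files, and the `--geno 0.1` missingness threshold is just a placeholder for whatever QC filter you actually need.

```python
import shlex


def bgen_to_bed_cmd(bgen, sample, out_prefix, max_missing=0.1):
    """Build (but do not run) a plink2 command converting BGEN to bed/bim/fam."""
    return [
        "plink2",
        "--bgen", bgen, "ref-first",   # assumption: REF allele is listed first
        "--sample", sample,
        "--geno", str(max_missing),    # example QC filter: drop high-missingness variants
        "--make-bed",                  # write .bed/.bim/.fam
        "--out", out_prefix,
    ]


# Hypothetical file names, for illustration only.
cmd = bgen_to_bed_cmd("chr22.bgen", "chr22.sample", "chr22_filtered")
print(shlex.join(cmd))
```

Printing the command first makes it easy to check it by eye before pasting it into a Swiss-Army-Knife job.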
I haven't fully explored the bgen format nor most of the tools that use it. You need to look into the following programs to see if they will do any of the things you wish to achieve. If they do not, you may have to go back to the pVCF files and filter them using vcftools, samtools, or bcftools. You can then create your new plink files from there.
QCtool https://www.well.ox.ac.uk/~gav/qctool_v2/
or BGENIX https://enkre.net/cgi-bin/code/bgen/doc/trunk/doc/wiki/bgenix.md
All of these tools are already included in Swiss-Army-Knife
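If you do end up going back to the pVCF files, the two-step route (filter with bcftools, then convert to plink) might look roughly like the sketch below. Again, the Python code only assembles and prints the commands. The file names and the depth threshold of 10 are made up for illustration, and it assumes the pVCF carries a per-sample FORMAT/DP field, which you should confirm in the VCF header first.

```python
import shlex


def depth_filter_cmd(vcf_in, vcf_out, min_dp):
    """bcftools command that sets genotypes with depth below min_dp to missing."""
    return [
        "bcftools", "filter",
        "-S", ".",                     # set failing genotypes to missing
        "-e", f"FMT/DP<{min_dp}",      # assumption: FORMAT/DP exists in the pVCF
        "-Oz", "-o", vcf_out,          # write bgzip-compressed VCF
        vcf_in,
    ]


def vcf_to_bed_cmd(vcf, out_prefix):
    """plink2 command that converts a VCF to bed/bim/fam."""
    return ["plink2", "--vcf", vcf, "--make-bed", "--out", out_prefix]


# Hypothetical file names and threshold, for illustration only.
steps = [
    depth_filter_cmd("chr22_block0.vcf.gz", "chr22_block0.dp10.vcf.gz", 10),
    vcf_to_bed_cmd("chr22_block0.dp10.vcf.gz", "chr22_block0_dp10"),
]
for step in steps:
    print(shlex.join(step))
```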
Dear, do you know if it is possible