Closed by tytrhr 7 months ago
Hi @tytrhr,
Thanks for your question. You are correct. STAARpipeline takes genotype data by chromosome to analyze large sequencing data, and thus you may split VCF files by chromosome before VCF2GDS processing.
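As an illustration of the splitting step described above, here is a minimal pure-Python sketch that streams one multi-chromosome VCF and writes one file per chromosome. The function name `split_vcf_by_chromosome` and the `<chrom>.vcf.gz` output naming are hypothetical choices made for this example, not part of STAARpipeline. It writes plain gzip rather than bgzip, so the outputs cannot be indexed with tabix as they are. For large cohorts a dedicated tool such as bcftools is likely the more practical choice.

```python
import gzip
from pathlib import Path


def split_vcf_by_chromosome(vcf_path, out_dir):
    """Stream a VCF (plain or gzipped) and write one gzipped VCF per chromosome.

    Returns a dict mapping chromosome name -> output path.
    Output naming (<chrom>.vcf.gz) is an assumption made for this sketch.
    """
    vcf_path = Path(vcf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    opener = gzip.open if vcf_path.suffix == ".gz" else open
    header = []    # header lines, copied into every output file
    handles = {}   # chromosome -> open output handle
    outputs = {}   # chromosome -> output path

    try:
        with opener(vcf_path, "rt") as src:
            for line in src:
                if line.startswith("#"):
                    # VCF header lines always precede the records.
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    out_path = out_dir / f"{chrom}.vcf.gz"
                    handles[chrom] = gzip.open(out_path, "wt")
                    handles[chrom].writelines(header)
                    outputs[chrom] = out_path
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return outputs
```

Calling `split_vcf_by_chromosome("all.vcf.gz", "by_chr")` would leave files such as `by_chr/chr2.vcf.gz` and `by_chr/chr22.vcf.gz`, each with the full original header.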
Best, Xihao
Thanks for your reply. I found a script that can perform this step: "Rscript convertVCF2GDS.R import.format vcf base.filename 2 chr22.vcf.gz chr2.vcf.gz". Is this command correct? Also, since the VCF files are already split by chromosome, does each file need to be processed separately? When I run this script, the results are merged into a single file, base.filename.gds. Looking forward to your reply, thank you very much!
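One way to get one GDS file per chromosome, rather than a single merged file, would be to call the script once per input VCF. The sketch below only builds the command lines. It assumes the argument order seen in the command quoted above (format, base filename, number of input files, then the input paths), which is inferred from that command and not confirmed anywhere in this thread. The helper name `build_vcf2gds_commands` is hypothetical.

```python
def build_vcf2gds_commands(vcf_by_chrom, script="convertVCF2GDS.R"):
    """Return one argument list per chromosome, each converting a single VCF.

    Assumption: arguments follow the pattern quoted in the thread:
    import.format <format> <base filename> <number of inputs> <inputs...>
    """
    commands = []
    for chrom, vcf_path in vcf_by_chrom.items():
        commands.append([
            "Rscript", script,
            "import.format", "vcf",
            chrom,      # base filename; assumed to yield <chrom>.gds
            "1",        # assumed to be the number of input files
            vcf_path,
        ])
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`, so that chr2 and chr22 are converted in separate runs.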
Hi @tytrhr, do you have all VCF files split by each chromosome already?
Hi Xihao, yes, I have. Thank you very much for your reply! In addition, I would like to ask how to correctly download the FAVORannotator database on a Linux system. I found the database at https://favor.genohub.org, for example "chr1 CSV (31.2 GB) Download". When I right-click "Download", copy the link address, and download it with wget, the resulting file is named 6170506 and cannot be used. When I click "Download" directly in the browser and then upload the file to the Linux system, it is named chr1.tar.gz and works correctly. I don't know what causes this. Can you give me some suggestions?
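The thread does not establish why the wget download fails. A first diagnostic is to check what the file named 6170506 actually contains, regardless of its name. The sketch below classifies a file by its leading bytes (a gzip stream starts with the bytes 1f 8b) and, if it is a gzipped tar archive, reports the first member. The function names `inspect_download` and `first_tar_member` are hypothetical helpers written for this example.

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def inspect_download(path):
    """Classify a downloaded file by content rather than by file name."""
    with open(path, "rb") as fh:
        head = fh.read(512)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    stripped = head.lstrip().lower()
    if stripped.startswith(b"<!doctype html") or stripped.startswith(b"<html"):
        return "html"
    return "unknown"


def first_tar_member(path):
    """Return the name of the first entry in a gzipped tar archive, or None if empty."""
    with tarfile.open(path, mode="r:gz") as archive:
        member = archive.next()
        return member.name if member is not None else None
```

If `inspect_download("6170506")` returns "gzip", the file may simply be the archive saved under a different name, and renaming it to chr1.tar.gz could be enough. If it returns "html", wget probably saved a web page instead of the data.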
Hello, may I ask whether the VCF file needs to be split by chromosome before VCF2GDS processing?