pjgreer / ukb-rap-tools

Scripts and workflows for analyzing UK Biobank data on the DNANexus Research Analysis Platform

LiftOver hg37->hg38 (Genotype Data) #9

Closed. alyssacl closed this issue 1 year ago.

alyssacl commented 1 year ago

Hi Phil, I am trying to run regenie using newly imputed TopMed data and your scripts here.

The GTprep scripts work just fine, as I am using the same genotype data (ukb22418):

- 01-GTprep-merge-files-dxfuse.sh
- 02-GTprep-qc-filter.sh
- 03-GTprep-ldprune.sh

However, once I get to the liftOver scripts I am having some issues. I believe I have to perform the liftOver because the genotype data (ukb22418) is on build hg37, while the new TopMed-imputed data is on hg38.

The first liftOver script (04-GTprep-liftover38-vcf.sh) works fine: using plink, I am able to convert the QCed, pruned plink cohort file generated in 03-GTprep-ldprune.sh into VCF format in preparation for the liftover from b37 to b38, and /data/gt_genrel_block/ukb_gt_p_temp.vcf.gz is created.

When I ran 05-GTprep-liftover38.sh, it ran for about 5 hours, seemingly performing the liftOver. However, this error then occurred (full error log attached) and the job was terminated:

```
[Fri Mar 10 03:26:04 UTC 2023] picard.vcf.LiftoverVcf done. Elapsed time: 273.93 minutes. Runtime.totalMemory()=7847542784
++ bcftools sort -o ukb_gt_lo38_sort.vcf.gz -O z ukb_gt_lo38.vcf
Writing to /tmp/bcftools.dajOkH
CPU: 8% (16 cores) Memory: 2573/63518MB Storage: 447/562GB Net: 0↓/0↑MBps
CPU: 6% (16 cores) Memory: 2576/63518MB Storage: 477/562GB Net: 0↓/0↑MBps
CPU: 6% (16 cores) Memory: 2585/63518MB Storage: 510/562GB Net: 0↓/0↑MBps
CPU: 6% (16 cores) Memory: 2587/63518MB Storage: 539/562GB Net: 0↓/0↑MBps
[buf_flush] Error: cannot write to /tmp/bcftools.dajOkH/00189.bcf
Cleaning
```

job-GQ5644QJ42yQ75xG75zzZ4PK.Error.Log.txt 05-GTprep-liftover38.sh.txt

Any suggestions? Is it because I am still referencing WES data in the script? Thank you, Alyssa

pjgreer commented 1 year ago

It ran out of disk space. The tell is in the line just before [buf_flush]: you are at 539 GB of 562 GB.
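The monitor lines make this diagnosis mechanical. A minimal sketch that parses one such `Storage:` line and flags near-full disks (the log line is copied from the error output above; the 90% threshold is an arbitrary choice, not anything DNANexus enforces):

```shell
#!/bin/sh
# One storage line from the DNANexus job monitor, as seen in the error log.
line="Storage: 539/562GB"

# Pull out used and total GB from the "Storage: USED/TOTALGB" pattern.
used=$(echo "$line" | sed -E 's/.*Storage: ([0-9]+)\/([0-9]+)GB.*/\1/')
total=$(echo "$line" | sed -E 's/.*Storage: ([0-9]+)\/([0-9]+)GB.*/\2/')

# Integer percent used; shell arithmetic truncates, which is fine here.
pct=$(( used * 100 / total ))
echo "disk usage: ${pct}%"

# Warn well before the disk is actually full, since bcftools sort keeps
# writing temp chunks to /tmp until it finishes merging.
if [ "$pct" -ge 90 ]; then
  echo "WARNING: nearly out of disk; use a larger instance before the job dies"
fi
```

Here 539/562 works out to 95%, so the warning fires; the job above died a few monitor ticks later for exactly this reason.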

Change the VM instance to mem2_ssd2_v2_x16; this will double the storage space. If you need more disk space than that, try mem3_ssd3_x4 for 2500 GB or mem3_ssd3_x8 for 5000 GB. Many of the steps (bcftools sort, definitely) do not use multiple threads, so a high core count is a bit of a waste.
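The sizing advice above can be written down as a lookup. A hedged sketch, assuming the current instance offers the 562 GB seen in the log, that mem2_ssd2_v2_x16 therefore gives roughly 1124 GB ("double"), and using a made-up `needed_gb` estimate (input VCF plus sorted output plus bcftools temp files):

```shell
#!/bin/sh
# Rough working-set estimate in GB for the liftover + sort step (example value).
needed_gb=1100

# Pick the smallest instance whose SSD fits the working set.
# 2500 and 5000 GB come from pjgreer's comment; 1124 GB is the assumed
# "double of 562 GB" for mem2_ssd2_v2_x16, not an official figure.
if   [ "$needed_gb" -le 562 ];  then inst="current instance (562GB)"
elif [ "$needed_gb" -le 1124 ]; then inst="mem2_ssd2_v2_x16"
elif [ "$needed_gb" -le 2500 ]; then inst="mem3_ssd3_x4"
elif [ "$needed_gb" -le 5000 ]; then inst="mem3_ssd3_x8"
else
  echo "working set exceeds largest single-instance SSD; split the data" >&2
  exit 1
fi
echo "use instance: $inst"
```

For the 1100 GB example this selects mem2_ssd2_v2_x16; check the rate card linked below for exact per-instance storage before relying on these cutoffs.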

This DNANexus document lists the sizes of all available instances: https://platform.dnanexus.com/resources/UKB_Rate_Card-Current.pdf