Hello,
I am testing out the GEUV.compute_weights.sh in just the 1000G.EUR population and keep running into issues.
It never creates a .hsq and gives me this error, indicating that the tmp file tmp.cv.BSLMM.param.txt doesn't exist and indeed it's never created.
The files in the .tmp folder it does create include:
.bed/.bim/.fam
.tmp.bed/tmp.bim/tmp.fam
.tmp.cv.bed / .tmp.cv.bim / .tmp.cv.fam
Error in file(file, "rt") : cannot open the connection
Calls: weights.bslmm -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file 'tmp/1_10/GD462.GeneQuantRPKM.50FN.samplename.resk10.auto.txt.ENSG00000152931.6.tmp.cv.BSLMM.param.txt': No such file or directory
Execution halted
Script below.
#!/bin/sh
### -n requests total cores. In this case, we are requesting 2 cores
#SBATCH -n 2
### -N requests nodes. In this case, we are requesting 1 node
#SBATCH -N 1
### sends the stdout output of the script to a file appended with the job ID
#SBATCH --output job%j.out
### sends the stderr output of the script to a file appended with the job ID
#SBATCH --error job%j.err
cd /work/claygila/InterLymph/TWAS/GEU_weights
##Update test myk code
module load python3/anaconda/2021.11
source activate /home/claygila/.conda/envs/R_env
module load plink
# MAKE SURE FUSION.compute_weights.R IS IN YOUR PATH
# FILL IN THESE PATHS
GCTA="/work/claygila/InterLymph/TWAS/GEU_weights/gcta_nr_robust"
PLINK="/work/claygila/InterLymph/TWAS/GEU_weights/plink_linux_x86_64_20231211/plink"
GEMMA="/work/claygila/ENVs/gemma_env/bin/gemma"
# ALTERNATIVELY: ENSURE THAT plink, gcta, gemma CAN BE CALLED FROM PATH AND REMOVE --PATH_* FLAGS BELOW
# PATH TO DIRECTORY CONTAINING LDREF DATA (FROM FUSION WEBSITE or https://data.broadinstitute.org/alkesgroup/FUSION/LDREF.tar.bz2)
LDREF="/work/claygila/InterLymph/TWAS/GEU_weights/LDREF"
# THIS IS USED TO RESTRICT INPUT SNPS TO REFERENCE IDS ONLY
# PATH TO GEUVADIS GENE EXPRESSION MATRIX:
PRE_GEXP="GD462.GeneQuantRPKM.50FN.samplename.resk10.auto.txt"
# GEUVADIS DATA WAS DOWNLOADED FROM https://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/files/analysis_results/
# GENERATE ID FILE
cat $PRE_GEXP | head -n1 | tr '\t' '\n' | tail -n+5 | awk '{ print $1,$1 }' > ${PRE_GEXP}.ID
# PATH TO PREFIX FOR GEUVADIS GENOTYPES SPLIT BY CHROMOSOME
# SUBSAMPLE THESE TO THE LDREF SNPS FOR EFFICIENCY
PRE_GENO="1000G.EUR"
# PATH TO OUTPUT DIRECTORY (population-specific subdirs will be made)
OUT_DIR="./WEIGHTS"
# ROWS IN THE MATRIX TO ANALYZE (FOR BATCHED RUNS)
BATCH_START=1
BATCH_END=10
# --- BEGIN SCRIPT:
NR="${BATCH_START}_${BATCH_END}"
mkdir --parents tmp/$NR
mkdir --parents hsq/$NR
mkdir --parents out/$NR
# THIS IS DIRECTORY WHERE THE OUTPUT WILL GO:
mkdir $OUT_DIR
# Loop through each gene expression phenotype in the batch
cat $PRE_GEXP | awk -vs=$BATCH_START -ve=$BATCH_END 'NR > s && NR <= e' | while read PARAM; do
# Get the gene positions +/- 500kb
CHR=`echo $PARAM | awk '{ print $3 }'`
P0=`echo $PARAM | awk '{ print $4 - 0.5e6 }'`
P1=`echo $PARAM | awk '{ print $4 + 0.5e6 }'`
GNAME=`echo $PARAM | awk '{ print $1 }'`
OUT="tmp/$NR/$PRE_GEXP.$GNAME"
echo $GNAME $CHR $P0 $P1
# Pull out the current gene expression phenotype
echo $PARAM | tr ' ' '\n' | tail -n+5 | paste $PRE_GEXP.ID - > $OUT.pheno
# Get the locus genotypes for all samples and set current gene expression as the phenotype
$PLINK --bfile $PRE_GENO.$CHR --pheno $OUT.pheno --allow-no-sex --maf 0.01 --make-bed --out $OUT --keep $OUT.pheno --chr $CHR --from-bp $P0 --to-bp $P1 --extract $LDREF/1000G.EUR.$CHR.bim
# Process all samples together (for reference purposes only since this is mult-ethnic data)
FINAL_OUT="$OUT_DIR/.$GNAME"
# MAKE SURE FUSION.compute_weights.R IS IN YOUR PATH
Rscript FUSION.compute_weights.R --bfile $OUT --tmp $OUT.tmp --out $FINAL_OUT --verbose 0 --save_hsq --hsq_set 0.01 --PATH_gcta $GCTA --PATH_gemma $GEMMA --models blup,lasso,top1,enet
# Append heritability output to hsq file
cat $FINAL_OUT.hsq >> hsq/$NR.hsq
done
The files it generates for each row of the expression files: 1_10
Hello, I am testing out the GEUV.compute_weights.sh in just the 1000G.EUR population and keep running into issues.
It never creates a .hsq and gives me this error, indicating that the tmp file tmp.cv.BSLMM.param.txt doesn't exist and indeed it's never created.
The files in the .tmp folder it does create include: .bed/.bim/.fam .tmp.bed/tmp.bim/tmp.fam .tmp.cv.bed / .tmp.cv.bim / .tmp.cv.fam
Error in file(file, "rt") : cannot open the connection Calls: weights.bslmm -> read.table -> file In addition: Warning message: In file(file, "rt") : cannot open file 'tmp/1_10/GD462.GeneQuantRPKM.50FN.samplename.resk10.auto.txt.ENSG00000152931.6.tmp.cv.BSLMM.param.txt': No such file or directory Execution halted
Script below.
The files it generates for each row of the expression files: 1_10
Error / Out files: job16547466.out.txt job16547466.err.txt
Thanks for you insights, Alyssa