ssimp_chunks.log

Dear Zoltan,
I ran into an error while running ssimp_chunks.sh: "Error in paste(X1, X2, sep = ":") : object 'X2' not found" (line 80 in the attached log file). It seems to be caused by a failure to read the reference file containing chr/pos (see lines 41-53 of the log). The file downloaded from "https://drive.switch.ch/index.php/s/uOyjAtdvYjxxwZd/download" appears to be "database.of.builds.1kg.uk10k.hrc.2018.01.18.bin". Although the script saves it under a gzipped file name ("reference_panels/dbsnp_hg20_chr_pos_sorted.txt.gz"), it is not a gzipped file and cannot be read by read_tsv. As a result, the object "tkg" was not created, which caused the subsequent error.
Would it be possible to provide the correct file "dbsnp_hg20_chr_pos_sorted.txt.gz"?
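For reference, one way to confirm that the downloaded file is not gzip-compressed is to inspect its first two bytes, which are 1f 8b for any gzip file. Below is a minimal sketch; is_gzip is a hypothetical helper name (not part of ssimp), and the path is the one the script writes to.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ssimp): succeeds only if the file
# starts with the gzip magic bytes 1f 8b.
is_gzip() {
    local magic
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Path taken from the script; the check is skipped if the file is absent.
ref="reference_panels/dbsnp_hg20_chr_pos_sorted.txt.gz"
if [ -f "$ref" ] && ! is_gzip "$ref"; then
    echo "$ref is not gzip-compressed despite its .gz extension" >&2
fi
```

Running `gzip -t` on the file would give a similar answer, but checking the magic bytes avoids reading a large file in full.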
Alternatively, I guess one could use the gz files in ~/reference_panels/1000genomes/ instead of the above reference file? However, this requires modifying the last part of ssimp_chunks.sh as below:
for (CHRM in 1:22){
file <- paste0("~/reference_panels/1000genomes/ALL.chr", CHRM, ".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz") ## build the per-chromosome path
cat("Start: loading large file\n")
tkg <- read_tsv(file, col_names = TRUE) ## this can take some time
cat("Finished: loading large file\n")
## columns
## X1 = chr
## X2 = pos - hg20
## remove MT and PAR and Y and X
tkg <- filter(tkg, X1 %in% 1:22)
## create file with ssimp chunks
## ----------------------------------
sessionInfo()
impute.range <- uk10k.chunks.from.to(ref.file=tkg, nbr.chunks = nbr.chunks) ## returns "chr:pos.start-chr:pos.end"
print.chunks(ssimp.args = ssimp.args, nbr.chunks = nbr.chunks, ref.file = tkg, out.name=out.name)
cat("cat2\n")
}
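Another option, sketched here as an untested assumption rather than as the intended fix, would be to build a header-less two-column chr/pos file from the VCFs before the R part runs, so that the X1/X2 columns mentioned in the comments still apply. The helper name vcf_chr_pos and the output file name are hypothetical; the VCF paths are the ones above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM and POS (columns 1 and 2) of a gzipped VCF,
# dropping the "##" meta lines and the "#CHROM" header line.
vcf_chr_pos() {
    gzip -dc "$1" | grep -v '^#' | cut -f1,2
}

panel_dir="$HOME/reference_panels/1000genomes"
out="chr_pos_from_vcf.txt.gz"   # hypothetical output name

# Concatenate chr/pos for chromosomes 1-22 into one gzipped, tab-separated file.
if [ -d "$panel_dir" ]; then
    for chrm in $(seq 1 22); do
        vcf="$panel_dir/ALL.chr${chrm}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
        if [ -f "$vcf" ]; then
            vcf_chr_pos "$vcf"
        fi
    done | gzip -c > "$out"
fi
```

A file like this has no header row, so reading it with read_tsv(file, col_names = FALSE) would give columns named X1 and X2. Whether the positions in these VCFs match the build the script expects would still need to be checked.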
Regards, patrick