smgogarten / GWASTools

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis
https://bioconductor.org/packages/GWASTools

createDataFile returns empty diag.geno.file #15

Closed ryshi06 closed 11 months ago

ryshi06 commented 1 year ago

Hi,

I have raw text files from Illumina, and I am following the DataCleaning guide to prepare the snpAnnotation and scanAnnotation data frames. When I try to generate the GDS file, the corresponding diagnostics file contains NULL for several values, including sample, sample.match, etc. I have attached the empty output I received. I double-checked the file path, and the two annotation data frames and the raw data files look fine. Can you give me some idea of why I am having this issue? Thank you!

[Screenshot attached: 2023-09-13, 8:07:31 PM]

smgogarten commented 11 months ago

I can't diagnose this problem without a reproducible example.

smgogarten commented 11 months ago

Can you please also supply the code you used that produced the error, and the output of sessionInfo()?
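
For anyone preparing a similar report, the session details requested above can be captured as plain text and pasted into a comment. This is a minimal sketch using base R only; the output file name is an arbitrary choice for illustration.

```r
# Capture the printed sessionInfo() as a character vector, one element per line.
info <- capture.output(sessionInfo())

# Write it to a text file (hypothetical name, placed in the session's temp dir)
# so it can be copied into the issue comment box.
out_file <- file.path(tempdir(), "session_info.txt")
writeLines(info, out_file)

# Show the first few lines as a quick check.
print(head(info, 3))
```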

smgogarten commented 11 months ago

Email attachments don't seem to work when replying to GitHub issues by email. Please go to the issue page on the GitHub website and paste your code into the comment box.

ryshi06 commented 11 months ago

Generate SnpAnnot:

library(GWASTools)

ref <- read.table("GDA_A1_snps.txt", sep="\t", header = FALSE)
colnames(ref) <- c("snpName", "chromosome", "position", "perc_match", "strand", "TOP")

d1 <- subset(ref, select=c("snpName", "chromosome", "position"))

# recode non-autosomal chromosomes as integer codes
d1$chromosome[d1$chromosome=="X"] <- 23
d1$chromosome[d1$chromosome=="Y"] <- 25
d1$chromosome[d1$chromosome=="MT"] <- 26
d1$chromosome[d1$chromosome=="0"] <- 27
d1$chromosome[d1$chromosome=="XY"] <- 24

# sort by chromosome and position, then assign snpID as the row number
d1$chromosome <- as.integer(d1$chromosome)
d <- d1[order(d1$chromosome, d1$position), ]
d$snpID <- 1:nrow(d)
d <- d[,c("snpID", "snpName", "chromosome", "position")]
snpAnnot <- SnpAnnotationDataFrame(d)

meta <- varMetadata(snpAnnot)
meta[c("snpID", "snpName", "chromosome", "position"), "labelDescription"] <-
  c("unique integer ID for SNPs (row number assigned)",
    "BeadSet SNP ID from Illumina",
    paste("integer code for chromosome: 1:22=autosomes,",
          "23=X, 24=pseudoautosomal, 25=Y, 26=Mitochondrial, 27=Unknown"),
    "base pair position on chromosome (build 37)")
varMetadata(snpAnnot) <- meta

Generate ScanAnnot:

d <- read.table("scanAnnot_fake.txt", sep = "\t", header = TRUE)

scanAnnot <- ScanAnnotationDataFrame(d)

meta <- varMetadata(scanAnnot)
meta[c("scanID","scanName","file","sex","race"), "labelDescription"] <-
  c("unique ID for scans", "subject identifier", "raw data file", "Sex", "Race")
varMetadata(scanAnnot) <- meta

Create gds file:

path <- "."
geno.file <- "tmp.geno.gds"

scan_annotation <- getAnnotation(scanAnnot)
snp_annotation <- getAnnotation(snpAnnot)

col.nums <- as.integer(c(1,2,10,11))
names(col.nums) <- c("snp", "sample", "a1", "a2")
diag.geno.file <- "diag.geno.RData"
diag.geno <- createDataFile(path=path, geno.file, file.type="gds",
                            variables="genotype",
                            snp.annotation=snp_annotation,
                            scan.annotation=scan_annotation,
                            sep.type="\t", skip.num=10, col.total=11,
                            col.nums=col.nums, scan.name.in.file=1,
                            diagnostics.filename=diag.geno.file)

sample1.txt sample2.txt sample3.txt scanAnnot_fake.txt GDA_A1_snps.txt
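
Before calling createDataFile, it can help to confirm that every raw file named in the scan annotation exists under `path` and that its first data row has the expected number of columns. The sketch below is a base-R pre-flight check, not something GWASTools provides: the helper name `check_raw_files` is hypothetical, and it assumes `skip.num` counts every line before the first data row, as the call above implies. The demo builds throwaway files so it runs on its own.

```r
# Hypothetical helper (not part of GWASTools): basic sanity checks on raw text files.
check_raw_files <- function(path, files, skip.num, col.total, sep = "\t") {
  full <- file.path(path, files)
  exists <- file.exists(full)
  n.fields <- rep(NA_integer_, length(full))
  for (i in which(exists)) {
    # Read just enough lines to reach the first data row.
    lines <- readLines(full[i], n = skip.num + 1)
    if (length(lines) > skip.num) {
      data.line <- lines[skip.num + 1]
      # Count separators; the number of fields is one more than that.
      n.sep <- lengths(regmatches(data.line, gregexpr(sep, data.line, fixed = TRUE)))
      n.fields[i] <- n.sep + 1L
    }
  }
  data.frame(file = files, exists = exists, n.fields = n.fields,
             fields.ok = !is.na(n.fields) & n.fields == col.total,
             stringsAsFactors = FALSE)
}

# Demo with made-up files: 10 header lines followed by 11-column data rows.
demo.dir <- file.path(tempdir(), "raw_demo")
dir.create(demo.dir, showWarnings = FALSE)
data.row <- paste(rep("x", 11), collapse = "\t")
writeLines(c(rep("header", 10), data.row, data.row),
           file.path(demo.dir, "present.txt"))

result <- check_raw_files(demo.dir, c("present.txt", "absent.txt"),
                          skip.num = 10, col.total = 11)
print(result)
```

With the objects from the code above, the equivalent call would be `check_raw_files(path, scan_annotation$file, skip.num = 10, col.total = 11)`. A missing file or an unexpected column count is one plausible reason for empty diagnostics, though this thread does not establish the actual cause.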

smgogarten commented 11 months ago

I just ran your code and got the expected (non-empty) output:

> diag.geno
$read.file
[1] 1 1 1

$row.num
[1] 90 90 90

$samples
$samples[[1]]
[1] "sample1"

$samples[[2]]
[1] "sample2"

$samples[[3]]
[1] "sample3"

$sample.match
[1] 1 1 1

$missg
$missg[[1]]
character(0)

$missg[[2]]
character(0)

$missg[[3]]
character(0)

$snp.chk
[1] 1 1 1

$chk
[1] 1 1 1
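
The diagnostics list printed above can also be checked programmatically rather than by eye. The following is a minimal base-R sketch: the list is a mock whose values are copied from that output, the helper `all_ok` is hypothetical, and it assumes that a value of 1 marks a passed check for each file. A NULL or empty element, as in the original report, counts as a failure.

```r
# Mock list mirroring the structure of the printed output (values copied from it).
diag.geno <- list(
  read.file    = c(1, 1, 1),
  row.num      = c(90, 90, 90),
  samples      = list("sample1", "sample2", "sample3"),
  sample.match = c(1, 1, 1),
  missg        = list(character(0), character(0), character(0)),
  snp.chk      = c(1, 1, 1),
  chk          = c(1, 1, 1)
)

# Hypothetical helper: TRUE only if the element is non-empty and every entry is 1.
all_ok <- function(x) length(x) > 0 && isTRUE(all(x == 1))

# Indicator elements to summarise, one entry per raw file in each.
flags <- c("read.file", "sample.match", "snp.chk", "chk")
status <- vapply(flags, function(f) all_ok(diag.geno[[f]]), logical(1))
print(status)
```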

Details on my R session:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GWASTools_1.46.0    Biobase_2.60.0      BiocGenerics_0.46.0

loaded via a namespace (and not attached):
 [1] shape_1.4.6          formula.tools_1.7.1  lattice_0.21-8       vctrs_0.6.3         
 [5] tools_4.3.1          generics_0.1.3       sandwich_3.0-2       tibble_3.2.1        
 [9] fansi_1.0.4          RSQLite_2.3.1        pan_1.9              blob_1.2.4          
[13] pkgconfig_2.0.3      jomo_2.7-6           Matrix_1.6-0         data.table_1.14.8   
[17] lifecycle_1.0.3      compiler_4.3.1       MatrixModels_0.5-2   codetools_0.2-19    
[21] SparseM_1.81         quantreg_5.97        GWASExactHW_1.01     glmnet_4.1-8        
[25] mice_3.16.0          pillar_1.9.0         nloptr_2.0.3         tidyr_1.3.0         
[29] MASS_7.3-60          cachem_1.0.8         iterators_1.0.14     rpart_4.1.19        
[33] boot_1.3-28.1        foreach_1.5.2        mitml_0.4-5          nlme_3.1-162        
[37] tidyselect_1.2.0     dplyr_1.1.2          purrr_1.0.1          splines_4.3.1       
[41] operator.tools_1.6.3 fastmap_1.1.1        grid_4.3.1           cli_3.6.1           
[45] magrittr_2.0.3       survival_3.5-5       utf8_1.2.3           broom_1.0.5         
[49] backports_1.4.1      bit64_4.0.5          quantsmooth_1.66.0   logistf_1.26.0      
[53] bit_4.0.5            nnet_7.3-19          lme4_1.1-34          zoo_1.8-12          
[57] memoise_2.0.1        DNAcopy_1.74.1       lmtest_0.9-40        mgcv_1.9-0          
[61] rlang_1.1.1          Rcpp_1.0.11          glue_1.6.2           DBI_1.1.3           
[65] gdsfmt_1.36.1        rstudioapi_0.15.0    minqa_1.2.5          R6_2.5.1 
ryshi06 commented 11 months ago

Thank you. Let me double check my R session info and hopefully I can get the normal output.

Best, Ruyu
