zhengxwen / gdsfmt

R Interface to CoreArray Genomic Data Structure (GDS) Files (Development version only)
http://www.bioconductor.org/packages/gdsfmt

In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "". #36

Status: Open (opened by jingydz 3 months ago)

jingydz commented 3 months ago

Hi, when I add annotations to my GDS file, I get the warnings below. Could you help me solve this?

Warning messages:
1: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
2: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
3: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
4: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
5: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
6: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
7: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
8: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".
9: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".

My command is: Rscript 0.2.3gds2agds.R 1

And the R script is:

##########################################################################
# Input
##########################################################################

# gds file
dir_geno <- "/xxx/variants/rare_variant/GDS_file/"
gds_file_name_1 <- "phenotype.chr"
gds_file_name_2 <- ".2802.mac1.filt.gds"

# annotation file (output of Annotate.R)
dir_anno <- "/xxx/variants/rare_variant/Anno/"
anno_file_name_1 <- "Anno_chr"
anno_file_name_2 <- "_STAARpipeline.csv"

chr <- as.numeric(commandArgs(TRUE)[1])

###########################################################################
# Main Function
###########################################################################

# load required package
library(gdsfmt)
library(SeqArray)
library(SeqVarTools)
library(readr)

# read annotation data
FunctionalAnnotation <- read_csv(paste0(dir_anno,"chr",chr,"/",anno_file_name_1,chr,anno_file_name_2),
    col_types=list(col_character(),col_double(),col_double(),col_double(),col_double(),
                   col_double(),col_double(),col_double(),col_double(),col_double(),
                   col_character(),col_character(),col_character(),col_double(),col_character(),
                   col_character(),col_character(),col_character(),col_character(),col_double(),
                   col_double(),col_character()))

dim(FunctionalAnnotation)

# rename colnames
colnames(FunctionalAnnotation)[2] <- "apc_conservation"
colnames(FunctionalAnnotation)[7] <- "apc_local_nucleotide_diversity"
colnames(FunctionalAnnotation)[9] <- "apc_protein_function"

# open GDS
gds.path <- paste0(dir_geno,gds_file_name_1,chr,gds_file_name_2)
genofile <- seqOpen(gds.path, readonly = FALSE)

Anno.folder <- index.gdsn(genofile, "annotation/info")
add.gdsn(Anno.folder, "FunctionalAnnotation", val=FunctionalAnnotation, compress="LZMA_ra", closezip=TRUE)

seqClose(genofile)

zhengxwen commented 3 months ago

As the warning says, missing values (NA) in character columns are converted to "" (empty strings) when they are written with add.gdsn().
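To see which columns trigger the warnings, one option is to count the NA values in each character column before writing. The call quoted in the warning, add.gdsn(ans, nm[i], val[[i]], ...), suggests that add.gdsn() writes a data frame one column at a time, so each warning most likely corresponds to one character column that contains NA. Below is a minimal sketch using a small hypothetical data frame in place of FunctionalAnnotation; the column names id, score, and category and their values are made up for illustration.

```r
# Hypothetical stand-in for FunctionalAnnotation; names and values are made up
anno <- data.frame(
  id       = c("1-100-A-G", "1-200-C-T", "1-300-G-A"),
  score    = c(1.2, NA, 0.5),
  category = c("exonic", NA, "intronic"),
  stringsAsFactors = FALSE
)

# Keep only the character columns, since the warning concerns missing characters
chr_cols <- vapply(anno, is.character, logical(1))

# Number of NA values per character column; any non-zero entry
# would be stored as "" by add.gdsn()
na_per_col <- colSums(is.na(anno[chr_cols]))
print(na_per_col)
```

On the real data, the last two statements can be run on FunctionalAnnotation directly (a readr tibble supports the same indexing).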

To store missing characters in a GDS file, use SeqArray::seqAddValue() instead of add.gdsn(). See the help page (?SeqArray::seqAddValue) for more details.
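A minimal round-trip sketch of this suggestion, assuming SeqArray is installed. It copies the example GDS file bundled with SeqArray (seqExampleFileName("gds")) to a temporary file so the original stays untouched, writes a character vector containing NA with seqAddValue(), and reads it back with seqGetData(). The variable name annotation/info/demo_anno and its values are hypothetical.

```r
library(SeqArray)

# Work on a temporary copy of SeqArray's bundled example file
fn <- tempfile(fileext = ".gds")
file.copy(seqExampleFileName("gds"), fn)

f <- seqOpen(fn, readonly = FALSE)
n <- length(seqGetData(f, "variant.id"))  # one annotation value per variant

# Hypothetical character annotation with some missing values
anno <- rep(c("A", NA_character_, "B"), length.out = n)

# Add it under annotation/info (replace = TRUE overwrites an existing node)
seqAddValue(f, "annotation/info/demo_anno", anno, replace = TRUE)
seqClose(f)

# Reopen read-only and read the values back
f <- seqOpen(fn)
back <- seqGetData(f, "annotation/info/demo_anno")
seqClose(f)

# Compare NA counts before and after the round trip
print(c(written = sum(is.na(anno)), read_back = sum(is.na(back))))
```

For the script in the question, the analogous change would presumably replace the index.gdsn()/add.gdsn() pair with seqAddValue(genofile, "annotation/info/FunctionalAnnotation", FunctionalAnnotation, replace = TRUE). This assumes your SeqArray version accepts a data.frame for val; check ?seqAddValue before relying on it.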