hakyimlab / PredictDB-Tutorial

Tutorial for running the PredictDB pipeline with GEUVADIS data
9 stars 11 forks source link

#Error. Could not read input tissue database. #6

Open hust-yyan opened 2 years ago

hust-yyan commented 2 years ago

Hi, I recently performed a TWAS study on miRNA expression. After creating the predict database using our miRNA expression and genotype data, we encountered the following error during s-Predixcan analysis. image Here is the code I used to create the Predict database library(DBI) library(RSQLite) "%&%" <- function(a,b) paste(a,b, sep='') driver <- dbDriver('SQLite') model_summaries <- read.table('./summary/Model_training_chr1_model_summaries_filter.txt',header = T, stringsAsFactors = F) tiss_summary <- read.table('./summary/Model_training_chr1_summary.txt', header = T, stringsAsFactors = F) n_samples <- tiss_summary$n_samples for (i in 2:22) { model_summaries <- rbind(model_summaries, read.table('./summary/Model_training_chr'%&%as.character(i) %&% '_model_summaries_filter.txt', header = T, stringsAsFactors = F)) tiss_summary <- rbind(tiss_summary, read.table('./summary/Model_training_chr' %&% as.character(i) %&% '_summary.txt', header = T, stringsAsFactors = F)) } model_summaries <- rename(model_summaries, gene = gene_id)

Create a database connection

conn <- dbConnect(drv = driver, './dbs/miRNA_expression_204.db') dbWriteTable(conn, 'model_summaries', model_summaries, overwrite = TRUE) dbExecute(conn, "CREATE INDEX gene_model_summary ON model_summaries (gene)")

Weights Table -----

weights <- read.table('./weights/Model_training_chr1_weights.txt', header = T,stringsAsFactors = F) for (i in 2:22) { weights <- rbind(weights, read.table('./weights/Model_training_chr' %&% as.character(i) %&% '_weights.txt', header = T, stringsAsFactors = F))

} weights <- rename(weights, gene = gene_id) dbWriteTable(conn, 'weights', weights, overwrite = TRUE) dbExecute(conn, "CREATE INDEX weights_rsid ON weights (rsid)") dbExecute(conn, "CREATE INDEX weights_gene ON weights (gene)") dbExecute(conn, "CREATE INDEX weights_rsid_gene ON weights (rsid, gene)")

Sample_info Table ----

sample_info <- data.frame(n_samples = n_samples, population = 'EAS') # Provide the population info dbWriteTable(conn, 'sample_info', sample_info, overwrite = TRUE)

Construction Table ----

construction <- tiss_summary %>% select(chrom, cv_seed) %>% rename(chromosome = chrom) dbWriteTable(conn, 'construction', construction, overwrite = TRUE) dbDisconnect(conn)

Fnyasimi commented 2 years ago

Hi @hust-yyan can you share the command you are using to run SPrediXcan and the full error log? Also add --verbosity 1 and --throw to log in more run info.

hust-yyan commented 2 years ago

Thanks for your quick reply! this is my command to run SPrediXcan: /home/yyan/software/MetaXcan/software/SPrediXcan.py \ --model_db_path ./input/miRNA_expression_204.db \ --covariance ./input/Model_training_covariances.txt.gz \ --gwas_folder /data1/yyan_tmp/miRNA_eqtl/TWAS/GWAS_data \ --gwas_file_pattern "${trait}.auto.rsq07.mac10.txt.gz" \ --snp_column SNPID \ --effect_allele_column Allele2 \ --non_effect_allele_column Allele1 \ --beta_column BETA \ --se_column SE \ --pvalue_column p.value \ --output_file ./results/${trait}_TAWS.csv \ --verbosity 1 \ --throw

and this is the error info image

And I didn't find the weight column in the weights file, which is the weight file generated by PredictDB. image

Fnyasimi commented 2 years ago

@hust-yyan the beta is the weight. You need the right columns in your weights table in the db.

weights <- weights %>%
       dplyr::rename(gene = gene_id, ref_allele = ref, eff_allele = alt, weight = beta)
hust-yyan commented 2 years ago

Hello, my previous problem has been solved. Now the program can run successfully.But there was a warning.He said that these three genes skipped. I checked weights file and found that weights of two genes were missing. Is this normal? image image

Fnyasimi commented 2 years ago

That's not normal. I would advice you to look back at your filtering/QC step if everything worked well.