jessieren / VirFinder

VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data
Other
130 stars 24 forks source link

problem with smooth.spline and VF.qvalue #29

Closed drelo closed 2 years ago

drelo commented 2 years ago

Dear Jessie,

I wonder if you could help me solving this. Some months ago I run VirFinder with no issues I used the script below to make a loop over several .fas files and obtaining one tsv for each 'sample' with the results. Now I am analyzing extra data and I found this error

Error in smooth.spline(lambda, pi0, df = smooth.df) :
  missing or infinite values in inputs are not allowed
Calls: %>% ... <Anonymous> -> VF.qvalue -> qvalue -> pi0est -> smooth.spline

Maybe this is similar to this issue

Maybe the routine didn't find enough virus among the data and it fails. I need to incorporate these results to the already calculated ones for close to 300 samples so I wonder how can I fix this. Thanks for your help.

library(VirFinder)
library(purrr)
library(fs)
library(readr)
library(dplyr)
library(stringr)
library(tibble)
library(tidyr)

# read all *.fas
my_files <- list.files( pattern = "*fas")
neonames <- my_files %>% str_replace(".fas", "")
qnames <- my_files %>% str_replace(".fas", "Q")

bigPred <- map(my_files, VF.pred) %>% setNames(neonames)
saveRDS(bigPred, "bigPred.RDS")

#########  #############  ######  #######  #####  #############
# bind as a single tibble
bigPRED <- bind_rows(bigPred, .id = "sample") %>% relocate(sample, .after = last_col())
# get Q value from bigPRED
bigQ <- bigPRED %>% group_by(sample) %>%
        group_map(~ VF.qvalue(.x$pvalue)) %>% setNames(neonames)
# bigQ list of dataframes into a single tibble
Qvalue <- bigQ %>% enframe() %>%
  unnest() %>% rename(sample = name, qvalue = value)

# send to environment
# Qvalue %>%  group_split(sample) %>% set_names(qnames) %>% 
#        map(., tibble::as_tibble) %>% list2env(., envir = .GlobalEnv)
# bigPRED %>%  group_split(sample) %>% set_names(neonames) %>% 
#        map(., tibble::as_tibble) %>% list2env(., envir = .GlobalEnv)

# just a col bind both PRED and qvalue
FULL <- bind_cols(bigPRED, Qvalue) %>%
  rename(sample = sample...5) %>% select(-sample...6) %>%  # rename
  group_by(sample) %>% arrange(desc(score), .by_group = TRUE ) %>%  # score
  group_by(sample) %>% filter(score > 0.9)                          # greater than 0.9  

# nest by sample, export each one
FULL %>% nest(-sample) %>%
  pwalk(~write_tsv(x = .y, file = paste0(., ".tsv")))
# rm(list=setdiff(ls(), "bigPred")) # delete everything sans bigPred
drelo commented 2 years ago

I fixed adding this to the code. hope it helps

hits <- as.character((distinct(bigPRED, sample))[,1])
bigQ <- bigPRED %>% group_by(sample) %>% 
  group_map(~ VF.qvalue(.x$pvalue)) %>% setNames(hits)
rbtoscan commented 1 year ago

Hi Drelo,

could you please share the final version of you script? Thanks!