simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0
104 stars 30 forks

VIRSorter_global_phage_signal.csv is not machine readable #37

Open rec3141 opened 5 years ago

rec3141 commented 5 years ago

Hi, thanks for your program. It's very useful, but one small change would make it more so: making the output file easily machine readable (e.g. directly importable into R or Python). Right now the comments and headers are interspersed with the data rows, and there's really no need for that. The "Category" column also has non-distinct numbers (e.g. "Phage Category 1" and "Prophage Category 1" get the same entry). Finally, the FASTA headers from the original file are mangled (e.g. '.' is replaced with '_'), which makes it unnecessarily difficult to match results back to the original data files. Thanks!
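
To make the header problem concrete, here is a minimal Python sketch of one possible workaround: apply the same renaming to the original FASTA headers and keep a lookup from the mangled name back to the original. The functions `mangle`, `build_lookup` and `restore` are hypothetical names, not part of VirSorter. Only the '.' to '_' substitution, the `VIRSorter_` prefix and the `-circular` suffix are taken from this thread; the real tool may alter other characters, and fragment (prophage) IDs are not handled here.

```python
def mangle(seq_id):
    """Hypothetical stand-in for VirSorter's renaming.

    Only the '.' -> '_' substitution and the 'VIRSorter_' prefix come from
    this thread; the real tool may change other characters too.
    """
    return "VIRSorter_" + seq_id.replace(".", "_")


def build_lookup(fasta_headers):
    """Map each mangled name back to the original sequence ID."""
    lookup = {}
    for header in fasta_headers:
        # Assumption: the ID is the first whitespace-separated token.
        seq_id = header.lstrip(">").split()[0]
        key = mangle(seq_id)
        # Two different IDs can collapse to one mangled name (e.g. a.b and a_b).
        if key in lookup and lookup[key] != seq_id:
            raise ValueError(f"{lookup[key]!r} and {seq_id!r} both map to {key!r}")
        lookup[key] = seq_id
    return lookup


def restore(vs_id, lookup):
    """Return the original ID for a VirSorter ID, or None if it is unknown."""
    suffix = "-circular"
    if vs_id.endswith(suffix):
        vs_id = vs_id[: -len(suffix)]
    return lookup.get(vs_id)


# Made-up headers, for illustration only.
lookup = build_lookup([">NODE_1_length_5000_cov_12.5", ">NODE_2_length_9000_cov_3.1 sample"])
print(restore("VIRSorter_NODE_1_length_5000_cov_12_5-circular", lookup))
```

Going through a lookup avoids guessing which underscores used to be dots, at the cost of needing the original FASTA file at hand.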

rec3141 commented 5 years ago

In case anyone else is working in R, here's how I imported it:


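# virsorterfile: path to VIRSorter_global_phage_signal.csv
# Read every line as a data row, including the "##" section and header lines.
vs.pred <- read.csv(virsorterfile, quote="", header=FALSE)
# Take the column names from the second line of the file (the first "##" header line).
vs.head <- read.table(virsorterfile, sep=",", quote="", header=TRUE, comment.char="", skip=1, nrows=1)
colnames(vs.pred) <- colnames(vs.head)
colnames(vs.pred)[1] <- "vs.id"
# Rows holding section titles; the second " - "-separated piece is the section label.
vs.num <- grep("category", vs.pred$vs.id)
vs.cats <- sapply(strsplit(as.character(vs.pred$vs.id[vs.num]), split=" - ", fixed=TRUE), "[", 2)
# Prefix each row's Category with the label of the section it falls under.
vs.pred$Category <- paste(c("", rep.int(vs.cats, c(vs.num[-1], nrow(vs.pred)) - vs.num)), vs.pred$Category)
# Drop the section and header rows.
vs.pred <- vs.pred[!grepl("#", vs.pred$vs.id, fixed=TRUE),]

# Undo the renaming to recover the original contig names.
# The last substitution assumes names that contain a coverage value such as cov_12.5.
vs.pred$node <- gsub(pattern="VIRSorter_", replacement="", x=vs.pred$vs.id)
vs.pred$node <- gsub(pattern="-circular", replacement="", x=vs.pred$node)
vs.pred$node <- gsub(pattern="cov_(\\d+)_", replacement="cov_\\1.", x=vs.pred$node, perl=FALSE)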
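
For anyone working in Python instead, a rough sketch of the same import might look like the following. It is not an official parser: `parse_virsorter_signal` and the added "Section" key are hypothetical names, and the sample lines are invented, with a shortened header, purely for illustration. It assumes, as the R code does, that section titles are "##" lines whose second " - "-separated piece is the label, and that each section is preceded by a "##" column header line.

```python
import csv
import io


def parse_virsorter_signal(handle):
    """Return one dict per prediction row, tagged with its section label."""
    rows = []
    header = None
    section = None
    # QUOTE_NONE mirrors quote="" in the R version.
    for fields in csv.reader(handle, quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        first = fields[0]
        if first.startswith("##"):
            text = first.lstrip("#").strip()
            if len(fields) == 1 and " - " in text:
                # Section title, e.g. "1 - <label> - category 1 (sure)".
                section = text.split(" - ")[1]
            else:
                # Column header line, repeated before each section.
                header = [text] + [f.strip() for f in fields[1:]]
            continue
        if header is None:
            raise ValueError("data row found before any header line")
        record = dict(zip(header, fields))
        record["Section"] = section
        rows.append(record)
    return rows


# Invented sample with a shortened header, for illustration only.
sample = """## 1 - Complete phage contigs - category 1 (sure)
## Contig_id,Nb genes,Category
VIRSorter_NODE_1_length_5000_cov_12_5,8,1
## 4 - Prophages - category 1 (sure)
## Contig_id,Nb genes,Category
VIRSorter_NODE_2_length_9000_cov_3_1,20,1
"""
rows = parse_virsorter_signal(io.StringIO(sample))
for row in rows:
    print(row["Section"], row["Category"], row["Contig_id"])
```

Keeping the section label in its own field, rather than pasting it into Category as the R code does, leaves the numeric category usable as a number.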

simroux commented 5 years ago

Hi,

Thanks for the suggestion, and thanks a lot for sharing the R code to import VirSorter results. Unfortunately, there is no support for VirSorter development anymore, so I can't commit to any timeframe in which these issues might be fixed, but I have linked the R code in the README to help any user who would like to do the same type of import.

Best, Simon