larssnip / micropan

R package for microbial pangenomics

Issue with blastpallall #14

Closed pattyjk closed 1 year ago

pattyjk commented 1 year ago

Hello,

I'm having an issue with the blastpAllAll step. I'm working with twenty genomes. I've reformatted them with panPrep, and blastpAllAll will run, but I get NA files and I don't get all the comparison files I'd expect. For instance:

blastpAllAll: Making BLAST database of C:/Users/patty/Downloads/PFC_aggregated_prot/faa/GID628C_GID628C.faa
GID628_vs_GID628.txt
GID629_vs_GID628.txt
GID630_vs_GID628.txt
GID631_vs_GID628.txt
GID632_vs_GID628.txt
GID633_vs_GID628.txt
GID638_vs_GID628.txt
GID643_vs_GID628.txt
GID647_vs_GID628.txt
GID649_vs_GID628.txt
NA

My code:

```r
# micropan pangenome
library(tidyverse)
library(micropan)

# load genome table
gnm.tbl <- read.delim("C:/Users/patty/Downloads/PFC_genome_table.txt")

# setwd
setwd("C:/Users/patty/Downloads/PFC_aggregated_prot/")

# create new folders for BLAST results and faa files
dir.create("blast")
dir.create("faa")

# prep files for analysis
# this takes the genome_id and adds it to every sequence, then adds a
# sequence name to each (and adds the genome_id to the file name)
for (i in 1:nrow(gnm.tbl)) {
  panPrep(
    file.path("C:/Users/patty/Downloads/PFC_aggregated_prot/", str_c(gnm.tbl$File[i], ".faa")),
    gnm.tbl$genome_id[i],
    file.path("faa", str_c(gnm.tbl$genome_id[i], ".faa"))
  )
}

# read in protein files and BLASTp them
faa.files <- list.files("C:/Users/patty/Downloads/PFC_aggregated_prot/faa", pattern = "\\.faa$", full.names = TRUE)
blastpAllAll(faa.files, out.folder = "blast", verbose = TRUE, threads = 2)
```

Any thoughts?