Of course. Assume the FASTA files for the two sets of proteins are stored in two separate directories. You can use a loop to read all the FASTA files in each directory and store the sequences in two lists.

The example function below assumes that each FASTA file contains only one protein sequence, or that the sequence you want is the first one in the file. You can modify the code to fit your use case. `list.files()` gets the names of all FASTA files in a directory, and each file is then read with `readFASTA()`.
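```r
library(protr)

# Read the first sequence of every .fasta file in a directory
read_prot_dir <- function(path) {
  fasta_files <- list.files(path, pattern = "\\.fasta$", full.names = TRUE)
  lst <- list()
  # Name each sequence after its file, e.g. "P00750.fasta"
  for (file in fasta_files) lst[[basename(file)]] <- readFASTA(file)[[1]]
  lst
}

# Replace these with your own directories, e.g. "path/to/set1" and "path/to/set2".
# For a quick demo, both point to the example files shipped with protr.
path1 <- system.file("protseq", package = "protr")
path2 <- system.file("protseq", package = "protr")
plist1 <- read_prot_dir(path1)
plist2 <- read_prot_dir(path2)
sim <- crossSetSim(plist1, plist2)
```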
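If some files in a directory hold several sequences and you want all of them rather than only the first, a variant along these lines should work. It is a sketch: `read_prot_dir_all()` is a hypothetical name, and it relies only on `readFASTA()` returning a named list of sequences for each file.

```r
library(protr)

# Hypothetical helper: like read_prot_dir(), but keeps every sequence
# in every FASTA file instead of only the first one.
read_prot_dir_all <- function(path, pattern = "\\.fasta$") {
  fasta_files <- list.files(path, pattern = pattern, full.names = TRUE)
  # readFASTA() returns one named list of sequences per file
  per_file <- lapply(fasta_files, readFASTA)
  # Flatten the per-file lists into a single list, keeping the names
  unlist(per_file, recursive = FALSE)
}
```

Each element is named after the first word of its FASTA header, so IDs that repeat across files will produce duplicate names.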
Thank you very much. It worked for me. You said that the function assumes each FASTA file contains only one protein sequence, or that the sequence to read is the first one in the file. But what if all the proteins of a set are in a single FASTA file?
Great. `readFASTA()` parses all sequences in a FASTA file into a list. You can then extract the sequences for each set from that list by index and create two separate lists. For example:
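```r
library(protr)

# One FASTA file holding the proteins of both sets
path <- system.file("protseq/mitochondrion.fasta", package = "protr")
plist <- readFASTA(path)
# Split by position: sequences 11-20 form set 1, sequences 21-30 form set 2
plist1 <- plist[11:20]
plist2 <- plist[21:30]
sim <- crossSetSim(plist1, plist2)
```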
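If the two sets are not in contiguous blocks of the file, selecting by sequence name may be easier than selecting by position. `readFASTA()` names each element after the first word of its FASTA header (for a UniProt-style header such as `>sp|P00750|TPA_HUMAN ...` that would be `sp|P00750|TPA_HUMAN`), so a small helper can pick sequences by ID. `select_by_ids()` and the ID file in the usage comments are hypothetical, and the IDs must match those names exactly.

```r
# Hypothetical helper: pick sequences from a readFASTA()-style named list
# by name instead of by position. Output follows the order of `ids`.
select_by_ids <- function(plist, ids) {
  missing_ids <- setdiff(ids, names(plist))
  if (length(missing_ids) > 0) {
    warning("IDs not found: ", paste(missing_ids, collapse = ", "))
  }
  # intersect() keeps the order of its first argument
  plist[intersect(ids, names(plist))]
}

# Usage (hypothetical ID files, one ID per line):
# plist1 <- select_by_ids(plist, readLines("set1_ids.txt"))
# plist2 <- select_by_ids(plist, readLines("set2_ids.txt"))
```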
If the two sets are already in two separate FASTA files:
```r
library(protr)
path1 <- system.file("protseq/extracell.fasta", package = "protr")
path2 <- system.file("protseq/mitochondrion.fasta", package = "protr")
# Only use 10 proteins from both sets for demonstration
plist1 <- readFASTA(path1)[11:20]
plist2 <- readFASTA(path2)[11:20]
sim <- crossSetSim(plist1, plist2)
```
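To keep track of which proteins each score belongs to, the row and column names can be copied from the input lists before saving the result. This sketch assumes `crossSetSim()` returns an n x m matrix whose rows follow `plist1` and whose columns follow `plist2`; check the protr documentation for your version. `label_sim_matrix()` and the output file name are hypothetical.

```r
# Hypothetical helper: label a cross-set similarity matrix with the
# sequence names of the two input lists.
label_sim_matrix <- function(sim, plist1, plist2) {
  # Guard against a mismatch between the matrix and the input lists
  stopifnot(nrow(sim) == length(plist1), ncol(sim) == length(plist2))
  dimnames(sim) <- list(names(plist1), names(plist2))
  sim
}

# Usage, after running crossSetSim():
# sim <- label_sim_matrix(sim, plist1, plist2)
# write.csv(sim, "crossset_similarity.csv")  # hypothetical output file
```

With names attached, `colnames(sim)[apply(sim, 1, which.max)]` then lists, for each protein in set 1, its most similar protein in set 2.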
Perfect!!! It worked for me. Thank you very much for your help!!!
Hello,

I want to use protr to compare two sets of proteins with the function `crossSetSim()`. Each set contains many proteins, so it is difficult for me to create `protlist1` and `protlist2` by entering the proteins one by one, as in the example at https://cran.r-project.org/web/packages/protr/protr.pdf:

```r
s1 <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
s2 <- readFASTA(system.file("protseq/P08218.fasta", package = "protr"))[[1]]
s3 <- readFASTA(system.file("protseq/P10323.fasta", package = "protr"))[[1]]
s4 <- readFASTA(system.file("protseq/P20160.fasta", package = "protr"))[[1]]
s5 <- readFASTA(system.file("protseq/Q9NZP8.fasta", package = "protr"))[[1]]
```

Is there a way to do this without entering the proteins one by one?
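Since the example files follow the pattern `<accession>.fasta`, the five `readFASTA()` calls can be generated from a vector of accession IDs with `lapply()`. A minimal sketch, assuming protr's bundled `protseq` directory; for your own data, point `fasta_dir` at the directory holding your files.

```r
library(protr)

# Accession IDs from the example above
ids <- c("P00750", "P08218", "P10323", "P20160", "Q9NZP8")
# Directory holding the example FASTA files bundled with protr
fasta_dir <- system.file("protseq", package = "protr")
files <- file.path(fasta_dir, paste0(ids, ".fasta"))
# Keep the first sequence of each file, exactly as s1 ... s5 do
protlist1 <- lapply(files, function(f) readFASTA(f)[[1]])
names(protlist1) <- ids
```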