mhahsler / rBLAST

Interface for the Basic Local Alignment Search Tool (BLAST) - R-Package
GNU General Public License v3.0

function predict() doesn't finish #35

Open SilSanGon opened 3 months ago

SilSanGon commented 3 months ago

Hello!

I want to analyse my samples against the NCBI 16S database, which I downloaded previously, but the last step, the predict() function, has been running for 5 days without returning results. Do you know what the cause could be, or whether I did something wrong?

I copy my code in:

# start blast steps

seq <- readDNAStringSet("All_combined_seqs.fasta")

# Make BLAST db and perform BLAST search

makeblastdb(path_to_seqs4_BLAST_db, dbtype = "nucl")
dbb <- blast(db=path_to_seqs4_BLAST_db)

tgz_file <- blast_db_get("16S_ribosomal_RNA.tar.gz")
untar(tgz_file, exdir = "16S_rRNA_DB")

# Load the downloaded BLAST database.

bl <- blast(db = "./16S_rRNA_DB/16S_ribosomal_RNA")
bl

# change parameters here as required

results = predict(bl, seq, BLAST_args= c("-perc_identity 99"))
write.csv(results, "blast_results.csv", row.names = FALSE)

My computer runs Ubuntu 22.04.2 and has an AMD Ryzen Threadripper 1920X 12-core processor (x24) and 500 GB of memory. Thank you very much in advance!
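
One way to tell a slow search from a stuck one is to time the search on a handful of queries and extrapolate. The sketch below is only an illustration: `estimate_runtime()` is a hypothetical helper, not part of rBLAST, and the linear extrapolation assumes every query costs about the same, which may not hold for real data.

```r
# Hypothetical helper (not part of rBLAST): time a search function on the
# first `n_sample` queries and extrapolate linearly to the full set.
estimate_runtime <- function(search_fun, queries, n_sample = 10) {
  n_total <- length(queries)
  stopifnot(n_total > 0)
  n_sample <- min(n_sample, n_total)
  query_subset <- queries[seq_len(n_sample)]
  # system.time() returns user, system and elapsed (wall-clock) seconds.
  elapsed <- system.time(search_fun(query_subset))[["elapsed"]]
  list(
    n_sample = n_sample,
    sample_seconds = elapsed,
    projected_seconds = elapsed / n_sample * n_total
  )
}

# Usage with the objects from the code above (needs rBLAST and BLAST+):
# est <- estimate_runtime(
#   function(q) predict(bl, q, BLAST_args = "-perc_identity 99"),
#   seq
# )
# est$projected_seconds / 3600  # rough projection in hours
```

If the projection turns out to be large, BLAST+'s `-num_threads` option could be appended to `BLAST_args`; whether it helps in this case has not been tested here.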

mhahsler commented 3 months ago

Hi,

  1. Please post the output of running sessionInfo() and system2("blastn", "-version") after all used packages are loaded.

  2. Can you run the example in the man page successfully?

seq <- readRNAStringSet(system.file("examples/RNA_example.fasta",
       package = "rBLAST"))[1]
seq

cl <- predict(bl, seq)
cl[1:5, ]
  3. If the example works, then I will need your sequences and the code that reproduces the issue.
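
The information asked for in the first point can be gathered in one go. This is a minimal sketch using base R only; `collect_blast_diagnostics()` is a hypothetical helper name, and it assumes that `blastn` is expected on the PATH.

```r
# Hypothetical helper: collect the session and BLAST+ details in one list.
collect_blast_diagnostics <- function(exe = "blastn") {
  # Sys.which() returns "" when the executable is not on the PATH.
  path <- unname(Sys.which(exe))
  version <- if (nzchar(path)) {
    # Capture the text that `blastn -version` prints.
    system2(path, "-version", stdout = TRUE, stderr = TRUE)
  } else {
    NA_character_
  }
  list(session = sessionInfo(), blast_path = path, blast_version = version)
}

# Run this after all packages used in the analysis are loaded.
diagnostics <- collect_blast_diagnostics()
print(diagnostics$session)
print(diagnostics$blast_path)
print(diagnostics$blast_version)
```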

Regards, Michael

SilSanGon commented 3 months ago

Hi,

  1. OK, I have posted the output: session_info, system2

  2. Yes, I can run the example in the man page successfully. I have posted that as well: image

  3. I could, but the files are large. How can I send them?

Could it be that the 16S_rRNA_DB database is not compatible with the readDNAStringSet() function? I am reading my sequences with that function rather than with readRNAStringSet().

Thank you very much! Silvia

mhahsler commented 3 months ago

OK, let me know.

SilSanGon commented 3 months ago

Hi Michael,

I was looking for a solution, and according to this post (https://bioinformatics.stackexchange.com/questions/4015/is-it-ok-to-use-blast-to-query-ncbis-16s-rrna-database-with-16s-dna-sequences) it should not be a problem to use the 16S_rRNA database with the readDNAStringSet() function. So what do you think the problem could be?

I tried to attach the script together with a sample, but that is not possible because the file is larger than 25 MB. Do you know another way?
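
If a smaller sample is enough to reproduce the problem, the first few records of the FASTA file can be copied into a separate file that fits under the attachment limit. A minimal base-R sketch follows; `write_fasta_head()` and the output file name `sample_seqs.fasta` are hypothetical, and it assumes a plain-text FASTA file in which every record starts with a `>` header line.

```r
# Hypothetical helper: copy the first `n` records of a FASTA file to a new file.
write_fasta_head <- function(infile, outfile, n = 100) {
  lines <- readLines(infile)
  # Number the records: the counter increases at every ">" header line.
  record_id <- cumsum(startsWith(lines, ">"))
  keep <- record_id >= 1 & record_id <= n
  writeLines(lines[keep], outfile)
  # Return the number of records actually written.
  invisible(min(n, max(record_id, 0)))
}

# Usage with the file from the code above:
# write_fasta_head("All_combined_seqs.fasta", "sample_seqs.fasta", n = 100)
```

The search should be rerun on the reduced file before sharing it, since a subset may not trigger the same behaviour as the full data.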

mhahsler commented 3 months ago

Can you put the data and the script on google drive and share it with me? mhahsler@gmail.com