Closed ngeraldi closed 5 years ago
Hi Nathan, Thanks for the post. It looks like that fasta file contains no sequence information - only the header lines. Your code in the database curation file looks fine to me, so I'm not sure why the sequences have been omitted. Did the virtualPCR function run okay (i.e. did it print something like "retained 70960 sequences after reverse trim")? Cheers, Shaun
Hi Shaun,
thanks for the very quick reply and sorry for not checking the basics.
I actually lost the sequences when removing those with taxID not in db (using row.names to add column to index list of sequences are characters not numbers:( ).
Running now, thanks for this package!
Will let you know how it goes.
I just save output of learn() with writeRDS() to use in classify?
Best Nathan
Hi Nathan,
No problem at all, glad you got it working. Yes the classifier can be quite slow to run so it would pay to run either saveRDS(euka02_learn, "euka02_learn.rds")
or save(euka02_learn, "euka02_learn.RData")
to back up the tree once it has finished. For the classify
function the tree just needs to be loaded in memory, so you can use either readRDS
or load
depending on how you saved it (I would generally recommend using saveRDS
and readRDS
since the files are smaller and load faster).
Cheers,
Shaun
I am trying to run the learn function with a custom 18s database (Silva.v132 trimmed with Euka02 primer, from Taberlet et al 2018). I have put the reference data, the error and R version info below.
More info on database creation is on my github page https://github.com/ngeraldi/insect-pipeline/blob/master/insect_ref_lib_18s.R
Any help would be much appreciated and let me know if you need any additional info. Thank you Nathan nathan.geraldi@kaust.edu.sa
Database is here : https://www.dropbox.com/s/q0rm6rcs44fa9h2/SILVA_132_SSURef_Nr99_tax_silva_trunc_DNA_insect_trim_and_format_euka02.fasta?dl=0
Code, error, version: x<-insect::readFASTA(file ="SILVA_132_SSURef_Nr99_tax_silva_trunc_DNA_insect_trim_and_format_euka02.fasta") db<-insect::taxonomy(db = "NCBI", synonyms = F) # download ncbi taxonomy database
euka02_learn<- learn(x, db, model = NULL, refine = "Viterbi", iterations = 50, nstart = 20, minK = 2, maxK = 2, minscore = 0.9, probs = 0.5, retry = TRUE, resize = TRUE, maxsize = max(sapply(x, length)), recursive = TRUE, cores = "autodetect", quiet = F)
R console output: Training classifier Converting taxon IDs to full lineage strings Initializing tree object Deriving sequence weights Making hash key for exact sequence matching Dereplicating sequences Deriving top level model Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent
version() platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.3
year 2017
month 11
day 30
svn rev 73796
language R
version.string R version 3.4.3 (2017-11-30) nickname Kite-Eating Tree