wejlab / MetaScope

An R-based approach for preprocessing and aligning 16S, metagenomic, and metatranscriptomic data (PathoScope version 3.0)
GNU General Public License v3.0

Using NCBI taxid as parameter in download_refseq #34

Open pedres opened 1 week ago

pedres commented 1 week ago

Hi, I am just starting to use your package. I have a list of bacteria I would like to look for in my samples. However, I ran a small test following the tutorial and found that the sapply call did not work. I traced the failure to "Salmonella enterica subsp. enterica serovar Typhi". I believe I have "curated" the list, retaining only the bacteria that have an NCBI taxid and correcting their taxonomy. Curiously, this bacterium does have an NCBI taxid (90370). Are you considering accepting NCBI taxids as a parameter to the download_refseq function? In fact, https://www.ncbi.nlm.nih.gov/datasets/taxonomy/90370/ lists many genomes for this bacterium. Thank you very much, Manuel

tax_reportOK.zip

seanlu96 commented 1 week ago

Hi Manuel, could you provide the errors that you're receiving with your tests? I took a look at the vignette section with sapply(all_species, download_refseq) that I think you are referencing, using that taxon (Salmonella enterica subsp. enterica serovar Typhi), and I didn't run into any issues. Thanks! Sean

pedres commented 1 week ago

Hi Sean, if I run

bacteria <- readRDS("tax_reportOK.rds")
somePATHOs <- c(562, 1280, 573, 1313, 470, 287, 1773, 1352, 547, 1319, 36470, 28901, 90370, 1351, 583, 613, 544, 727, 620, 590, 54388, 57045, 57046, 581)

somePATHOs <- bacteria %>% filter(taxid %in% somePATHOs) %>% dplyr::select(tax) %>% dplyr::pull(tax)

sapply(somePATHOs, download_refseq, reference = FALSE, representative = TRUE, compress = TRUE, out_dir = target_ref_temp, caching = TRUE)

I get:

Error in FUN(X[[i]], ...) : No rank detected

If I remove "Salmonella enterica subsp. enterica serovar Typhi" by using somePATHOs[2:12], the function works well:

sapply(somePATHOs[2:12], download_refseq, reference = FALSE, representative = TRUE, compress = TRUE, out_dir = target_ref_temp, caching = TRUE)
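Rather than removing entries by hand, one way to find every name that triggers this error in a single pass is to wrap each call in tryCatch and collect the messages. This is only a minimal sketch: the helper try_each and the stand-in fake_download are hypothetical names introduced here so the example runs without MetaScope or network access. fake_download merely imitates the reported error and says nothing about how download_refseq actually determines the rank.

```r
# Hypothetical stand-in for download_refseq (an assumption, not MetaScope code):
# it fails with the reported message for any name containing "serovar".
fake_download <- function(taxon, ...) {
  if (grepl("serovar", taxon, fixed = TRUE)) stop("No rank detected")
  paste0(taxon, ".fasta")
}

# Call `fun` once per taxon; return "ok" on success or the error message
# on failure, so a single bad name does not abort the whole loop.
try_each <- function(taxa, fun, ...) {
  vapply(taxa, function(taxon) {
    tryCatch({
      fun(taxon, ...)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

taxa <- c("Escherichia coli",
          "Salmonella enterica subsp. enterica serovar Typhi")
status <- try_each(taxa, fake_download)

# Show only the names that failed, together with their error messages.
print(status[status != "ok"])
```

With the real function, the call would presumably look like try_each(somePATHOs, download_refseq, reference = FALSE, representative = TRUE, compress = TRUE, out_dir = target_ref_temp, caching = TRUE), reusing the arguments shown above.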

pedres commented 1 week ago

Hi, I also get the same error when running either of these calls:

sapply("Salmonella enterica subsp. enterica serovar Typhi", download_refseq, reference = FALSE, representative = TRUE, compress = TRUE, out_dir = target_ref_temp, caching = TRUE)

sapply("Salmonella enterica subsp. enterica serovar Typhi", download_refseq, reference = FALSE, representative = FALSE, compress = TRUE, out_dir = target_ref_temp, caching = TRUE)

Another question: in the tutorial, download_refseq is run with reference=FALSE and representative=FALSE, but the default parameters are set to reference=TRUE and representative=FALSE. Are the defaults the best choice for these parameters? Thank you very much for your help, Manuel