ropensci / phylotaR

An automated pipeline for retrieving orthologous DNA sequences from GenBank in R
https://docs.ropensci.org/phylotaR

Several species #31

Closed jgomezd closed 2 years ago

jgomezd commented 6 years ago

I am currently working with phylotaR on a group of species. From what I have seen in your paper, the examples and the vignette, phylotaR only works with one txid. Is there a way to provide several txids so that I can work with species from several groups? I have tried supplying the IDs of my species as a character vector, but it doesn't seem to work because the program does not advance past the taxise stage. Best regards,

DomBennett commented 6 years ago

Hi Jorge,

Thanks very much for your question. The idea of running multiple taxa has been floated before, and I have created a stub in the code to do just that; I just haven't had the time to fully implement it. There may also be an issue because phylota was conceived to work hierarchically with the taxonomy: it takes each taxon ID, looks for descendants and tries to find clusters within them. If you have multiple single species, each with few or no descendants, it may not work because it can't find any clusters within those descendants.

Could you send me a snippet of the code you wish to work?

Thanks, Dom

jgomezd commented 6 years ago

Hi Dom,

I see. So if I am interested in only a few species, would you recommend downloading the complete group that contains them? This is a snippet of the code that I would like to use:

[id_ncbi.txt](https://github.com/AntonelliLab/phylotaR/files/1929349/id_ncbi.txt)
ferns<-read.csv("~id_ncbi.txt")

wd <- "..."
ncbi_dr <- "..."
txid <- ferns$id

setUp(wd=wd, txid=txid, ncbi_dr=ncbi_dr)
run(wd=wd)

DomBennett commented 6 years ago

Hi Jorge,

I would suggest running it for a higher taxonomic group and then filtering the results down to just your chosen species. Although, as you only have a few species, this is like taking a sledgehammer to crack a nut. I'm envisaging a solution in my head, but it would take a bit of new development. I'll let you know when I get to it.

Thanks, Dom
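
The filtering step suggested above can be sketched in base R. This is only an illustration under stated assumptions: the `seqs` table, its column names and the `wanted` vector are hypothetical, and the sketch does not use any phylotaR function. It assumes the results of a run on the higher taxonomic group have already been summarised as one row per sequence with its taxon ID.

```r
# Hypothetical summary of a run: one row per sequence and the taxon ID it
# belongs to. The sequence IDs and the taxon ID "0000" are made up.
seqs <- data.frame(
  sid  = c("seq1", "seq2", "seq3", "seq4"),
  txid = c("9504", "9499", "0000", "9504"),
  stringsAsFactors = FALSE
)

# Taxon IDs of the species of interest (character, to match the table).
wanted <- c("9504", "9499")

# Keep only the sequences whose taxon ID is one of the wanted species.
# Note: sequences assigned to a lower rank (e.g. a subspecies) carry a
# different taxon ID and would need to be mapped to their species first.
kept <- seqs[seqs$txid %in% wanted, ]
kept$sid
```

Exact matching on the taxon ID is the simplest choice here; it is not necessarily how the package itself would resolve species membership.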

jgomezd commented 6 years ago

Hi Dom,

Thanks for your advice and your support. Best,

Jorge

DomBennett commented 6 years ago

Update

@jgomezd @FranzKrah

I have now implemented a simple solution for running multiple taxonomic IDs. It is available via the multiple_ids branch.

Install with:

devtools::install_github('ropensci/phylotaR', ref = 'multiple_ids')

Tested as follows:

ncbi_dr <- '/usr/bin/'
wd <- 'aotus_aloutta'
setup(wd = wd, txid = c('9504', '9499'), ncbi_dr = ncbi_dr, v = TRUE)
run(wd)

brunoasm commented 4 years ago

@DomBennett I am trying to use this branch, but it seems to break if there are more than 100 input NCBI IDs. Is this easy to fix?

This fails:

o_ids = c(180203, 103653, 103646, 190150, 180214, 103633, 30253, 103640, 355377, 156590, 6997, 1164880, 519494, 7007, 58602, 1034420, 1034424, 1034413, 1034430, 103655, 156592, 168705, 168706, 169091, 461299, 461303, 62746, 13551, 37639, 58560, 355390, 1034393, 294303, 270253, 7005, 243007, 58562, 1224100, 334752, 1045891, 294298, 294301, 910373, 355388, 294342, 420843, 1323534, 441217, 85156, 294354, 420844, 1007316, 656900, 58607, 1678041, 441230, 473660, 494438, 494379, 150816, 515599, 58615, 294352, 660956, 294351, 396409, 302092, 109886, 473763, 433472, 499829, 7019, 355409, 660961, 355397, 1654701, 288127, 215059, 58594, 420850, 62794, 65742, 37261, 96519, 672150, 1312898, 116142, 1045884, 122969, 355363, 355370, 1034930, 1034436, 433447, 499840, 7004, 433444, 433445, 433443, 243009, 1038060, 319231, 863395, 1661872, 431551, 57095, 396431, 795156, 274599, 227609, 1634087, 7010, 361523)

blast_dir = '/usr/local/Cellar/blast/2.9.0/bin'
work_dir = './Orthoptera'
dir.create(path = work_dir, showWarnings = FALSE)

phylotaR::setup(wd = work_dir, 
      txid = o_ids, 
      ncbi_dr = blast_dir, 
      v = TRUE,
      overwrite = TRUE)
phylotaR::run(work_dir)

resulting in

---------------------------------------------------
Running pipeline on [unix] at [2019-12-02 10:02:14]
---------------------------------------------------
Running stages: taxise, download, cluster, cluster2
--------------------------------------------
Starting stage TAXISE: [2019-12-02 10:02:14]
--------------------------------------------
Searching taxonomic IDs ...
Downloading taxonomic records ...
. [1-100]
. [101-131]
Extra look-up for multi-ids ...
Unexpected Error in slot(object = x, name = "lng") : 
  cannot get a slot ("lng") from an object of type "NULL"

Occurred [2019-12-02 10:02:27]
Contact package maintainer for help.
Error in stages_run(wd = wd, frm = 1, to = nstages, stgs_msg = stgs_msg) : 
  Unexpected Error in slot(object = x, name = "lng") : 
  cannot get a slot ("lng") from an object of type "NULL"

Occurred [2019-12-02 10:02:27]
Contact package maintainer for help.

But if I subset o_ids to any vector with up to 100 elements, it works.
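
The log above shows records being downloaded in ranges of 100, and the error appears only once a second range exists. A minimal base R sketch for splitting a vector of IDs into subsets of at most 100, so that each subset can be tested separately, might look like the following. `batch_ids` is a hypothetical helper, not a phylotaR function, and the IDs are placeholders.

```r
# Split a vector of taxon IDs into batches of at most `size` elements.
# `batch_ids` is a hypothetical helper, not part of phylotaR.
batch_ids <- function(ids, size = 100) {
  stopifnot(size >= 1)
  if (length(ids) == 0) {
    return(list())
  }
  # Element i goes to batch ceiling(i / size): 1..100 -> 1, 101..200 -> 2, ...
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Placeholder IDs standing in for real NCBI taxon IDs.
ids <- as.character(seq_len(131))
batches <- batch_ids(ids)
lengths(batches)  # 100 31
```

This only helps to narrow down the failing input size; running the pipeline on separate subsets is not equivalent to a single run over all IDs.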

DomBennett commented 4 years ago

Hi @brunoasm

This branch has been merged with the master branch and has since been updated there.

Try installing the latest master branch and try again.

Dom

brunoasm commented 4 years ago

Thanks @DomBennett! Taxise works now.

maelle commented 2 years ago

For info, we're looking for a new maintainer / a new maintainer team for this package, see #57 and feel free to volunteer, we'd be happy to help.