roblanf / sarscov2phylo

Global phylogenies of SARS-CoV-2 sequences
GNU General Public License v3.0

File "scripts/tree_ft.sh" missing #20

Open hforoughmand opened 3 years ago

hforoughmand commented 3 years ago

Executing the pipeline causes the following error at the "Estimating trees with bootstraps using fasttree" step:

bash: [...]/scripts/tree_ft.sh: No such file or directory

AngieHinrichs commented 3 years ago

Hi @hforoughmand -- Rob removed the script back in October since he was no longer using it:

commit 55cbb8ddaddaceb2ecd0577f105a3fd37f13d304
Author: roblanf <rob.lanfear@gmail.com>
Date:   Thu Oct 15 15:54:41 2020 +1100

    remove unused script

    otherwise I'd have to update it (pointlessly) for the change to EPI-IDs

 scripts/tree_ft.sh | 91 -------------------------------------------------------------------------------------------
 1 file changed, 91 deletions(-)

If you have a local clone of the repository then you can get the script back by running git revert 55cbb8dda.
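As a possible alternative to a revert (which adds a new commit to the history), the file can be checked out from the parent of the removal commit. This is only a sketch: `restore_removed_file` is a helper name made up for this example, and the restored script predates the change to EPI-IDs mentioned in the commit message, so it may not work with the current pipeline without updates.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): restore a file that a given
# commit deleted, by checking it out from that commit's parent.
# The file is written to the working tree and staged; no commit is created.
restore_removed_file() {
    # $1: commit that removed the file
    # $2: path of the file, relative to the repository root
    git checkout "$1~1" -- "$2"
}

# Usage, from the root of a local clone:
#   restore_removed_file 55cbb8dda scripts/tree_ft.sh
```

Unlike git revert, this leaves the history untouched, which may be preferable if the script is only needed locally.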

roblanf commented 3 years ago

@hforoughmand if you can let me know what you're trying to do, I can try to give you some advice. In its current version, the repo is set up to update existing trees.


-- Rob Lanfear Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra

www.robertlanfear.com

hforoughmand commented 3 years ago

I want to build a tree for a subset of sequences (around 10K samples). The samples are not already in the GISAID tree. I want to do this for several different subsets, e.g. for different states. Is there a way to do it automatically?


roblanf commented 3 years ago

For 10K samples, assuming lots aren’t already in an existing tree, I think the best approach would be:

  1. Align as in the current repo (code should work fine)
  2. Use IQ-TREE or RAxML-NG for tree inference.

Both IQ-TREE and RAxML-NG are full ML implementations, and both scale well to 10K sequences. One thing to note is that in both you should reduce the minimum branch length to something very small, like 1e-12, to avoid odd laddering effects that result from both programs assuming a bifurcating tree.

A decent starting place for an IQ-TREE analysis might be:

iqtree2 -s aln.fasta -t NJ-R -n 0 -m GTR+R4 -nt 8 -blmin 0.00000000001

It’s probably also worth exploring some different models, though the inference will take longer. In my experience the free-rate models fit the data better, so something like:

iqtree2 -s aln.fasta -t NJ-R -n 0 -m GTR+R4 -nt 8 -blmin 0.00000000001

The -n 0 is important: it tells IQ-TREE to do zero rounds of stochastic search, which is not helpful on these data.

You can find similar settings in RAxML-NG. From memory there’s a setting to use a single parsimony tree as a starting tree, which is what I’d recommend here.

Rob
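For exploring models, the command above could be wrapped so that each model writes to its own output prefix. This is a rough sketch rather than anything from the repo: `infer_tree`, `IQTREE_BIN` and `THREADS` are names invented for this example, the -pre output prefix is an addition to the command quoted above, and GTR+G is just one possible alternative model to try.

```shell
#!/bin/sh
# Sketch only: infer_tree, IQTREE_BIN and THREADS are illustrative names.
IQTREE_BIN="${IQTREE_BIN:-iqtree2}"
THREADS="${THREADS:-8}"

infer_tree() {
    aln="$1"      # aligned FASTA file
    model="$2"    # substitution model, e.g. GTR+R4
    # -t NJ-R  starting tree, as in the command above
    # -n 0     zero rounds of stochastic search
    # -blmin   minimum branch length, kept very small
    # -pre     output prefix (added here), so runs with different
    #          models do not overwrite each other's files
    "$IQTREE_BIN" -s "$aln" -t NJ-R -n 0 -m "$model" -nt "$THREADS" \
        -blmin 0.00000000001 -pre "${aln}.${model}"
}

# Usage: one run per candidate model
#   for m in GTR+G GTR+R4; do infer_tree aln.fasta "$m"; done
```

The model fit statistics reported in each run's .iqtree file can then be compared to pick a model.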


hforoughmand commented 3 years ago

Thank you for your detailed reply.

So these are the steps for creating the starting tree, and then we can use your tools to add the rest of the samples to it? I mean, this repo contains scripts that produce the final tree, so what are the steps for that?

Thanks again

roblanf commented 3 years ago

If you have samples after the initial 10K, then this repo should work fine.

One thing you should consider is using UShER to place samples instead of IQ-TREE. Both work fine, but at some point (perhaps around 100K samples) IQ-TREE starts to use so much memory that it may become impractical. You can replace the sample-placement step in this repo with similar code that uses UShER, which should be pretty fast and will certainly use vastly less memory.

Rob
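A placement step based on UShER might look roughly like the following. It is a sketch under several assumptions: the starting tree and the VCF files of variants (for the samples already in the tree and for the new samples) are assumed to exist already, and `build_mat`, `place_samples` and `USHER_BIN` are names invented for this example, not code from this repo.

```shell
#!/bin/sh
# Sketch only: build_mat, place_samples and USHER_BIN are illustrative names.
USHER_BIN="${USHER_BIN:-usher}"

# Build a mutation-annotated tree (.pb) from a Newick starting tree
# and a VCF of the samples already in that tree.
build_mat() {
    # $1: starting tree (Newick), $2: VCF, $3: output .pb file
    "$USHER_BIN" -t "$1" -v "$2" -o "$3"
}

# Place new samples onto an existing mutation-annotated tree.
place_samples() {
    # $1: existing .pb, $2: VCF of new samples,
    # $3: updated .pb to write, $4: output directory
    # -u also writes the uncondensed final tree in Newick format
    "$USHER_BIN" -i "$1" -v "$2" -o "$3" -u -d "$4"
}

# Usage (file names are placeholders):
#   build_mat start.nwk start.vcf start.pb
#   place_samples start.pb new_samples.vcf updated.pb usher_out
```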
