tgvaughan / MultiTypeTree

BEAST 2 package which provides support for multi-type trees: phylogenetic trees on structured populations.
http://tgvaughan.github.io/MultiTypeTree
GNU General Public License v3.0

Run MultitypeTree on a bacterial phylogeny #19

Closed StephanieWLo closed 3 years ago

StephanieWLo commented 3 years ago

Hello Tim,

I have tried to run MultiTypeTree on a bacterial phylogeny; the input .xml file is attached. When I start the run, I get the error below. Could you please help? I would also appreciate any comments on the parameters that I set in the .xml. Thank you very much in advance.

Best wishes,

Stephanie

GPSC10_sero24_multitypetree.xml.zip

Random number seed: 1604358566617

File: GPSC10_sero24_multitypetree.xml seed: 1604358566617 threads: 32

Validation error when initializing object beast.evolution.tree.SCMigrationModel (id migModelInit.t:GPSC10_recomb_free_date_loc_snp):

java.lang.IllegalArgumentException: Input 'typeSet' must be specified.
  at beast.core.Input.validate(Unknown Source)
  at beast.core.BEASTInterface.validateInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseRunElement(Unknown Source)
  at beast.util.XMLParser.parse(Unknown Source)
  at beast.util.XMLParser.parseFile(Unknown Source)
  at beast.app.BeastMCMC.parseArgs(Unknown Source)
  at beast.app.beastapp.BeastMain.main(Unknown Source)

Error 110 parsing the xml input file

validate and intialize error: Input 'typeSet' must be specified.

Error detected about here:

tgvaughan commented 3 years ago

Hi Stephanie, how did you create this XML file? It seems to be broken in a couple of places. (I tried replacing the missing typeSet input, but then I encounter an error because the migration matrix has the wrong dimension.) If you created it in BEAUti, it's possible that something went wrong during the analysis set-up process, resulting in a corrupted analysis file. To try to rule this out as a possibility, I would try to again set things up in BEAUti.

StephanieWLo commented 3 years ago

Hi Tim,

Thank you for your reply. I have re-created the .xml in BEAUti using the MultiTypeTree template (attached).

In brief, my settings are as follows:

Site model -> Proportion Invariant -> GTR

Clock model -> Relaxed Clock Log Normal, clock rate = 0.1

Prior -> structured.coalescent -> check estimates -> set uniform for pop.size and rate.matrix -> lower = 0.1 and upper = 3.0 (I would appreciate your input on this prior; a longer run time shouldn’t be a problem)

Prior -> ucld.mean -> lower = 0.001, upper = 1000, value = 0.1, distribution = Gamma

MCMC -> chain length = 10 million, log every 10000

Thank you very much for your help and look forward to hearing from you.

Best wishes,

Stephanie Lo, PhD Project Manager / Principal Bioinformatician Genomics of Pneumonia and Meningitis Parasites and Microbes

Wellcome Sanger Institute Wellcome Genome Campus, Hinxton Cambridge, CB10 1SA

Email: stephanie.lo@sanger.ac.uk Web: Bentley group (https://bentleygroup.sanger.ac.uk/) | GPS (https://www.pneumogen.net/gps/) | Training (https://training.bactgen.sanger.ac.uk/#/) | My profile (https://www.sanger.ac.uk/person/lo-stephanie/)


-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

StephanieWLo commented 3 years ago

Hi Tim,

I would like to touch base with you about preparing the .xml for running MultiTypeTree. I am still having trouble getting it to run; the error is attached. I look forward to hearing from you.

Best wishes,

Steph


Random number seed: 1605021601118

File: GPSC10_sero24_multitypetree_5Nov2020.xml seed: 1605021601118 threads: 32

Validation error when initializing object beast.evolution.tree.SCMigrationModel (id migModelInit.t:GPSC10_recomb_free_date_loc_snp):

java.lang.IllegalArgumentException: Input 'typeSet' must be specified.
  at beast.core.Input.validate(Unknown Source)
  at beast.core.BEASTInterface.validateInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseRunElement(Unknown Source)
  at beast.util.XMLParser.parse(Unknown Source)
  at beast.util.XMLParser.parseFile(Unknown Source)
  at beast.app.BeastMCMC.parseArgs(Unknown Source)
  at beast.app.beastapp.BeastMain.main(Unknown Source)

Error 110 parsing the xml input file

validate and intialize error: Input 'typeSet' must be specified.

Error detected about here:
tgvaughan commented 3 years ago

Hi @StephanieWLo, I think you're trying to attach files to this issue via email - I don't think this is possible (I can't see your attachments anyway). Would you mind posting your response via github and adding the attachments directly? Thanks.

StephanieWLo commented 3 years ago

Hi @tgvaughan,

Sure, and thank you. I ran the attached .xml on BEAST 2.6.2 and MultiTypeTree v7.0.1. The error messages are below; I would appreciate your advice on how to address them. Thank you.

Validate and intialize error: Input 'typeSet' must be specified.

Error detected about here:

Validation error when initializing object beast.evolution.tree.SCMigrationModel (id migModelInit.t:GPSC10_recomb_free_date_loc_snp):

java.lang.IllegalArgumentException: Input 'typeSet' must be specified.
  at beast.core.Input.validate(Unknown Source)
  at beast.core.BEASTInterface.validateInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseInputs(Unknown Source)
  at beast.util.XMLParser.createObject(Unknown Source)
  at beast.util.XMLParser.parseRunElement(Unknown Source)
  at beast.util.XMLParser.parse(Unknown Source)
  at beast.util.XMLParser.parseFile(Unknown Source)
  at beast.app.BeastMCMC.parseArgs(Unknown Source)
  at beast.app.beastapp.BeastMain.main(Unknown Source)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at beast.app.beastapp.BeastLauncher.run(Unknown Source)
  at beast.app.beastapp.BeastLauncher.main(Unknown Source)

[GPSC10_sero24_multitypetree_10Nov2020.xml.zip](https://github.com/tgvaughan/MultiTypeTree/files/5535985/GPSC10_sero24_multitypetree_10Nov2020.xml.zip)
tgvaughan commented 3 years ago

Hi @StephanieWLo, I extracted the sequence data from your XML and created a separate fasta file, then loaded this into BEAUti with the MultiTypeTree template. I then set up the tip dates and the tip locations, set the population sizes and migration rates to be estimated, then saved a new analysis XML file. I didn't experience any problems running this analysis in BEAST.
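For anyone wanting to repeat the extraction step, here is a minimal sketch of one way to pull the alignment out of a BEAST 2 XML file and write it as FASTA. It assumes the sequences are stored as `<sequence>` elements with `taxon` and `value` attributes, which is how BEAUti-generated files usually look; files that store the sequence as element text would need adjusting. The function name and the example taxon labels are made up for illustration.

```python
import xml.etree.ElementTree as ET


def beast_xml_to_fasta(xml_text):
    """Return the alignment in a BEAST 2 XML string as FASTA text.

    Assumption: each sequence is a <sequence taxon="..." value="..."/> element.
    """
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("sequence"):
        taxon = seq.get("taxon")
        value = seq.get("value")
        if taxon is None or value is None:
            # Skip elements that do not follow the assumed attribute layout.
            continue
        records.append(f">{taxon}\n{value.strip()}")
    return "\n".join(records) + "\n" if records else ""


# Hypothetical miniature input, not taken from the attached analysis file.
example_xml = (
    '<beast><data id="alignment">'
    '<sequence id="s1" taxon="A_2001_INDIA" value="ACGT"/>'
    '<sequence id="s2" taxon="B_2003_INDIA" value="ACGA"/>'
    "</data></beast>"
)
print(beast_xml_to_fasta(example_xml), end="")
```

The resulting FASTA text can be saved to a file and imported into BEAUti in the usual way.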

One thing that might be causing trouble is that your sequence IDs don't seem particularly "clean". For instance, the field indicating the location of the sample has mixed capitalisation, e.g. INDIA and India. This has meant that 2 demes are created for this one location. Furthermore, it's probably better to use a single character such as _ to separate fields: your input has _ in one place and __ in another. This makes it harder for BEAUti to split the fields automatically.
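As a rough illustration of the clean-up described above, the following sketch collapses repeated underscores and gives the location field a consistent capitalisation. The example IDs are hypothetical, and the position of the location field (assumed here to be the last one) would need to match the real naming scheme. Note that a location name that itself contains an underscore would be split by this approach.

```python
import re


def clean_sequence_id(seq_id, location_field=-1):
    """Collapse repeated underscores and upper-case the location field."""
    # Turn runs such as "__" into a single "_" separator.
    collapsed = re.sub(r"_+", "_", seq_id.strip()).strip("_")
    fields = collapsed.split("_")
    # Assumption: exactly one field holds the location (the last by default).
    fields[location_field] = fields[location_field].upper()
    return "_".join(fields)


# Hypothetical IDs, not taken from the actual dataset.
ids = ["GPS001_2005__India", "GPS002_2007_INDIA"]
cleaned = [clean_sequence_id(i) for i in ids]
demes = sorted({c.split("_")[-1] for c in cleaned})
print(cleaned)  # ['GPS001_2005_INDIA', 'GPS002_2007_INDIA']
print(demes)    # ['INDIA']
```

After cleaning, both samples map to the same deme, and every ID can be split on a single `_`.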

I really can't offer much help regarding prior choices - my best suggestion here would be to read the paper (or other structured coalescent material) to make sure you understand the model parameters, then use whatever expert knowledge you have to guide your choice of parameter priors.

Another comment: you have a large number of demes here. Without doing something special (such as using some fancy hierarchical prior on the migration rate matrix), you are probably going to have trouble getting this analysis to run to completion. With this in mind, it might be an idea to look at one of the approximate structured coalescent methods, such as BASTA or MASCOT. The latter is very well supported by BEAST 2, and there are tutorials online at https://taming-the-beast.org.
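To give a sense of why the number of demes matters, here is a small sketch that counts the parameters of a structured coalescent model. It assumes one population size per deme and a fully asymmetric migration matrix with one rate per ordered pair of distinct demes. The function name is made up for illustration and is not part of MultiTypeTree.

```python
def structured_coalescent_parameter_count(n_demes):
    """Return (population sizes, migration rates) for n_demes demes.

    Assumption: an asymmetric migration matrix with no diagonal entries,
    so there is one rate for every ordered pair of distinct demes.
    """
    pop_sizes = n_demes
    migration_rates = n_demes * (n_demes - 1)
    return pop_sizes, migration_rates


for n in (2, 5, 10, 20):
    sizes, rates = structured_coalescent_parameter_count(n)
    print(f"{n} demes: {sizes} population sizes, {rates} migration rates")
```

The number of migration rates grows roughly with the square of the number of demes, so merging duplicated or sparsely sampled locations reduces the size of the problem considerably.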

Hope this helps!