katylettuce / beast-mcmc

Automatically exported from code.google.com/p/beast-mcmc

Using Microsatellite loci for inferring one phylogenetic tree that uses all the loci #677

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I'm trying to use BEAST for phylogenetic tree reconstruction using 
microsatellite loci. Specifically, I have X taxa and Y microsatellite loci, but 
some data are missing, so each locus has signal in a different subset of the 
taxa (an example of such data is the example file microsatellite_data.txt 
located under examples\Data). 
I want to reconstruct a single tree that takes all the loci into account. Is 
it possible to do this with BEAUti and BEAST? When I try the example file, 
BEAUti produces a separate tree for each locus. Is it possible to combine all 
the trees into one tree that uses all the loci?

Thank you,
Noa

Original issue reported on code.google.com by dong.w.xie@gmail.com on 21 Feb 2013 at 10:57

GoogleCodeExporter commented 9 years ago
Hi Walter,

Could you please help us with this? For msat data, BEAUti creates separate taxa 
objects for different loci even if they share the same set of ids in the input 
data file. In short, this is to prevent empty tips (tips that have only empty, 
i.e. missing, values associated with them).

We now have a situation where the user would like to link the trees for 
multiple loci. However, because different trees have different taxa objects, 
BEAUti won't let you link the trees. Also, different taxa objects may have 
different sets of taxon members because of missing values.

Just a quick recap of the XML setup...

For example for the following data set:

Id         locus1    locus2  
taxon1         10        12
taxon2          ?         10
taxon3         15          ?
taxon4          ?          8

If the trees of locus1 and locus2 are unlinked, BEAUti creates two taxa blocks 
in the XML file:

<taxa id="locus1.taxa">
  <taxon idref="locus1_taxon1"/>
  <taxon idref="locus1_taxon3"/>
</taxa>

<taxa id="locus2.taxa">
  <taxon idref="locus2_taxon1"/>
  <taxon idref="locus2_taxon2"/>
  <taxon idref="locus2_taxon4"/>
</taxa>

Suppose the microsatellite object has max = 23, min = 1 and unitLength = 1; 
the microsatellitePattern blocks then look like this:

<microsatellitePattern id="locus1">
  <taxa idref="locus1.taxa"/>
  <microsatellite id="test.microsat" max="23" min="1" unitLength="1"/>
  <microsatSeq>
    10,15
  </microsatSeq>
</microsatellitePattern>

<microsatellitePattern id="locus2">
  <taxa idref="locus2.taxa"/>
  <microsatellite idref="test.microsat"/>
  <microsatSeq>
    12,10,8
  </microsatSeq>
</microsatellitePattern>

Notice that there are no missing values in the microsatellitePattern blocks 
(for the detailed reasons, please see the forwarded email below).
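The unlinked layout above can be sketched in Java (BEAST's implementation language): for each locus, keep only the taxa that have an observed value and write their values, in the same order, as the `microsatSeq` content. This is a minimal sketch assuming the data table is held as a map from taxon id to per-locus values, with `?` marking missing data; the class and method names (`UnlinkedLocus`, `observedTaxa`, `compactSeq`) are hypothetical and are not BEAUti's actual code, and the `locusN_` prefixes on taxon ids are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not BEAUti's implementation: builds the per-locus taxa
// list and microsatSeq content for the *unlinked* case by dropping "?" entries.
final class UnlinkedLocus {

    static final String MISSING = "?";

    // The example data set from this issue; LinkedHashMap keeps the file's row order.
    static Map<String, String[]> exampleTable() {
        Map<String, String[]> table = new LinkedHashMap<>();
        table.put("taxon1", new String[] {"10", "12"});
        table.put("taxon2", new String[] {"?", "10"});
        table.put("taxon3", new String[] {"15", "?"});
        table.put("taxon4", new String[] {"?", "8"});
        return table;
    }

    // Taxa, in file order, that have an observed value at the given locus column.
    static List<String> observedTaxa(Map<String, String[]> table, int locus) {
        List<String> taxa = new ArrayList<>();
        for (Map.Entry<String, String[]> row : table.entrySet()) {
            if (!MISSING.equals(row.getValue()[locus])) {
                taxa.add(row.getKey());
            }
        }
        return taxa;
    }

    // Comma-separated repeat counts for those taxa only, in taxa-block order.
    static String compactSeq(Map<String, String[]> table, int locus) {
        List<String> values = new ArrayList<>();
        for (String taxon : observedTaxa(table, locus)) {
            values.add(table.get(taxon)[locus]);
        }
        return String.join(",", values);
    }

    public static void main(String[] args) {
        Map<String, String[]> table = exampleTable();
        for (int locus = 0; locus < 2; locus++) {
            System.out.println("locus" + (locus + 1) + ": " + observedTaxa(table, locus)
                + " -> " + compactSeq(table, locus));
        }
    }
}
```

Running `main` prints `locus1: [taxon1, taxon3] -> 10,15` and `locus2: [taxon1, taxon2, taxon4] -> 12,10,8`, matching the two `microsatellitePattern` blocks.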

Now suppose I link the trees of locus1 and locus2. The XML should then look 
like this:

<taxa id="locus1_2.taxa">
  <taxon idref="locus1_2_taxon1"/>
  <taxon idref="locus1_2_taxon2"/>
  <taxon idref="locus1_2_taxon3"/>
  <taxon idref="locus1_2_taxon4"/>
</taxa>

The taxa above should be the *union* of the two taxon sets {taxon1, taxon3} and 
{taxon1, taxon2, taxon4}.
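A minimal sketch of building that union, assuming the taxon order of the input data file should be preserved so that each `microsatSeq` lines up with the shared taxa block. `LinkedTaxa` and `union` are hypothetical names, not part of BEAUti.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: the taxa block for linked loci is the union of the
// per-locus taxon sets. Filtering the data file's taxon order (rather than
// concatenating the sets) keeps taxa in the same order as the input rows.
final class LinkedTaxa {

    static List<String> union(List<String> fileOrder, List<List<String>> perLocusTaxa) {
        Set<String> present = new HashSet<>();
        for (List<String> taxa : perLocusTaxa) {
            present.addAll(taxa);
        }
        List<String> merged = new ArrayList<>();
        for (String taxon : fileOrder) {
            if (present.contains(taxon)) {
                merged.add(taxon);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> fileOrder = Arrays.asList("taxon1", "taxon2", "taxon3", "taxon4");
        List<String> locus1 = Arrays.asList("taxon1", "taxon3");
        List<String> locus2 = Arrays.asList("taxon1", "taxon2", "taxon4");
        List<List<String>> perLocus = Arrays.asList(locus1, locus2);
        System.out.println(union(fileOrder, perLocus));
    }
}
```

With the example data this prints `[taxon1, taxon2, taxon3, taxon4]`. Concatenating the two sets in order would give the same members but in the order taxon1, taxon3, taxon2, taxon4, which would no longer match the file order.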

The microsatellitePattern block would look like:

<microsatellitePattern id="locus1">
  <taxa idref="locus1_2.taxa"/>
  <microsatellite id="test.microsat" max="23" min="1" unitLength="1"/>
  <microsatSeq>
    10,?,15,?
  </microsatSeq>
</microsatellitePattern>

<microsatellitePattern id="locus2">
  <taxa idref="locus1_2.taxa"/>
  <microsatellite idref="test.microsat"/>
  <microsatSeq>
    12,10,?,8
  </microsatSeq>
</microsatellitePattern>

Note the presence of missing values. We allow missing values in this case 
because every tip is associated with at least one non-missing value at one of 
the linked loci, so there are no empty tips.
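The padded patterns and the "no empty tips" condition can be sketched together: each linked locus is written over the same union taxa list with `?` kept, and a taxon is an empty tip only if its value is missing at every linked locus. This is a hypothetical sketch (`LinkedLoci`, `paddedSeq`, `emptyTips` are illustrative names, not BEAUti code), again assuming a taxon-to-values map with `?` for missing data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not BEAUti's implementation: for linked loci, every
// locus is written over the same (union) taxa list, so "?" entries are kept.
final class LinkedLoci {

    static final String MISSING = "?";

    // The example data set from this issue, in file order.
    static Map<String, String[]> exampleTable() {
        Map<String, String[]> table = new LinkedHashMap<>();
        table.put("taxon1", new String[] {"10", "12"});
        table.put("taxon2", new String[] {"?", "10"});
        table.put("taxon3", new String[] {"15", "?"});
        table.put("taxon4", new String[] {"?", "8"});
        return table;
    }

    // microsatSeq content for one locus over the shared taxa list.
    static String paddedSeq(Map<String, String[]> table, List<String> taxa, int locus) {
        List<String> values = new ArrayList<>();
        for (String taxon : taxa) {
            values.add(table.get(taxon)[locus]);
        }
        return String.join(",", values);
    }

    // Taxa whose value is missing at every linked locus; these would be empty tips.
    static List<String> emptyTips(Map<String, String[]> table, List<String> taxa, int[] linkedLoci) {
        List<String> empty = new ArrayList<>();
        for (String taxon : taxa) {
            boolean observed = false;
            for (int locus : linkedLoci) {
                if (!MISSING.equals(table.get(taxon)[locus])) {
                    observed = true;
                    break;
                }
            }
            if (!observed) {
                empty.add(taxon);
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        Map<String, String[]> table = exampleTable();
        List<String> taxa = new ArrayList<>(table.keySet());
        int[] linked = {0, 1};
        System.out.println("locus1: " + paddedSeq(table, taxa, 0));
        System.out.println("locus2: " + paddedSeq(table, taxa, 1));
        System.out.println("empty tips: " + emptyTips(table, taxa, linked));
    }
}
```

For the example data this prints `10,?,15,?`, `12,10,?,8` and an empty list of empty tips. A taxon with `?` at every linked locus would be reported by `emptyTips`.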

Would this modification be possible?

Kind regards,

Jessie

Original comment by dong.w.xie@gmail.com on 21 Feb 2013 at 10:58

GoogleCodeExporter commented 9 years ago
Implement mask functions in the Pattern class to help BEAUti generate XML with 
or without the unknown character "?". BEAUti can call ((PartitionPattern) 
abstractPartitionData).getPatterns().hasMask() to find out whether a mask is in 
use when generating the XML idrefs of the taxa, starting tree ...  
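The mask idea can be illustrated with a small stand-alone class that records, per taxon, whether the value is unknown. This is a hypothetical sketch: it does not reproduce BEAST's `Pattern` or `PartitionPattern` classes, and only the method name `hasMask()` is taken from the comment above; the `MaskedPattern` class, its constructor and `toMicrosatSeq()` are assumptions.

```java
// Hypothetical illustration of a per-taxon mask, not BEAST's actual
// Pattern/PartitionPattern code: mask[i] == true means the value for taxon i
// is unknown and is written as "?" instead of being dropped from the pattern.
final class MaskedPattern {

    private final int[] repeats;   // repeat counts; 0 (unused) where masked
    private final boolean[] mask;  // true = unknown value

    // null marks an unknown value, e.g. new MaskedPattern(10, null, 15, null).
    MaskedPattern(Integer... values) {
        repeats = new int[values.length];
        mask = new boolean[values.length];
        for (int i = 0; i < values.length; i++) {
            mask[i] = values[i] == null;
            repeats[i] = mask[i] ? 0 : values[i];
        }
    }

    // True if at least one taxon is masked, i.e. the pattern contains "?".
    boolean hasMask() {
        for (boolean masked : mask) {
            if (masked) {
                return true;
            }
        }
        return false;
    }

    // Text for the <microsatSeq> element.
    String toMicrosatSeq() {
        StringBuilder seq = new StringBuilder();
        for (int i = 0; i < repeats.length; i++) {
            if (i > 0) {
                seq.append(',');
            }
            seq.append(mask[i] ? "?" : Integer.toString(repeats[i]));
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        MaskedPattern unlinked = new MaskedPattern(10, 15);
        MaskedPattern linked = new MaskedPattern(10, null, 15, null);
        System.out.println(unlinked.hasMask() + " " + unlinked.toMicrosatSeq());
        System.out.println(linked.hasMask() + " " + linked.toMicrosatSeq());
    }
}
```

`main` prints `false 10,15` for the unlinked locus1 pattern and `true 10,?,15,?` for the linked one.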

Original comment by dong.w.xie@gmail.com on 21 Feb 2013 at 11:05

GoogleCodeExporter commented 9 years ago
The XML failed to parse:

Dear Jessie,

You did not change the parser's rule for <externalValues> in 
<microsatelliteSamplerTreeModel>; it still requires exactly one 
<microsatellitePattern>. This does not work for the linked-tree case. Should 
I change this rule to allow multiple patterns?  

    <microsatelliteSamplerTreeModel id="Locus1.treeModel.microsatellite">
        <tree>
            <treeModel idref="Locus1.treeModel"/>
        </tree>
        <internalValues>
            <parameter id="Locus1.treeModel.microsatellite.internalNodesParameter" dimension="19"/>
        </internalValues>
        <externalValues>
            <microsatellitePattern idref="Locus1"/>
            <microsatellitePattern idref="Locus3"/>
        </externalValues>
    </microsatelliteSamplerTreeModel>

Cheers,

Walter

Original comment by dong.w.xie@gmail.com on 22 Feb 2013 at 3:30

GoogleCodeExporter commented 9 years ago
Hi Walter,

Sorry, I should have made it clearer. The number of 
microsatelliteSamplerTreeModel blocks should *not* change at all. You will still 
have a separate microsatelliteSamplerTreeModel for each microsatellitePattern, 
so the mapping is still *one-to-one*. 

The *only* difference when the loci are linked is that all the 
microsatelliteSamplerTreeModel objects share the *same* treeModel object. So, 
if Locus1 and Locus3 are linked, their microsatelliteSamplerTreeModel blocks 
will look like this:

    <microsatelliteSamplerTreeModel id="Locus1.treeModel.microsatellite">
        <tree>
            <treeModel idref="Locus1_3.treeModel"/>
        </tree>
        <internalValues>
            <parameter id="Locus1.treeModel.microsatellite.internalNodesParameter" dimension="19"/>
        </internalValues>
        <externalValues>
            <microsatellitePattern idref="Locus1"/>
        </externalValues>
    </microsatelliteSamplerTreeModel>

    <microsatelliteSamplerTreeModel id="Locus3.treeModel.microsatellite">
        <tree>
            <treeModel idref="Locus1_3.treeModel"/>
        </tree>
        <internalValues>
            <parameter id="Locus3.treeModel.microsatellite.internalNodesParameter" dimension="19"/>
        </internalValues>
        <externalValues>
            <microsatellitePattern idref="Locus3"/>
        </externalValues>
    </microsatelliteSamplerTreeModel>

Please note that both blocks have <treeModel idref="Locus1_3.treeModel"/>. The 
tree likelihoods etc. do not change.
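The one-to-one layout can be sketched as a small generator that writes one `microsatelliteSamplerTreeModel` per locus while every block refers to the same `treeModel`. This is a hypothetical helper (`SamplerBlocks`, `block`, `linkedBlocks`), not BEAUti's XML generator; the element names and id pattern are copied from the blocks above, and the shared tree id is passed in rather than derived.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generator, not BEAUti code: emits one
// microsatelliteSamplerTreeModel per locus (one-to-one with the
// microsatellitePattern blocks); linked loci all point at the same treeModel.
final class SamplerBlocks {

    static String block(String locus, String treeModelId, int internalNodes) {
        String id = locus + ".treeModel.microsatellite";
        return "<microsatelliteSamplerTreeModel id=\"" + id + "\">\n"
            + "    <tree>\n"
            + "        <treeModel idref=\"" + treeModelId + "\"/>\n"
            + "    </tree>\n"
            + "    <internalValues>\n"
            + "        <parameter id=\"" + id + ".internalNodesParameter\" dimension=\""
            + internalNodes + "\"/>\n"
            + "    </internalValues>\n"
            + "    <externalValues>\n"
            + "        <microsatellitePattern idref=\"" + locus + "\"/>\n"
            + "    </externalValues>\n"
            + "</microsatelliteSamplerTreeModel>\n";
    }

    // Linked loci share one tree, hence one internal-node count for every block.
    static String linkedBlocks(List<String> loci, String sharedTreeModelId, int internalNodes) {
        StringBuilder xml = new StringBuilder();
        for (String locus : loci) {
            xml.append(block(locus, sharedTreeModelId, internalNodes)).append('\n');
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(linkedBlocks(Arrays.asList("Locus1", "Locus3"), "Locus1_3.treeModel", 19));
    }
}
```

`main` reproduces the two blocks above (modulo indentation). Each block still names exactly one `microsatellitePattern`, so the existing `<externalValues>` rule would not need to change.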

Hope that makes sense.

Thanks,

Jessie

Original comment by dong.w.xie@gmail.com on 22 Feb 2013 at 3:30

GoogleCodeExporter commented 9 years ago
Another bug:

Creating the tree model, 'Locus1.treeModel'
  initial tree topology = (((((((T14,T15),T2),((T16,T17),T13)),(T20,T6)),(T12,T18)),((((T11,T7),(T4,T9)),(T19,T3)),T8)),((T1,T5),T10))
  tree height = 2710.78321092947
Error running file: microsatellite_data.xml
Fatal exception: Incorrect dimension of bounds, expected 19 but received 2
java.lang.IllegalArgumentException: Incorrect dimension of bounds, expected 19 but received 2
    at dr.inference.model.IntersectionBounds.addBounds(IntersectionBounds.java:39)
    at dr.inference.model.Parameter$Default.addBounds(Parameter.java:514)
    at dr.evomodel.tree.MicrosatelliteSamplerTreeModel.initialiseInternalStates(MicrosatelliteSamplerTreeModel.java:210)
    at dr.evomodel.tree.MicrosatelliteSamplerTreeModel.<init>(MicrosatelliteSamplerTreeModel.java:68)
    at dr.evomodelxml.tree.MicrosatelliteSamplerTreeModelParser.parseXMLObject(MicrosatelliteSamplerTreeModelParser.java:43)
    at dr.xml.AbstractXMLObjectParser.parseXMLObject(AbstractXMLObjectParser.java:119)
    at dr.xml.XMLParser.convert(XMLParser.java:317)
    at dr.xml.XMLParser.convert(XMLParser.java:288)
    at dr.xml.XMLParser.parse(XMLParser.java:167)
    at dr.app.beast.BeastMain.<init>(BeastMain.java:145)
    at dr.app.beast.BeastMain.main(BeastMain.java:592)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Creating the tree model, 'Locus5.treeModel'
  initial tree topology = ((T19,T20),T18)
  tree height = 385.1948205903774

Walter 

Original comment by dong.w.xie@gmail.com on 22 Feb 2013 at 3:31

GoogleCodeExporter commented 9 years ago
The error was due to an incorrect dimension in the specification of the 
Locus5.microsatelliteSamplerTreeModel.internalNodesParameter object. The 
corrected XML is attached.
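A pre-flight check of this kind can be sketched, assuming `internalNodesParameter` needs one entry per internal node of a rooted binary tree, i.e. n - 1 entries for n tips. That assumption is consistent with `dimension="19"` for the 20-taxon Locus1 tree and with the bound dimension of 2 reported for Locus5, whose starting tree has 3 tips. `InternalNodesCheck` is a hypothetical name, not part of BEAST.

```java
// Hypothetical pre-flight check, not part of BEAST: a rooted binary tree with
// n tips has n - 1 internal nodes, so internalNodesParameter is assumed to
// need exactly that dimension.
final class InternalNodesCheck {

    static int expectedDimension(int taxonCount) {
        if (taxonCount < 2) {
            throw new IllegalArgumentException("need at least 2 taxa, got " + taxonCount);
        }
        return taxonCount - 1;
    }

    static String check(String locus, int declaredDimension, int taxonCount) {
        int expected = expectedDimension(taxonCount);
        return declaredDimension == expected
            ? locus + ": OK"
            : locus + ": dimension=\"" + declaredDimension + "\" but the tree has "
                + expected + " internal nodes";
    }

    public static void main(String[] args) {
        System.out.println(check("Locus1", 19, 20));
        // 19 is an assumed mis-declared value, consistent with the error message above.
        System.out.println(check("Locus5", 19, 3));
    }
}
```

`main` reports `Locus1: OK` and flags Locus5, where a declared dimension of 19 does not match its 2 internal nodes.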

Jessie

Original comment by akaru...@gmail.com on 22 Feb 2013 at 4:58


GoogleCodeExporter commented 9 years ago

Original comment by dong.w.xie@gmail.com on 24 Feb 2013 at 10:31