palaeoware / trevosim

TREvoSim - The [Tr]ee [Evo]lutionary [Sim]ulator program
GNU General Public License v3.0

Species naming in NEXUS output #36

Closed by ms609 1 month ago

ms609 commented 2 months ago

Species are named differently within Matrix (Species_N) and MrBayes_Tree (S_0N).

#NEXUS
[||Settings||]
BEGIN DATA;
Dimensions ntax=||Taxon_Number|| nchar=||Character_Number||;
Format datatype=standard missing=? gap=-;
MATRIX
||Matrix||
;
END;

BEGIN TREES;
  tree simulated_tree = ||MrBayes_Tree||;
END;

gives:

#NEXUS
[variables : genomeSize 16 speciesSelectSize 16 fitnessSize 16 runForTaxa 4 runForIterations 1000 playingfieldSize 20 speciesDifference 2 environmentMutationRate 1 organismMutationRate 1 unresolvableCutoff 4 environmentNumber 1 maskNumber 5 runMode 1 stripUninformative 0 writeTree 1 writeRunningLog 0 writeFileOne 1 writeFileTwo 0 writeEE 0 noSelection 0 sansomianSpeciation 1 discardDeleterious 0 fitnessTarget 50 playingfieldNumber 1 mixing 0 mixingProbabilityZeroToOne 0 mixingProbabilityOneToZero 0 playingfieldMasksMode 0 selection 10 randomOverwrite 0 ecosystemEngineers 0 ecosystemEngineersArePersistent 0 ecosystemEngineeringFrequency 10 ecosystemEngineersAddMask 0 runningLogFrequency 50 expandingPlayingfield0 stochasticLayer 0 stochasticDepth 1 matchFitnessPeaks 0 stochasticMap 0000000000000000]
BEGIN DATA;
Dimensions ntax=4 nchar=16;
Format datatype=standard missing=? gap=-;
MATRIX
Species_0   1111110011101110
Species_1   1111110001100110
Species_2   1111010011101110
Species_3   1111110011101111

;
END;

BEGIN TREES;
  tree simulated_tree = (S_01:2,(S_02:10,(S_03:1,S_00:1):9):1):25;
END;
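
A mismatch like this can be caught automatically by comparing the taxon labels in the MATRIX block with the tip labels in the tree. The following is a minimal sketch, not part of TREvoSim: the function names `matrix_labels`, `tree_labels` and `label_mismatch` are hypothetical, and the regular expressions assume a simple, non-interleaved NEXUS file with unquoted labels, like the output above (with the settings comment omitted).

```python
import re

# Output from this issue, with the bracketed settings comment omitted.
SAMPLE = """#NEXUS
BEGIN DATA;
Dimensions ntax=4 nchar=16;
Format datatype=standard missing=? gap=-;
MATRIX
Species_0   1111110011101110
Species_1   1111110001100110
Species_2   1111010011101110
Species_3   1111110011101111

;
END;

BEGIN TREES;
  tree simulated_tree = (S_01:2,(S_02:10,(S_03:1,S_00:1):9):1):25;
END;
"""


def matrix_labels(nexus_text):
    """Return the set of taxon labels found in the MATRIX block."""
    match = re.search(r"^MATRIX\s*\n(.*?)\n\s*;", nexus_text,
                      re.M | re.S | re.I)
    if match is None:
        return set()
    labels = set()
    for line in match.group(1).splitlines():
        parts = line.split()
        if parts:  # skip blank lines inside the block
            labels.add(parts[0])
    return labels


def tree_labels(nexus_text):
    """Return the set of tip labels in the first tree statement."""
    match = re.search(r"^\s*tree\s+\S+\s*=\s*(.*?);", nexus_text,
                      re.M | re.S | re.I)
    if match is None:
        return set()
    # A tip label directly follows "(" or "," in a Newick string.
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", match.group(1)))


def label_mismatch(nexus_text):
    """Return (labels only in the matrix, labels only in the tree)."""
    in_matrix = matrix_labels(nexus_text)
    in_tree = tree_labels(nexus_text)
    return sorted(in_matrix - in_tree), sorted(in_tree - in_matrix)


if __name__ == "__main__":
    only_matrix, only_tree = label_mismatch(SAMPLE)
    print("Only in matrix:", only_matrix)
    print("Only in tree:  ", only_tree)
```

For the output above, all four labels on each side are reported as unmatched, because no matrix label has a counterpart in the tree.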

RussellGarwood commented 1 month ago

Thanks for picking this up. Because the .nex output is formatted to work in phangorn, I had not spotted this in the MrBayes output string. I have standardised the naming with the above commit. Note that I also changed the matrix labels to use zero padding, as I think this is a sensible convention. Output for the above template is now:

#NEXUS
[variables : genomeSize 128 speciesSelectSize 128 fitnessSize 128 runForTaxa 32 runForIterations 1000 playingfieldSize 20 speciesDifference 4 environmentMutationRate 1 organismMutationRate 2 unresolvableCutoff 5 environmentNumber 1 maskNumber 5 runMode 1 stripUninformative 0 writeTree 1 writeRunningLog 0 writeFileOne 1 writeFileTwo 1 writeEE 0 noSelection 0 sansomianSpeciation 1 discardDeleterious 0 fitnessTarget 50 playingfieldNumber 1 mixing 0 mixingProbabilityZeroToOne 0 mixingProbabilityOneToZero 0 playingfieldMasksMode 0 selection 10 randomOverwrite 0 ecosystemEngineers 0 ecosystemEngineersArePersistent 0 ecosystemEngineeringFrequency 10 ecosystemEngineersAddMask 0 runningLogFrequency 50 expandingPlayingfield0 stochasticLayer 0 stochasticDepth 1 matchFitnessPeaks 0 stochasticMap 0000000000000000]
BEGIN DATA;
Dimensions ntax=32 nchar=128;
Format datatype=standard missing=? gap=-;
MATRIX
Species_00  01110010100000001000001011110010100011000010101111111011110111100111001011010000111001001110000010011000110010010111000100100110
Species_01  01110010100000001000001011110010100011000010101111111011110111100111001011010000111001001110000010011010110010010111000100101110
Species_02  01110010110000001000001011110010000011000010101111111011110111100111001011010000110001001011000010011000110010010111000000100110
Species_03  01110010110000001000001011100010100011000010101111111011110111100101001011010000111001001011000010011000110010010111000100100110
Species_04  01110010100000000000001011110010101011000010101111111011110111100111001011010001111001001110000010011000110010110111000100100100
Species_05  01110010110000001000001011110010000011000010101111111011110111110111001011010000111001011011000010011000110010010111000100100110
Species_06  01110010110000001000001011110010100011000010101111111011110111100111011011010000011001001011000010111000110010010111000100100110
Species_07  01110010100000101000001011110010100011010010101111111011110111100111001011010001111001001010001010011000110010010111000100100110
Species_08  01110010100000100000001011110010100011010010101111111010110111100111001011010001111001001010001011011000110010010111000000100100
Species_09  01110010110000001000001011100010100010000000101111111011110111100001001011011010111001001011000010011000110010010111000100100110
Species_10  01110010100000000000001011110010101011000010101111111011110111100111001110010001111001001110000010011000110010110111000100100100
Species_11  01110010110000001000001011110010100011000010101111111011110110100111001011010000111001001011000010011000110010010101000101100110
Species_12  01110000110000001000001011110010100011000010101111111011110111100111001011010000111001001011000010011000100010010111000100100111
Species_13  01110010110000101000001011110010100011000010101111111011110111100111001011010000111001001001000010011000111010010111000100100110
Species_14  01110010110000000000001011110010111011000010101111111011110111100111001011010000111011001110000010011000110010010111000100100100
Species_15  01110010100000000000011011110010101011000010101111111011110111100111001011110001111001001110000010010000110010110111000100100100
Species_16  01111010000000000000001011110010111011000010101111111011110111100111001011010000111001001110000010011000110010010111000100100100
Species_17  01110010100000001000001011110010100011010010101111111011110111100111101011010001111001001010000010011000110010010111000110100010
Species_18  11110010100000000000001011110010101011000010101111111011110111100111001011010001111101001110000110011000110010110111000100100100
Species_19  01110010100000000000001011110010101011000010101111111011110111000111001011000001111011001110000010011000110010110111000100100100
Species_20  11110010100000000000010011110010100001000010101111111011110111100111001011110001111001001110000010010000110010110111000100101100
Species_21  10110010100100000000001011110010101011000010101101111011110111100011001111010001111101001110000110011000110010110111000100100100
Species_22  01110010100000000000011011110010101011000010101111111011110011100110001011110001111001001010000010010001110010110111000100000100
Species_23  01110010100000000000001011110010101011000010101111111011110111000111001011001001111011101110000010011000110010010111100100110100
Species_24  01110010100000000000001011110010101011000010101111111011110101000111001011010000111001001110000011011000111010010111010100100100
Species_25  01110010100000000000001011110010101001000010101111111011110111000111001011000001111011101110000010011000110010011111000100110100
Species_26  01110010100000000000011111110010100001001010101111111011010111100111011011110001111001001110000010010000110010110110000100100100
Species_27  01110010000000000000001011110010101011000010101111100011110111000111001011000001111011001110000010011000110010110111000100100100
Species_28  01110100100000000000001011110010101011000010101110111011110111100111001110010001111001101110000010011000110010110011000100100100
Species_29  01110010100000000000001011010010101011000010101111110011110111100111001011010001111001001110000010011000110010100111000100100100
Species_30  11110010100000001000001011110010101011000010101111111011110111100111001011010001111001001111000010011000110010110111000100100100
Species_31  01110010100000000000010011110010100001000010101111111011110111100111001011110001111001001010000010010000110010110011000100100100

;
END;

BEGIN TREES;
  tree simulated_tree = (Species_01:17,(Species_02:6,((Species_09:17,Species_03:15):13,(((Species_28:4,Species_10:4):29,(Species_14:15,((Species_20:3,(Species_22:2,(Species_26:8,(Species_31:1,Species_15:1):7):5):2):12,(Species_16:11,((Species_21:5,Species_18:14):9,((Species_23:11,(Species_25:8,(Species_27:2,Species_19:7):2):2):11,(Species_24:10,(Species_29:3,(Species_30:2,Species_04:2):1):7):12):1):2):2):1):5):13,(Species_05:2,(Species_06:2,((Species_08:36,(Species_17:8,Species_07:24):12):4,(Species_11:11,(Species_12:8,(Species_13:12,Species_00:30):1):1):8):3):2):1):1):1):2):5;
END;

I think this is now as expected, but do shout if I have missed anything.

ms609 commented 1 month ago

Looks good – I would suggest also updating the Tree File from the End Run Log to use leading zeros, as this currently uses labels in the Species_1 (not Species_01) format.
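
To illustrate the suggested convention, here is a minimal sketch of rewriting `Species_N` labels in a tree string so that they carry leading zeros. It is not TREvoSim code: the function name `pad_species_labels` and the default width of two digits are assumptions chosen to match the labels shown in this thread, and the example tree string is illustrative rather than actual Tree File output.

```python
import re


def pad_species_labels(text, width=2):
    """Zero-pad the numeric suffix of every Species_N label in `text`.

    `width` is the minimum number of digits (an assumption; the rule
    TREvoSim uses to choose the width is not stated in this thread).
    """
    def repl(match):
        return f"Species_{int(match.group(1)):0{width}d}"

    return re.sub(r"\bSpecies_(\d+)\b", repl, text)


if __name__ == "__main__":
    # Illustrative Newick string using unpadded labels.
    tree = "(Species_1:2,(Species_2:10,(Species_3:1,Species_0:1):9):1):25;"
    print(pad_species_labels(tree))
```

Labels that are already padded, or that have more digits than `width`, are left unchanged, so the function can be applied safely to mixed input.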

RussellGarwood commented 1 month ago

Thanks for catching this. It caused an issue with the defaults script, so I had already fixed it (though I wish I had done so immediately, as it took me a good 45 minutes to figure out that the names in the tree file were causing it!)