jjmccollum / teiphy

A Python package for converting TEI XML collations to NEXUS, BEAST 2.7 XML, and other formats
MIT License
11 stars 3 forks source link

system tests in CI #38

Closed rbturnbull closed 2 years ago

rbturnbull commented 2 years ago

I'll set up a github action workflow to run the output of teiphy with iqtree. @jjmccollum - you can use that as an example for other programs like stemma and MrBayes.

jjmccollum commented 2 years ago

@rbturnbull I took a look at the error message for the iqtree test on cicd. Part of the problem is that the command

poetry run teiphy example/ubs_ephesians.xml example.nexus --states-present 

doesn't collapse any trivial readings, so we end up hitting the maximum of 64 symbols pretty quickly. To prevent this, the command should probably look more like

poetry run teiphy -t"reconstructed" -m"overlap" -m"lac" --states-present example/ubs_ephesians.xml example.nexus

or

poetry run teiphy -t"reconstructed" -t"defective" -t"orthographic" -m"overlap" -m"lac" --states-present example/ubs_ephesians.xml example.nexus

But the real problem that is causing IQTREE to complain is that some of the symbols are capital versions of earlier symbols, and IQTREE treats these as case-insensitive. This is partly in compliance with the guidelines in Maddison, Swofford, and Maddison's paper on NEXUS, which notes that the base symbols in the SYMBOLS block are case insensitive. But the paper also explicitly states that symbols in the EQUATE block are case-sensitive, so an EQUATE symbol of 'A' should be valid even if 'a' is a base symbol. IQTREE's parser does not seem to make this distinction. I'm not sure if it's worth writing up an issue in their repo about this. In any event, this does mean that I should fix the code to use only 0-9 and a-z for base symbols, since 'a' and 'A' should not both be base symbols. If I set aside A-Z and a few extra ASCII symbols, then we should have a clean set of 32 base symbols and 32 EQUATE symbols.

Unfortunately, IQTREE seems to have a problem with any characters outside of 0-9 and a-z (up to capitalization of the latter set), including characters in the standard ASCII set, so even this will not work if we have too many EQUATE symbols. But since IQTREE just renders all ambiguities as missing characters anyway (see my recent comments on #4), it would probably be fine if I use other ASCII characters to pad out the EQUATE symbols set, add an option for the app to treat all ambiguous states as missing, and note in the documentation that outputs intended for IQTREE must be generated using this option.

rbturnbull commented 2 years ago

i think the NEXUS standard is usually only loosely followed by the different phylogenetic software packages. The more flexible teiphy can be the better. I think an option to treat ambiguous states as missing is a good idea. Some packages might also will expect them with curly brackets in the data matrix as well instead of with an EQUATE symbol

jjmccollum commented 2 years ago

All right, in my latest push on cicd, I've commented out the code that generates the EQUATE symbols and writes the EQUATE block to NEXUS. Instead, I just write the ambiguous states with curly braces like you mentioned, as this requires fewer total symbols. The system test set up in iqtree.yaml now works on that branch, but I'll wait to merge that branch into main until we've implemented system tests for stemma and mrbayes.

jjmccollum commented 2 years ago

@rbturnbull Apologies for the long line of identical commits, but we now have a passing system test for mrbayes in place! That just leaves stemma, whose system test is currently failing at the install step. Here's the script I have at the moment:

    - run: poetry install
    - name: Testing
      run: |
        poetry run teiphy -t reconstructed -t defective -t orthographic -m overlap -m lac -s"*" -s T --fill-correctors --format stemma example/ubs_ephesians.xml stemma_example
    - name: Install Phylogenetics Package
      run: |
        git clone https://github.com/stemmatic/stemma.git
        cd stemma
        make
    - name: Phylogenetics Run
      run: |
        stemma stemma_example

And here's the error I'm getting during installation:

cc -g -Wall  -O3          -c -o stemma.o stemma.c
cc -g -Wall  -O3          -c -o taxon.o taxon.c
cc -g -Wall  -O3          -c -o net.o net.c
cc -g -Wall  -O3          -c -o netmsg.o netmsg.c
cc -g -Wall  -O3          -c -o hyp.o hyp.c
cc -g -Wall  -O3          -c -o heur.o heur.c
cc -g -Wall  -O3          -c -o cache.o cache.c
cc -g -Wall  -O3          -c -o log.o log.c
cc -g -Wall  -O3          -c -o boot.o boot.c
cc -g -Wall  -O3          -c -o state.o state.c
cc   stemma.o taxon.o net.o netmsg.o hyp.o heur.o cache.o log.o boot.o state.o  -lm -o stemma
ln -f stemma /home/runner/bin/
ln: target '/home/runner/bin/' is not a directory: No such file or directory
make: *** [Makefile:25: /home/runner/bin/stemma] Error 1
Error: Process completed with exit code 2.

There seems to be an issue with the location of the bin directory in the Docker container. Any idea what could be causing this? And @stemmatic, does the script above look like it's doing the right thing, or are there additional steps I need to include?

jjmccollum commented 2 years ago

@rbturnbull I'm afraid I'm going to need your help debugging the stemma.yml workflow. Both of the Makefiles for prep and stemma (the two tools we need to run stemma on the generated file) set some environment variables with the line

BIN=$(HOME)/bin

So it's possible that this is related to the target '/home/runner/bin/' is not a directory: No such file or directory error message that the runner throws when I try to make each project, but I haven't been able to fix things by exporting different paths for HOME, so I'm not sure what else to do.

Once we have this system working, I think we should be all set with the code. Once we have the formatted stemma images added to the paper, we should be ready to submit it to JOSS.

stemmatic commented 2 years ago

So it's possible that this is related to the target '/home/runner/bin/' is not a directory: No such file or directory error message that the runner throws when I try to make each project, but I haven't been able to fix things by exporting different paths for HOME, so I'm not sure what else to do.

Yeah, the Makefile is trying to hard link the executable file into one’s personal bin directory. Either create the director or comment out that line in the Makefile are two easy work-arounds.

jjmccollum commented 2 years ago

@stemmatic Thanks! It looks like creating the /home/bin directory completes the installation:

sudo mkdir /root/bin
git clone https://github.com/stemmatic/prep.git
cd prep
sudo make
cd ..
git clone https://github.com/stemmatic/stemma.git
cd stemma
sudo make

It looks like prep processed the file correctly, but stemma threw the following error:

7: < (   7) Variant mismatch: 0 (1) should have exactly 0
/root/bin/prep stemma_example
 @ B10K1V1U24-26 [ εν.εφεσω – ]
Fatal error, terminating ...
Parallels=1; MSS=38; VarUnits=0; Pieces=40; Sets=105
jjmccollum commented 2 years ago

@rbturnbull All right, with @stemmatic's help, we now have all system tests running successfully! Here's the output of running STEMMA with simulated annealing for 100 iterations:

image

And with that, I'll close this issue and merge cicd back into main.