Closed by bjascob 3 years ago
Looks like you ran on the entire amr 1.0 corpus. I was only running the dev set. When I run the full corpus I get similar results to what's published.
Is there a way to save the model after training on the entire corpus and then "infer" the dev set with it later?
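In case it helps: with GIZA-style EM aligners, a common workaround for not having a separate "inference" step is to concatenate the dev set onto the training corpus, align everything in one run, and then keep only the alignment lines belonging to the dev portion. A minimal sketch of that bookkeeping (file names here are placeholders, not this repo's actual paths):

```python
# Hypothetical helper: run the aligner once over train+dev, then slice out
# the dev alignments. Assumes one sentence (and one alignment line) per line.

def concat_for_alignment(train_path, dev_path, out_path):
    """Write train followed by dev into out_path; return the train line count."""
    n_train = 0
    with open(out_path, "w") as out:
        with open(train_path) as f:
            for line in f:
                out.write(line)
                n_train += 1
        with open(dev_path) as f:
            for line in f:
                out.write(line)
    return n_train  # dev alignments start at this 0-based line index

def extract_dev_alignments(align_path, n_train):
    """Drop the first n_train alignment lines; the remainder belong to dev."""
    with open(align_path) as f:
        return f.read().splitlines()[n_train:]
```

This avoids saving the model at all, at the cost of re-running EM whenever the dev set changes.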
I'm trying to get the old ISI aligner working, since it still represents near-SoTA performance. I've been using your code here, but I'm not able to replicate the published results and I'm wondering if you have any idea why that might be.
When I run the scripts against amr-release-1.0-dev-consensus.txt I get Precision: 78.76, Recall: 74.83, F1: 76.75. From the original paper, and from your thesis, I was expecting an F1 of 86.5.
I've split the AMR 1.0 data using your scripts/mt_scripts/split_en_amr.py and I'm scoring against the ISI gold dev alignments. I'm using mgiza from https://github.com/moses-smt/mgiza/tree/master/mgizapp since the link in your README is not working. Given that the scores are in a reasonable range, I suspect I'm doing something subtly wrong.
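For reference, these alignment scores are ordinarily computed as set overlap between predicted and gold alignment links, so it may be worth checking that both sides use the same link format. A minimal sketch, assuming links are hashable token/node pairs:

```python
def alignment_prf(pred, gold):
    """Precision, recall, F1 over sets of alignment links.

    pred, gold: iterables of hashable links, e.g. (token_index, "node_0.1").
    """
    pred, gold = set(pred), set(gold)
    correct = len(pred & gold)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

A mismatch in indexing convention (0- vs 1-based tokens, or node-path notation) between the predicted and gold files would depress both precision and recall uniformly, which is consistent with scores that look "reasonable but low".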
Questions: