powertac / powertac-server

Power TAC simulation server
www.powertac.org
Apache License 2.0
44 stars 35 forks source link

RandomSeeds are not getting logged in latest release #1072

Open jecollins opened 3 years ago

jecollins commented 3 years ago

The state logs for the 2020 tournament do not contain the random seeds generated during initialization. This prevents full reproduction of game conditions. The seeds were logged in the 2017, 2018, and 2019 tournaments.

jecollins commented 3 years ago

The RandomSeed class is getting logged in the current 1.8.0-SNAPSHOT version.

jecollins commented 3 years ago

TariffSubscription instances are also not logged in the 2020 finals games.

jecollins commented 3 years ago

It appears that none of the classes in server-interface were logged in the 2020. Perhaps there was an error in the build?

erikkemperman commented 3 years ago

First, I did a clean mvn install of powertac-core and powertac-server, each on the 1.7.0 release tag from github. Running a game from that, my statelog files correctly contain all classes from server-interface.

Then I cleaned out my ~/.m2/repository/org/powertac, and proceeded to install from server-distribution, which just fetches jar files from maven central. Running a game from that, my statelog files did not have classes from server-interface (at least not the ones I checked).

I noticed that the jar files for the locally built server-interface was larger than the one downloaded from maven central. Listing the contents showed exactly the same classes in each jars, but some locally built class files larger than the corresponding entry in the jar from maven central.

Decompiling the RandomSeed class from each jar side by side revealed that, indeed, the aspectj stuff is missing from the jar in maven central. See screenshot.

Screen Shot 2021-01-13 at 5 17 46 PM

So, something has gone wrong in building the release -- but I have no idea what that might be. Simply doing mvn clean install on the release tag produces the correct class file with state logging included. And that should be what the maven-release plugin pushes to maven central, as far as I know.

Since we do have state entries for other classes, my only idea as to what is special about server-interface relative to other modules is that it re-uses the org.powertac.common packagename (also defined in powertac-core/common).

Other than confirming your suspicion, John, I am not sure how to proceed.

jecollins commented 3 years ago

Thanks. This is very helpful.

For how we should proceed --

  1. There is no way to recover the missing data from the 2020 tournaments. Participants need to know this before they waste significant amounts of time in confusion as I did.
  2. We need to add a check for this issue to the release process.
  3. We could do a re-build and release a 1.7.1 version that would be identical to 1.7.0 other than correct logging.
  4. We could just suggest that folks interested in running experiments use 1.8.0-SNAPSHOT.

It's possible that re-using the org.powertac.common package name was a mistake, but I don't believe it's considered incorrect, and we've never seen this problem in the past. Of course, I don't have the typescript from the release build, so there's now no way I can see to get a handle on how this happened.

erikkemperman commented 3 years ago

I'm trying to figure out what happened. Observations thus far:

erikkemperman commented 3 years ago

Finally managed to resolve things, I believe.

I've released 1.7.1 versions of powertac-core, powertac-server, sample-broker and server-distribution -- although it seems the latter is not configured to be pushed to sonatype / mvn central?

It appears that we ran afoul of aspectj maven plugin incorrectly detecting no changes and thus not operating, even though the classes had just (for the second time during release:perform) been recompiled and therefore at that point do not have aspects added. It looks like this only went wrong for server-interface, not for the other modules in powertac-server which rely on aspect weaving.

Compounded by the fact that the relevant unit-test had been disabled for reasons I don't quite understand, this failure went unnoticed and did not abort the final stages of release as it should have.

I finally found the "solution" (workaround) here: https://github.com/mojohaus/aspectj-maven-plugin/issues/15 -- basically we now force aspectj to always work for server-interface instead of trying to be clever about it and getting it wrong.

With Govert I briefly discussed the possibility of regenerating the 2020 finals games. As it seems that there would be ways of deterministically regenerating the seed values for a given game, I think it might theoretically be possible.

However it would take an enormous amount of work (mostly writing broker-proxies that replay the exact same transactions as in the original games based on the logfiles, and then re-running all of the games). Certainly it'd be much more work than we currently can fit in our schedules...

jecollins commented 3 years ago

I disabled that logging test because it wasn't working and was preventing the release -- see Issue #1033. I've gone back and looked at that a couple times and have not found a solution.

Also, server-distribution does not get posted on Maven Central because it's intended to be downloaded as source -- it's just a README, a pom.xml, and some config files. I think that's clear in the "Getting Started" pages on both Wikis.

At this point, I am aware of nobody who is doing experiments that would need the original seeds. I ran into it because I was trying to write a log analyzer that looked at TariffSubscription changes, and found they were not recorded in those logs. The withdraw and signup transactions are in the logs, so the information is still there, just harder to extract.

Thanks for digging into this. I think we can close this, but it raises the priority of #1033.

erikkemperman commented 3 years ago

I disabled that logging test because it wasn't working and was preventing the release

Well, yes. That's what it was supposed to do :-) Disabling this test, rather than getting to the root issue (the aspectj plugin somehow behaving differently for relase:perform from what it does in other scenarios) is what allowed broken class files to be published.

erikkemperman commented 3 years ago

John, this was the first time I performed a release... I successfully did mvn release:prepare and mvn release:perform but I do not see the 1.7.1 versions on search.maven.org — is there an additional step to be done after pushing to nexus?

erikkemperman commented 3 years ago

Apologies, I think I figured it out. Our nexus configuration does not automatically close and release the staging repository, so that has to be done manually on oss.sonatype.org. I just did that now, and I would now expect the 1.7.1 artifacts to show up in maven search after things are synced and indexed.

So if that pans out (let's check in a couple of hours) we will have a replacement for 1.7.0 which is essentially the same but with the state log issue of server-interface components resolved. I am not sure if this needs to be communicated in some way, but if so I think perhaps it would be more likely to be noticed if it came from you.

jecollins commented 2 years ago

We still need to re-enable log testing in several places.