schrum2 closed this issue 1 year ago.
Seems to be working properly. I had to reduce the number of threads to three because TNT causes a lot of strain on the server.
Let's let it run to completion before closing this.
With the threads down to 3, the tests have slowed down. Both are only a quarter of the way done. They should be finished by Tuesday, when we get back.
This is for the target to the side.
It's good that it's not crashing though. The next time we test, we may try ramping up to 5 threads or more.
One thing I'm concerned with here is how consistent scores are. I suspect they are not.
Can you take a high-scoring result from this run, evaluate it several times in a row, and report the scores in this issue thread?
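A minimal sketch of how those repeated evaluations could be summarized, assuming a hypothetical evaluator that re-runs one saved cannon shape and returns its score. `ScoreConsistency`, `evaluateRepeatedly`, and `summarize` are illustrative names, not part of the codebase, and the hard-coded scores in `main` are placeholders, not real results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ScoreConsistency {

    /** Summary statistics for a set of repeated evaluation scores. */
    record Summary(double mean, double stdDev, double min, double max) {}

    /** Calls the (hypothetical) evaluator the given number of times and collects each score. */
    static List<Double> evaluateRepeatedly(Supplier<Double> evaluator, int trials) {
        List<Double> scores = new ArrayList<>();
        for (int i = 0; i < trials; i++) {
            scores.add(evaluator.get());
        }
        return scores;
    }

    /** Computes mean, sample standard deviation, min, and max of the scores. */
    static Summary summarize(List<Double> scores) {
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            sum += s;
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double mean = sum / scores.size();
        double squaredDeviations = 0;
        for (double s : scores) {
            squaredDeviations += (s - mean) * (s - mean);
        }
        // Sample standard deviation; a single score has no spread.
        double stdDev = scores.size() > 1 ? Math.sqrt(squaredDeviations / (scores.size() - 1)) : 0.0;
        return new Summary(mean, stdDev, min, max);
    }

    public static void main(String[] args) {
        // Placeholder scores, NOT real results: swap in a call that re-evaluates the saved shape.
        double[] placeholder = {12.0, 9.5, 14.0, 11.5, 13.0};
        int[] next = {0};
        List<Double> scores = evaluateRepeatedly(() -> placeholder[next[0]++], placeholder.length);
        Summary s = summarize(scores);
        System.out.println(scores);
        System.out.printf("mean=%.2f sd=%.2f min=%.2f max=%.2f%n", s.mean(), s.stdDev(), s.min(), s.max());
    }
}
```

A standard deviation that is large relative to the mean would confirm the suspected inconsistency.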
One thing to note about these cannons in general is that I think they only work because of how we spawn them. We use loops to spawn them, so each block of the shape is spawned at a different time. This staggers when each TNT block activates. If every block were somehow spawned at exactly the same time, they would probably all explode without being able to push any of the other TNT.
I don't think this is accurate. Although a loop adds each block to a block list, the blocks themselves are spawned via a single API call to the spawnBlocks command in the MinecraftClient, which corresponds to a similar command on the Python side. So all blocks should be spawned at the same time. However, the timing of the snapshots is definitely nondeterministic. And even though Minecraft itself is supposed to be deterministic, spawning the same shape in a different place or at a different time could easily be enough on its own to change the state of the random number generator, and thus cause the exact same shape to behave differently across multiple evaluations.
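To illustrate the difference between building the list in a loop and actually spawning the blocks, here is a sketch assuming a simplified client: the loop only fills a list, and one call sends the whole list. `Block`, `FakeClient`, and this `spawnBlocks(List<Block>)` signature are hypothetical stand-ins; the real `MinecraftClient.spawnBlocks` may take different arguments.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSpawnSketch {

    /** Hypothetical block description: position plus block type name. */
    record Block(int x, int y, int z, String type) {}

    /** Hypothetical stand-in for the client; counts how many spawn requests are sent. */
    static class FakeClient {
        int spawnCalls = 0;
        List<Block> spawned = new ArrayList<>();

        void spawnBlocks(List<Block> blocks) {
            spawnCalls++;            // one request per call
            spawned.addAll(blocks);  // every block in the list arrives in the same request
        }
    }

    /** Builds the whole shape in a loop, then sends it with a single spawn call. */
    static void spawnShape(FakeClient client, int size) {
        List<Block> blocks = new ArrayList<>();
        for (int x = 0; x < size; x++) {
            for (int y = 0; y < size; y++) {
                String type = (x + y) % 2 == 0 ? "TNT" : "OBSIDIAN";
                blocks.add(new Block(x, y, 0, type)); // the loop only builds the list
            }
        }
        client.spawnBlocks(blocks); // nothing is sent until this single call
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        spawnShape(client, 3);
        System.out.println("blocks=" + client.spawned.size() + " calls=" + client.spawnCalls);
    }
}
```

Batching removes the loop as a source of timing differences, but it does not make evaluation deterministic: snapshot timing and the random number generator can still vary between runs.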
Regardless, observation is needed to track the variation in scores.
The two tests I am running are using the new bin labels so they can't use postSpawnMinecraftEvaluateBlocksMissile.bat. Should I run an old test so we can examine it tomorrow?
With shapesIsWorthSaving implemented, it is clear that the scores aren't consistent. The population is also clearly still improving as it evolves.
Make a binning scheme/behavior characterization for cannons based on the number of obsidian blocks vs. the number of TNT blocks, then make a batch file for it and test it out.
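One possible starting point for this characterization, written as a MAP-Elites-style grid that keeps the best score per (obsidian, TNT) cell. The bin width of 10, the 10 bins per axis, the block type strings, and all class and method names are hypothetical choices for illustration; the batch file is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ObsidianTntBinning {

    // Hypothetical grid settings: 10 bins per axis, each covering 10 blocks.
    // Counts of 90 or more share the last bin.
    static final int BIN_WIDTH = 10;
    static final int BINS_PER_AXIS = 10;

    /** Maps a raw block count to a bin index along one axis, clamped to the grid. */
    static int binIndex(int count) {
        return Math.min(Math.max(count / BIN_WIDTH, 0), BINS_PER_AXIS - 1);
    }

    /** Counts how many blocks in the shape have the given (hypothetical) type name. */
    static int count(List<String> blockTypes, String type) {
        int n = 0;
        for (String t : blockTypes) {
            if (t.equals(type)) {
                n++;
            }
        }
        return n;
    }

    /** Behavior characterization: {obsidian bin, TNT bin}. */
    static int[] characterize(List<String> blockTypes) {
        return new int[] {
            binIndex(count(blockTypes, "OBSIDIAN")),
            binIndex(count(blockTypes, "TNT"))
        };
    }

    /** Stores the score in the shape's cell only if it beats the current occupant. */
    static boolean tryInsert(double[][] bestScores, List<String> blockTypes, double score) {
        int[] cell = characterize(blockTypes);
        if (score > bestScores[cell[0]][cell[1]]) {
            bestScores[cell[0]][cell[1]] = score;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] best = new double[BINS_PER_AXIS][BINS_PER_AXIS];
        for (double[] row : best) {
            Arrays.fill(row, Double.NEGATIVE_INFINITY); // empty cells accept any score
        }
        // Example shape: 12 obsidian and 25 TNT blocks (the score is a placeholder).
        List<String> shape = new ArrayList<>(Collections.nCopies(12, "OBSIDIAN"));
        shape.addAll(Collections.nCopies(25, "TNT"));
        int[] cell = characterize(shape);
        boolean stored = tryInsert(best, shape, 5.0);
        System.out.println("cell=" + Arrays.toString(cell) + " stored=" + stored);
    }
}
```

Since scores vary across evaluations of the same shape, a cell's stored best score may overstate how that shape typically performs.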