Sometimes when creating large (200-node) clusters, AWS reports that an instance ID is not found while StarCluster is waiting for the nodes to reach the running state, probably because the new instance hasn't propagated through their system yet. If I check the AWS web console, the instance is there. However, StarCluster fails. StarCluster should wait 5 seconds or so and re-poll for the instance. Retrying a few times before hard-failing would probably be sufficient.
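To make the suggestion concrete, here is a rough sketch of the retry behavior I have in mind. It is not StarCluster's actual code: `poll_with_retry`, `fetch_instance`, and the `InstanceNotFound` exception are hypothetical names used only for illustration, with `InstanceNotFound` standing in for the EC2 `InvalidInstanceID.NotFound` error. The retry count and delay are guesses, not tested values.

```python
import time


class InstanceNotFound(Exception):
    """Hypothetical stand-in for the EC2 'InvalidInstanceID.NotFound' error."""


def poll_with_retry(fetch_instance, instance_id, retries=5, delay=5.0,
                    sleep=time.sleep):
    """Call fetch_instance(instance_id), retrying while the ID is not visible.

    fetch_instance is an assumed callable that raises InstanceNotFound when
    AWS does not know the ID yet. The last InstanceNotFound is re-raised if
    every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch_instance(instance_id)
        except InstanceNotFound:
            if attempt == retries:
                raise  # hard-fail only after exhausting all attempts
            sleep(delay)  # give AWS time to propagate the new instance
```

With the defaults this waits up to about 20 seconds (four 5-second pauses between five attempts) before giving up. The `sleep` parameter exists only so the delay can be replaced in tests.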