Upon debugging, I found this is due to the use of `set()` for saving various states in `state.py`. Because a Python `set` does not guarantee iteration order, different processes tried to synchronize different states (e.g. rank 0: `covariance_state`, rank 1: `covariance_counter`), which silently caused a deadlock. I was able to find this bug with `TORCH_DISTRIBUTED_DEBUG=DETAIL`.
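To illustrate the root cause and one possible remedy, here is a minimal, framework-free sketch (the `sorted()` fix shown here is illustrative; the actual patch may order the states differently):

```python
# Collective ops (e.g. all_reduce) deadlock unless every rank issues
# them in the same order. Iterating a Python set gives no such
# guarantee: string hash randomization differs per process, so two
# ranks can traverse the same set in different orders.
states = {"covariance_state", "covariance_counter"}

# Unsafe: iteration order of `states` may differ across ranks.
unsafe_order = list(states)

# Safe: a deterministic, rank-independent order for synchronization.
safe_order = sorted(states)
print(safe_order)
# -> ['covariance_counter', 'covariance_state']
```

An alternative is to replace the `set` with a `list` or `dict` (both preserve insertion order), which is safe as long as insertion order is identical on every rank.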
In addition, I improved `__init__` and confirmed that users can now do:

```python
import analog

analog.init(project="test")
analog.watch(model)
analog.setup({"log": "grad", "statistic": "kfac"})
```
For reference, the symptom in the previous code was execution getting stuck randomly at `torch.cuda.synchronize()` in `state.py`; the deadlock reproduces in multi-process runs, though only intermittently.