Update vectorized reinforcement learning

This pull request updates the vectorized reinforcement learning functionality of EvoTorch so that it is compatible with the gymnasium 1.0.x API, while preserving compatibility with gymnasium 0.29.x.

In more detail, this pull request introduces an EvoTorch-specific SyncVectorEnv implementation as an alternative to gymnasium's SyncVectorEnv class. This custom SyncVectorEnv preserves the classical auto-reset behavior on which EvoTorch relies, which allows us to transition to gymnasium 1.0.x.
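
For readers unfamiliar with the term, here is a minimal sketch of what auto-reset looks like in a synchronous vector environment. It assumes that "classical" refers to the same-step reset used by gymnasium 0.29.x vector environments: a sub-environment that finishes an episode is reset inside the same step() call, and the final observation is moved into the info dictionary. The names CountdownEnv and SameStepAutoResetVectorEnv are hypothetical and are not part of EvoTorch or gymnasium.

```python
class CountdownEnv:
    """Hypothetical toy environment: an episode terminates after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        # (observation, reward, terminated, truncated, info)
        return self.t, 1.0, terminated, False, {}


class SameStepAutoResetVectorEnv:
    """Illustrative only: steps sub-environments one by one and resets a
    finished sub-environment within the same step() call."""

    def __init__(self, envs):
        self.envs = list(envs)

    def reset(self):
        return [env.reset()[0] for env in self.envs]

    def step(self, actions):
        obs, rewards, terminateds, truncateds, infos = [], [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                # Keep the last observation of the finished episode in info,
                # then report the first observation of the next episode.
                info = dict(info, final_observation=o)
                o, _ = env.reset()
            obs.append(o)
            rewards.append(r)
            terminateds.append(terminated)
            truncateds.append(truncated)
            infos.append(info)
        return obs, rewards, terminateds, truncateds, infos
```

With this behavior, the observation returned alongside terminated=True already belongs to the next episode, so a rollout loop never needs an extra step just to trigger the reset.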

Having our custom SyncVectorEnv allows us to introduce these performance-related improvements as well:

- observations, rewards, etc. reported by SyncVectorEnv are now moved to the device on which the policies are executed;
- sub-environments of SyncVectorEnv that have reached the maximum number of episodes are not executed further.
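
The second improvement can be pictured with the following sketch. It is not the code in this pull request: the names FixedLengthEnv and run_episodes and the num_episodes parameter are hypothetical, and the device transfer from the first improvement is left out to keep the example dependency-free. The point it illustrates is that a sub-environment which has already completed its quota of episodes is no longer stepped while the remaining ones catch up.

```python
class FixedLengthEnv:
    """Hypothetical toy environment that terminates after `length` steps
    and counts how many times step() was called."""

    def __init__(self, length):
        self.length = length
        self.t = 0
        self.step_calls = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.step_calls += 1
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


def run_episodes(envs, num_episodes):
    """Step all sub-environments in lockstep until each one has completed
    `num_episodes` episodes. Finished sub-environments are skipped."""
    episodes_done = [0] * len(envs)
    total_rewards = [0.0] * len(envs)
    for env in envs:
        env.reset()

    while any(n < num_episodes for n in episodes_done):
        for i, env in enumerate(envs):
            if episodes_done[i] >= num_episodes:
                # This sub-environment has reached its episode limit:
                # do not spend any further computation on it.
                continue
            _, reward, terminated, truncated, _ = env.step(None)
            total_rewards[i] += reward
            if terminated or truncated:
                episodes_done[i] += 1
                if episodes_done[i] < num_episodes:
                    env.reset()
    return total_rewards, episodes_done
```

When episode lengths differ widely across sub-environments, skipping the finished ones avoids simulating steps whose results would be discarded anyway.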

The Brax-related notebook examples have also been refactored. Instead of a single notebook containing the entire Brax example, there are now two notebooks: one focusing on training and the other on visualization. The visualization example has been updated so that it works correctly with the latest version of Brax.