Issue with step and reset in TempVMPackingEnv

Hi team,

Awesome work, really like the paper.

I've found a couple of issues with the TempVMPackingEnv environment (VMPacking-v1).

The main one is that the custom logic of the step method was definied as step() instead of _STEP() and so steps in the environment were actually using _STEP() for the parent class. There's is also a small indexing error when demand is being removed from a PM, and I think it would be helpful to also round down any remaining use below tolerance to zero.

There was also an issue in the _RESET() method, where state was being created as an array istead of a dict, and so updates to the state, based on dict keys, did not work.

Proposed fixes below, with a comment followed by my initials (JMF) and an explanation for each changed/added line. Happy to raise a pull request if helpful.

   def _STEP(self, action): # JMF: _STEP instead of step, otherwise _STEP from parent used
        done = False
        pm_state = self.state["state"][:-1]
        demand = self.state["state"][-1, 1:]

        if action < 0 or action >= self.n_pms:
            raise ValueError("Invalid action: {}".format(action))

        elif any(pm_state[action, 1:] + demand > 1 + self.tol):
            # Demand doesn't fit into PM
            reward = -1000
            done = True
        else:
            if pm_state[action, 0] == 0:
                # Open PM if closed
                pm_state[action, 0] = 1
            pm_state[action, self.load_idx] += demand
            reward = np.sum(pm_state[:, 0] * (pm_state[:,1:].sum(axis=1) - 2))
            self.assignment[self.current_step] = action

        # Remove processes
        if self.current_step in self.durations.values():
            for process in self.durations.keys():
                # Remove process from PM
                if self.durations[process] == self.current_step:
                    pm = self.assignment[process] # Find PM where process was assigned
                    pm_state[pm, self.load_idx] -= self.demand[process][1:] # JMF: Index to exclude first element of demand array
                    pm_state[pm, self.load_idx] = np.where(pm_state[pm, self.load_idx]<self.tol, 0., pm_state[pm, self.load_idx]) # JMF: Deal with rounding
                    # Shut down PM's if state is 0
                    if pm_state[pm, self.load_idx].sum() == 0:
                        pm_state[pm, 0] = 0

        self.current_step += 1
        if self.current_step >= self.step_limit:
            done = True
        self.update_state(pm_state)
        return self.state, reward, done, {}

    def _RESET(self):
        self.current_step = 0
        self.assignment = {}
        self.demand = self.generate_demand()
        self.durations = generate_durations(self.demand)
        self.state = {
            "action_mask": np.ones(self.n_pms, dtype=np.uint8),
            "avail_actions": np.ones(self.n_pms, dtype=np.uint8),
            "state": np.vstack([
                np.zeros((self.n_pms, 3)),
                self.demand[self.current_step]],
                dtype=np.float32)
        } # JMF: Dict instead on NumPy array, otherwise update_state doesn't work
        return self.state

hubbs5 / or-gym

Issue with step and reset in TempVMPackingEnv #36