prosysscience / JSSEnv

An OpenAI Gym environment for the Job Shop Scheduling problem.
MIT License

Regarding implementation of _prioritization_non_final function #6

Open smita-09 opened 2 years ago

smita-09 commented 2 years ago

Hello there, thank you for the environment implementation, it has been a great help for me. While going through your paper, I read that you want to prioritize the non-final operations, but I do not understand how your implementation achieves that:

if len(non_final_job) > 0:
    for job in final_job:
        current_time_step_final = self.todo_time_step_job[job]
        time_needed_legal = self.instance_matrix[job][current_time_step_final][1]
        if time_needed_legal > min_non_final:
            self.legal_actions[job] = False
            self.nb_legal_actions -= 1

In the implementation of the function dedicated to prioritizing non-final operations, there are a few things that I found quite confusing.

Any explanation will be highly appreciated. Thanks in advance!

ingambe commented 2 years ago

Hello,

Thank you, it's nice to hear it is useful :) And thank you for reviewing the code, that's important.

> You are first making sure that the length of non-final jobs is non-zero and then entering a loop that has nothing to do with it.

if len(non_final_job) > 0:

This line checks whether there are non-final jobs to schedule. If there are, we may need to make some final jobs illegal; otherwise we don't, as every remaining job is on its final operation and prioritizing non-final jobs makes no sense.

> Secondly, inside this loop at line 6, you are actually making the job with its final operation illegal and decreasing the legal actions (in other words, you are prioritizing it). I tried to evaluate that with an example, but it did not work. For example, let's say we have two jobs J1 and J2, at a final operation and a non-final operation respectively, where J1 takes 5 and J2 takes 3 (keeping in mind that they both need the same machine). In this implementation, it actually prioritizes the final operation, and according to the paper that should not be the case, I think.

In line 2, we loop through the final jobs and check how long the current operation of each final job takes. We only make a final job illegal if its final operation is longer than the shortest non-final one; otherwise, there might be a case where it would be better to schedule the shorter final operation, as that could let another job finish its previous operation and free a machine. Then, in line 6, the final job is made illegal:

self.legal_actions[job] = False

job is defined in line 2 as one of the final jobs. Thus, those final jobs are made illegal, which prioritizes the non-final jobs.
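
To make this concrete, here is a small standalone sketch of the same rule applied to your J1/J2 example (it is not the environment code itself; the bookkeeping around final_job, non_final_job, and min_non_final is only reconstructed from the snippet above):

# Standalone sketch of the prioritization rule; not the actual environment code.
# Each job is reduced to the duration of its current operation and whether that
# operation is the job's final one. Both jobs are assumed to need the same machine.
jobs = {
    "J1": {"duration": 5, "final": True},   # final operation, takes 5
    "J2": {"duration": 3, "final": False},  # non-final operation, takes 3
}

legal_actions = {name: True for name in jobs}

final_job = [name for name, job in jobs.items() if job["final"]]
non_final_job = [name for name, job in jobs.items() if not job["final"]]

if len(non_final_job) > 0:
    # Shortest duration among the non-final operations
    min_non_final = min(jobs[name]["duration"] for name in non_final_job)
    for name in final_job:
        # A final operation longer than the shortest non-final one is made illegal
        if jobs[name]["duration"] > min_non_final:
            legal_actions[name] = False

print(legal_actions)  # {'J1': False, 'J2': True}

So in your example the final job J1 (duration 5) is made illegal and the non-final job J2 (duration 3) stays legal, which is exactly the prioritization of non-final operations described in the paper.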

smita-09 commented 2 years ago

Thank you very much for taking the time to answer. While running the code on a different instance, an index-out-of-bounds error occurred. I think I know why this is happening: when the agent is not trained and selects random actions initially, it can select the same action over and over again, and the case where current_time_step_job reaches the total number of machines does not seem to be handled, which appears to be why it throws an error. Inside the step function:

current_time_step_job = self.todo_time_step_job[action]
machine_needed = self.needed_machine_jobs[action]
time_needed = self.instance_matrix[action][current_time_step_job][1]
reward += time_needed
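
To illustrate what I mean, here is a tiny standalone reproduction (not the environment code; I am only assuming that instance_matrix stores one (machine, duration) pair per operation, i.e. it has shape (jobs, machines, 2)):

import numpy as np

# Toy reproduction, not the environment code: 3 jobs, 4 machines,
# one (machine, duration) pair per operation of each job.
jobs, machines = 3, 4
instance_matrix = np.zeros((jobs, machines, 2), dtype=int)

# Suppose job 0 has already completed all of its operations,
# so its "todo" operation index equals the number of machines.
todo_time_step_job = [machines, 0, 0]

action = 0
current_time_step_job = todo_time_step_job[action]

# Same indexing pattern as in the step function above: this raises IndexError
# because current_time_step_job == machines is out of bounds.
time_needed = instance_matrix[action][current_time_step_job][1]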

Is it a bug, or is there something I missed?

ingambe commented 2 years ago

The observation provided by the environment contains both a boolean array indicating whether each action is legal and the "real" observation:

self.observation_space = gym.spaces.Dict({
    "action_mask": gym.spaces.Box(0, 1, shape=(self.jobs + 1,)),
    "real_obs": gym.spaces.Box(low=0.0, high=1.0, shape=(self.jobs, 7), dtype=np.float),
})
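
For example (a rough sketch, assuming env is an already-created instance of the environment and the classic Gym API where reset() returns only the observation):

import numpy as np

# Rough sketch, assuming `env` is an already-created JSSEnv instance
# using the classic Gym API (reset() returns only the observation).
obs = env.reset()

action_mask = np.array(obs["action_mask"])  # 1.0 where the action is legal, 0.0 otherwise
real_obs = obs["real_obs"]                  # the (jobs, 7) feature matrix

legal_ids = np.flatnonzero(action_mask)     # indices of the currently legal actions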

A random agent has to sample a legal action from this action_mask array; otherwise, you might take illegal actions. In theory, it is not possible to take the same action over and over again, as the job will have one of its operations currently running on a machine and will not be free to be scheduled.

For research purposes, I've made a random loop using RLlib: https://github.com/prosysscience/RL-Job-Shop-Scheduling/blob/0bbe0c0f2b8a742b75cbe67c5f6a825b8cfdf5eb/JSS/randomLoop/random_loop.py

If you don't want to use RLlib, you can write a simple random loop using the numpy.random.choice function:

np.random.choice(len(legal_action), 1, p=(legal_action / legal_action.sum()))[0]

Here legal_action is the array of legal actions (i.e., action_mask). This line of code randomly samples one legal action from the action_mask.
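
Put together, a minimal random rollout could look roughly like this (a sketch only, again assuming env is an already-created instance of the environment and the classic four-value Gym step API):

import numpy as np

# Minimal random-rollout sketch, assuming `env` is an already-created JSSEnv
# instance using the classic Gym API (reset() -> obs, step() -> obs, reward, done, info).
obs = env.reset()
done = False
total_reward = 0.0

while not done:
    legal_action = np.array(obs["action_mask"])
    # Sample uniformly among the currently legal actions
    action = np.random.choice(len(legal_action), 1, p=(legal_action / legal_action.sum()))[0]
    obs, reward, done, _ = env.step(action)
    total_reward += reward

print(total_reward)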

smita-09 commented 2 years ago

I see what you are saying. I did not pay attention to that at the beginning and implemented my own sort of untrained agent that can pick any random action without taking the legal actions into account, so it started giving errors, but now I get it. Thanks for your time and effort in replying.

How would you incorporate the due-date factor of the JSS problem into the solution? I am fairly sure this code would not work for JSS with due dates, so do you have any idea what could be done to solve that problem? For example, if some sort of priority is assigned to the jobs, how could that be handled by this reinforcement learning agent? I would like to hear any suggestions.