tsinghua-fib-lab / DRL-urban-planning

A deep reinforcement learning (DRL) based approach for spatial layout of land use and roads in urban communities. (Nature Computational Science)
https://rdcu.be/dlRPZ
MIT License

policy network outputs an action whose value on land_use_mask is 0 #3

Closed gdjmck closed 8 months ago

gdjmck commented 10 months ago

In the agent.py module, after an action is sampled by the select_action() method, the action is passed to env.step(). Inside env.step(), the self._current_land_use_mask object is used to filter out unreasonable positions via the check self._current_land_use_mask[action] == 0, and if the value is 0 an InfeasibleActionError is raised.
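The check described above can be sketched as follows. This is an illustrative reconstruction, not the repository's actual code; the exception name matches the issue, but the helper `step_check` and its signature are hypothetical:

```python
class InfeasibleActionError(ValueError):
    """Raised when the chosen action lands on a masked-out position."""


def step_check(current_land_use_mask, action):
    # Mirrors the check described in env.step(): positions with mask
    # value 0 are infeasible and must not be selected.
    if current_land_use_mask[action] == 0:
        raise InfeasibleActionError(f"action {action} is masked out")
    return action
```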

My question: there is no guarantee that the sampled action will not fall on a position where self._current_land_use_mask is 0, and training would stop on this exception. Is this a problem, or should I just not raise the exception and instead replace the action with a random choice among the valid positions in self._current_land_use_mask?

DavyMorgan commented 10 months ago

Thanks for your question. The observation includes an action mask indicating which actions are feasible, and when sampling actions from the policy network, we set the selection probabilities according to this mask. Please check these two lines of code: land use road

Therefore, the policy network will never actually output an infeasible action. The InfeasibleActionError is left in only as a debugging safeguard.
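The masked-sampling idea described above can be sketched in plain Python. This is a minimal illustration of the technique (masking logits before the softmax so infeasible actions get zero probability), not the repository's implementation, which uses a neural network policy:

```python
import math
import random


def sample_masked_action(logits, action_mask, rng=random):
    """Sample an action index, giving masked-out actions zero probability.

    logits: raw scores from the policy, one per action.
    action_mask: 1 for feasible actions, 0 for infeasible ones.
    """
    # Push infeasible logits to -inf so their softmax probability is 0
    masked = [l if m else float("-inf") for l, m in zip(logits, action_mask)]
    mx = max(masked)
    exps = [math.exp(l - mx) if l != float("-inf") else 0.0 for l in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because every infeasible action's probability is exactly zero, the sampler can never return an action that would trip the InfeasibleActionError check in env.step().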