Closed huangdi95 closed 10 months ago
Heya. This is unfortunately a bit messy/confusing part of the code. Those reward handlers etc. are indeed not implemented in the newest version of MineRL (1.0.x); they are remnants left over from forking earlier versions, which had full support for them. Your best bet is not to modify the MineRL environment, but to add environment wrappers on top of it / write your own reward signals on top of the observations.
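A minimal sketch of that wrapper approach (the `ItemRewardWrapper` name, the `inventory` observation key, and the per-item counts are assumptions, so check your environment's actual observation space; with MineRL you would typically subclass `gym.Wrapper` instead of duck-typing):

```python
class ItemRewardWrapper:
    """Hypothetical wrapper that rewards increases in one inventory item count.

    Duck-typed so it works with any env exposing reset()/step() that returns
    obs dicts containing an "inventory" mapping of item name -> count.
    """

    def __init__(self, env, item="log", reward_per_item=1.0):
        self.env = env
        self.item = item
        self.reward_per_item = reward_per_item
        self._last_count = 0

    def reset(self):
        obs = self.env.reset()
        self._last_count = int(obs.get("inventory", {}).get(self.item, 0))
        return obs

    def step(self, action):
        obs, _env_reward, done, info = self.env.step(action)
        count = int(obs.get("inventory", {}).get(self.item, 0))
        # Reward only newly obtained items; ignore the env's (always-zero) reward.
        reward = max(0, count - self._last_count) * self.reward_per_item
        self._last_count = count
        return obs, reward, done, info
```

The key idea is that the reward lives entirely outside MineRL: the wrapper only compares successive observations, so nothing inside the environment or the Malmö XML needs to change.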
Thank you for your reply!
I see. I found that (1) the agent start handler works, so I can customize the inventory of the agent, while the reward and quit handlers fail; and (2) the reward handler is passed to Malmo in XML like:
```xml
......
<AgentStart>
  <LowLevelInputs>true</LowLevelInputs>
  <GuiScale>1.0</GuiScale>
  <GammaSetting>2.0</GammaSetting>
  <FOVSetting>70.0</FOVSetting>
  <FakeCursorSize>16</FakeCursorSize>
  <Inventory>
    <InventoryObject slot="0" type="dirt" quantity="1"/>
    <InventoryObject slot="1" type="diamond_axe" quantity="1"/>
    <InventoryObject slot="2" type="crafting_table" quantity="1"/>
    <InventoryObject slot="3" type="stick" quantity="1"/>
    <InventoryObject slot="4" type="oak_log" quantity="1"/>
  </Inventory>
</AgentStart>
<AgentHandlers>
  ......
  <!-- Rewards -->
  <RewardForCraftingItem Sparse="true">
    <Item reward="20" type="wooden_pickaxe" amount="1"/>
  </RewardForCraftingItem>
  <!-- Additional Agent Handlers like quitting -->
  <AgentQuitFromPossessingItem>
    <Item type="stone_pickaxe" amount="1"/>
  </AgentQuitFromPossessingItem>
</AgentHandlers>
......
```
but nothing happens and I can't get the right reward. I'm confused about why the `AgentStart` works but `Reward` and `Quit` fail (I have tried different types of handlers like `RewardForPossessingItem` and `RewardForCollectingItem`).
Basically, I want to fine-tune the VPT base model with RL. Can you give me some suggestions on this? Can I solve this problem by downgrading MineRL to v0.4.4 (can VPT run on v0.4.4)? Or should I just wrap my own reward functions?
The start inventory handler is probably still supported in the code, so indeed it works, but not the reward handlers (i.e., none of the reward handler specs specified by Malmö work, as v1.x does not use Malmö code at all).
I'd implement reward signals outside the MineRL code by observing how `obs` or `info` entries change (they contain lots of information about distance travelled and items obtained). If you want more refined control, however, check out MineDojo, which is based on MineRL v0.4.x but has more control over things.
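That diff-over-steps idea can be sketched as a plain function over successive observation dicts (the `inventory` key and the item names here are assumptions for illustration; use whatever your env actually exposes in `obs`/`info`):

```python
def inventory_delta_reward(prev_obs, obs, item_rewards=None):
    """Sketch: sum of per-item rewards for items gained since the last step.

    Assumes obs["inventory"] maps item names to counts; adapt the keys to
    your environment's real observation/info structure.
    """
    if item_rewards is None:
        # Hypothetical reward schedule, mirroring the XML attempt above.
        item_rewards = {"log": 1.0, "wooden_pickaxe": 20.0}
    prev_inv = prev_obs.get("inventory", {})
    inv = obs.get("inventory", {})
    total = 0.0
    for item, r in item_rewards.items():
        gained = int(inv.get(item, 0)) - int(prev_inv.get(item, 0))
        if gained > 0:  # only reward newly obtained items, ignore losses
            total += gained * r
    return total
```

You would call this once per step in your rollout loop, keeping the previous observation around, and feed the result to your RL algorithm in place of the environment's (always-zero) reward.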
Got it. I'll have a try. Thank you!
Hi, I want to customize the reward function. I modified the `create_rewardables` method in `HumanSurvival` like this:

It is supposed to give a `1.0` reward for each `"log"` collected.

I ran the agent with the original code of VPT's run_agent.py. The model is `2x.model` and the weights are `rl-from-foundation-2x.weights` downloaded from VPT. However, the reward is always `0` even though the agent can collect tons of `log`s.

I noticed that in `_multiagent.py`, where the environment is stepped, the reward is `0`, and it is said that

So is the reward handler not supported yet? Then how should I customize the reward function?
Could someone help me with this?