minerllabs / minerl

MineRL Competition for Sample Efficient Reinforcement Learning - Python Package
http://minerl.io/docs/
Other
706 stars 154 forks source link

Starter-mainhand-weapon is iron_axe (instead of none or air) in ObtainDiamondVectorObf env. (only in v0.4.x) #597

Open sjn4048 opened 3 years ago

sjn4048 commented 3 years ago

I believe this could be the reason for 0.3/0.4 obf inconsistency.

How to reproduce:

from minerl.herobraine.wrappers.obfuscation_wrapper import Obfuscated
from minerl.herobraine.wrappers.vector_wrapper import Vectorized
from minerl.herobraine.envs import MINERL_OBTAIN_DIAMOND_V0, comp_envs
from collections import OrderedDict

import numpy as np
import gym
import minerl

class DeObf:
    def __init__(self, obf_path, wrap_env=MINERL_OBTAIN_DIAMOND_V0):
        self._vec = Vectorized(wrap_env, common_envs=comp_envs)
        self._obf_vector_len, self._ac_enc, self._ac_dec, self._obs_enc, self._obs_dec = \
            Obfuscated._get_obfuscator(obf_path)

    def decode_obs(self, vector: OrderedDict) -> OrderedDict:
        ret = vector.copy()
        ret['vector'] = self._obs_dec(vector['vector'])  #obf
        return self._vec.unwrap_observation(ret)  #vector

env = gym.make('MineRLObtainDiamondDenseVectorObf-v0')
obs = env.reset()

v3_path = 'PUT OBF WEIGHTS FOLDER HERE (such as xxx/v3)'
t = DeObf(v3_path)

print(t.decode_obs(obs)['equipped_items']['mainhand']['type'])
sjn4048 commented 3 years ago

After the agent really equips something, this will go back normal again.

A guess: ironaxe is very rare, as ObtainDiamond's goal is to get ironpickaxe, instead of iron_axe. The only relation I can think of is that TreeChopV0 env starts with some type of axe (maybe iron_axe), could this be the reason?

gamecraftCZ commented 3 years ago

Here is a first inventory observation from first step in the IronPickaxe dataset (should be the same as in the Diamond):

deobfuscated:
{'equipped_items': {'mainhand': {'damage': 600, 'maxDamage': -1, 'type': 'none'}}}
original vector:
[ 5.9359200e-02  3.6130545e-01  2.2302108e-02  4.6590340e-01
  8.3399922e-01 -5.4766589e-01 -5.3313220e-01  1.4517257e-01
 -1.6225849e-01  4.4127157e-01  9.3959264e-02  1.4446159e-01
  5.3066909e-02  3.8569283e-01  4.6051107e-02 -4.1379360e-03
 -3.1055182e-01  3.8475084e-01 -3.4949893e-01  3.3083931e-01
 -4.5024895e-04 -1.9205706e-01 -4.0137717e-01 -2.0453778e-01
 -4.3483767e-01 -8.3967346e-01 -2.7074900e-01 -6.7882514e-01
  3.3821529e-01 -1.8499134e-02  4.7664848e-01  3.9689693e-01
  2.8129154e-01  5.6583375e-01  1.0081351e-01 -4.1974071e-02
  2.6246247e-01  1.9796182e-01  4.2013833e-01 -5.0612593e-01
  1.8495357e-01  3.6445554e-02  1.3387953e-01 -2.5214735e-01
 -3.2986674e-01 -4.4088489e-01  4.1660208e-01 -4.9444938e-01
  4.2181006e-01  5.5862355e-01  5.3276277e-01  5.2860838e-01
  4.9815449e-01  1.8754661e-01 -1.5164745e-01  2.1409102e-01
 -3.0231151e-01  2.0704961e-01 -4.4902291e-02 -2.9338261e-01
 -7.0547201e-02 -3.5819867e-01 -1.5438145e-01 -4.4374523e-01]

Here is the first inventory observation from the real env:

deobfuscated:
{'equipped_items': {'mainhand': {'damage': 600, 'maxDamage': -1, 'type': 'iron_axe'}}}
original vector:
[ 0.03574965 -0.02991187  0.05493971  0.31410468  0.48443056 -0.26832578
 -0.38980542  0.15328171 -0.04445132  0.07340562  0.23895693 -0.11790033
  0.17289005  0.07065457  0.02322971  0.02811745 -0.27927029  0.1889336
 -0.09328388  0.23966521  0.15927544 -0.21311534 -0.4699087  -0.06114146
 -0.24593872 -0.58976279 -0.10949771 -0.37964217  0.24832132  0.10542898
  0.23829751  0.11381538 -0.00913478  0.23990227  0.01890011 -0.02672809
  0.22047301  0.24523972  0.25394048 -0.20668896  0.16647686 -0.00384989
 -0.05751187 -0.15477959 -0.28539445 -0.14074411  0.28814096 -0.24901295
  0.25977528  0.41597704  0.30832001  0.38446612  0.34497863  0.1987621
 -0.16191061  0.09357601 -0.23916132  0.11681672  0.20887107 -0.15966551
 -0.03560115 -0.1346689   0.00736491 -0.33761163]
Miffyli commented 3 years ago

Cheers you both! Indeed this is happening and have observed the same. The issue is not in env spec itself (agent does not really have iron axe in non-obf env, it is not listed in malmo scenario file either) but seems to come up from obfuscation/vectorization. Digging further.

gamecraftCZ commented 3 years ago

Can confirm, the non obfuscated env is working as expected for me.

Miffyli commented 3 years ago

Progress: seems like when the hand is empty (should be "none" or "air"), obfuscated environments say "iron_axe". Steps:

1) Started with iron_axe as "mainhand" item. 2) Digged some dirt, now mainhand item was other (dirt was in hand, probably). 3) Acted equip: air 4) iron_axe showed up as the mainhand item again.

Miffyli commented 3 years ago

I ran out of time to work on this, but I think the error is in obfuscation itself (not vectorization). Wrapping and unwrapping observations with Vectorized functions alone works correctly. @brandonhoughton or @shwang : would you have any insights on this?

Curious note: without obfuscation, default item in hand is "air", but it seems that in the dataset they are "none". Another note is that iron_axe appears second in the Enum list of equipped items. Possible connection?

shwang commented 3 years ago

I don't recall the precise meaning of "air" and "none" in the dataset and the action space. I think that "none" means to leave the mainhand unmodified, and "air" means explicitly equip nothing.

A likely suspect for this change is https://github.com/minerllabs/minerl/commit/1a7ea8b0717f39a2618a68e9c20b6b8c6f261f79, where we did a significant refactor of the item equip code (possibly reordering the item enum relative to the cached obfuscation). If this is indeed the reason for failure, then apologies for that. I added many unit test cases but didn't think of the obfuscation.

Testing

Refactor was introduced in 1a7ea8b0717f39a2618a6. (Expect iron_axe) The last commit before the refactor was 8cc204360fdae39d18808facb21873baaed98b1c. (Expect air).

minerl doesn't run on my laptop right now (some C headers broken), so I'd appreciate it if someone ran the bug replication script on both these commits!

brandonhoughton commented 3 years ago

Hmm this to me seems like the deobfuscation network is a little confused and makes a mistake no?

Miffyli commented 3 years ago

Thanks for the input! Sadly it was not the refactoring Steven suggested (tried out the two commits, both give "iron_axe") :(.

I also tested with 0.3.7 obfuscation, and there things work out as expected (mainhand item is "none").

shwang commented 3 years ago

Thanks for checking the refactor! git bisect could be helpful here for pinning down the commit that caused this bug (set https://github.com/minerllabs/minerl/commit/8cc204360fdae39d18808facb21873baaed98b1c "bad" and v3.4.2 "good") if someone has time before my new computer arrives.

Miffyli commented 3 years ago

Did bit of semi-manual spelunkin (had to patch build files etc), and it seems the error persists all the way back to https://github.com/minerllabs/minerl/commit/94a7ee35b7211bd9383a5d93e2e8aafad0f211a7, and beyond that building/running fails to random, unrelated errors. This is the time where main devs were having fun bringing Malmo to MineRL and such, judging by the commits. 0.3.6/0.3.7 versions work as expected.

shwang commented 3 years ago

Oh, good to know. Sorry you had to do manual patching along the way

gamecraftCZ commented 3 years ago

@Miffyli @shwang Just wanna checkin whether it's gonna be fixed before the round 2 deadline or whether we have to count with it in our codes?

Miffyli commented 3 years ago

Just wanna checkin whether it's gonna be fixed before the round 2 deadline or whether we have to count with it in our codes?

We will work on getting it fixed for Round 2 but can not give 100% guarantee we will succeed (other competition deadlines occupying us).

gamecraftCZ commented 3 years ago

@Miffyli Ok, we'll implement some workarounds for this issue. Would we have an option to select the bugged version for our submission if it gets fixed before the deadline?

Miffyli commented 3 years ago

If this bug is fixed we will apply it on all submissions. There is little time to setup a system that allows picking the MineRL version.

Miffyli commented 3 years ago

After a nice, comfortable and simple evening with my good friends, MineRL and pdb, I have figured out the puzzle. We have two creatures in play here:

1) In 0.3.7, the default equipped item was "none" (here). In 0.4.x, this was changed to "air". This makes sense, but was a breaking change since obfuscations were still in 0.3.7 format (game starts with "air" equipped, but dataset has "none") 2) The obfuscator does not decode encoding for "air" (as an equipped item) correctly. It instead decodes it to "iron_axe". This happens on both 0.3.7 and 0.4.x. I assume this is because it was not trained much with "air" data (?).

Proper fix for this is to re-compile the obfuscated datasets to use new obfuscated vectors.

A hotpatch to match dataset's obfuscations can be done by changing the default equipped item back to "none" by replacing this line with

handlers.EquippedItemObservation(items=[
    none, 'air', 'wooden_axe', 'wooden_pickaxe', 'stone_axe', 'stone_pickaxe', 'iron_axe', 'iron_pickaxe'
    # TODO (R): REMOVE NONE FOR MINERL-v1
    other
], _default='none', _other=other),

After this the starting obfuscated vector matches ones in the dataset. Note that I have not tested this hotpatch properly and it might break some other features (it changes "none" to be first item in the list)!

sjn4048 commented 3 years ago

@Miffyli Planned to start a new issue but then I found that the right place to ask is here. So here's my question: Why do we need to distinguish "none" and "air"? For me they share the same syntax meaning.

shwang commented 3 years ago

@sjn4048:

"none" is used as a no-op equip action, when the player in our dataset doesn't change the item currently in hand. "air" is used when the player changes the in-hand item.

https://minerl.readthedocs.io/en/latest/environments/handlers.html#tool-control-equip-and-use