Open sjn4048 opened 3 years ago
After the agent really equips something, this will go back normal again.
A guess: ironaxe is very rare, as ObtainDiamond's goal is to get ironpickaxe, instead of iron_axe. The only relation I can think of is that TreeChopV0 env starts with some type of axe (maybe iron_axe), could this be the reason?
Here is a first inventory observation from first step in the IronPickaxe dataset (should be the same as in the Diamond):
deobfuscated:
{'equipped_items': {'mainhand': {'damage': 600, 'maxDamage': -1, 'type': 'none'}}}
original vector:
[ 5.9359200e-02 3.6130545e-01 2.2302108e-02 4.6590340e-01
8.3399922e-01 -5.4766589e-01 -5.3313220e-01 1.4517257e-01
-1.6225849e-01 4.4127157e-01 9.3959264e-02 1.4446159e-01
5.3066909e-02 3.8569283e-01 4.6051107e-02 -4.1379360e-03
-3.1055182e-01 3.8475084e-01 -3.4949893e-01 3.3083931e-01
-4.5024895e-04 -1.9205706e-01 -4.0137717e-01 -2.0453778e-01
-4.3483767e-01 -8.3967346e-01 -2.7074900e-01 -6.7882514e-01
3.3821529e-01 -1.8499134e-02 4.7664848e-01 3.9689693e-01
2.8129154e-01 5.6583375e-01 1.0081351e-01 -4.1974071e-02
2.6246247e-01 1.9796182e-01 4.2013833e-01 -5.0612593e-01
1.8495357e-01 3.6445554e-02 1.3387953e-01 -2.5214735e-01
-3.2986674e-01 -4.4088489e-01 4.1660208e-01 -4.9444938e-01
4.2181006e-01 5.5862355e-01 5.3276277e-01 5.2860838e-01
4.9815449e-01 1.8754661e-01 -1.5164745e-01 2.1409102e-01
-3.0231151e-01 2.0704961e-01 -4.4902291e-02 -2.9338261e-01
-7.0547201e-02 -3.5819867e-01 -1.5438145e-01 -4.4374523e-01]
Here is the first inventory observation from the real env:
deobfuscated:
{'equipped_items': {'mainhand': {'damage': 600, 'maxDamage': -1, 'type': 'iron_axe'}}}
original vector:
[ 0.03574965 -0.02991187 0.05493971 0.31410468 0.48443056 -0.26832578
-0.38980542 0.15328171 -0.04445132 0.07340562 0.23895693 -0.11790033
0.17289005 0.07065457 0.02322971 0.02811745 -0.27927029 0.1889336
-0.09328388 0.23966521 0.15927544 -0.21311534 -0.4699087 -0.06114146
-0.24593872 -0.58976279 -0.10949771 -0.37964217 0.24832132 0.10542898
0.23829751 0.11381538 -0.00913478 0.23990227 0.01890011 -0.02672809
0.22047301 0.24523972 0.25394048 -0.20668896 0.16647686 -0.00384989
-0.05751187 -0.15477959 -0.28539445 -0.14074411 0.28814096 -0.24901295
0.25977528 0.41597704 0.30832001 0.38446612 0.34497863 0.1987621
-0.16191061 0.09357601 -0.23916132 0.11681672 0.20887107 -0.15966551
-0.03560115 -0.1346689 0.00736491 -0.33761163]
Cheers you both! Indeed this is happening and have observed the same. The issue is not in env spec itself (agent does not really have iron axe in non-obf env, it is not listed in malmo scenario file either) but seems to come up from obfuscation/vectorization. Digging further.
Can confirm, the non obfuscated env is working as expected for me.
Progress: seems like when the hand is empty (should be "none" or "air"), obfuscated environments say "iron_axe". Steps:
1) Started with iron_axe
as "mainhand" item.
2) Digged some dirt, now mainhand item was other
(dirt was in hand, probably).
3) Acted equip: air
4) iron_axe
showed up as the mainhand item again.
I ran out of time to work on this, but I think the error is in obfuscation itself (not vectorization). Wrapping and unwrapping observations with Vectorized
functions alone works correctly. @brandonhoughton or @shwang : would you have any insights on this?
Curious note: without obfuscation, default item in hand is "air", but it seems that in the dataset they are "none". Another note is that iron_axe
appears second in the Enum list of equipped items. Possible connection?
I don't recall the precise meaning of "air" and "none" in the dataset and the action space. I think that "none" means to leave the mainhand unmodified, and "air" means explicitly equip nothing.
A likely suspect for this change is https://github.com/minerllabs/minerl/commit/1a7ea8b0717f39a2618a68e9c20b6b8c6f261f79, where we did a significant refactor of the item equip code (possibly reordering the item enum relative to the cached obfuscation). If this is indeed the reason for failure, then apologies for that. I added many unit test cases but didn't think of the obfuscation.
Refactor was introduced in 1a7ea8b0717f39a2618a6. (Expect iron_axe
)
The last commit before the refactor was 8cc204360fdae39d18808facb21873baaed98b1c. (Expect air
).
minerl
doesn't run on my laptop right now (some C headers broken), so I'd appreciate it if someone ran the bug replication script on both these commits!
Hmm this to me seems like the deobfuscation network is a little confused and makes a mistake no?
Thanks for the input! Sadly it was not the refactoring Steven suggested (tried out the two commits, both give "iron_axe") :(.
I also tested with 0.3.7 obfuscation, and there things work out as expected (mainhand item is "none").
Thanks for checking the refactor! git bisect
could be helpful here for pinning down the commit that caused this bug (set https://github.com/minerllabs/minerl/commit/8cc204360fdae39d18808facb21873baaed98b1c "bad" and v3.4.2 "good") if someone has time before my new computer arrives.
Did bit of semi-manual spelunkin (had to patch build files etc), and it seems the error persists all the way back to https://github.com/minerllabs/minerl/commit/94a7ee35b7211bd9383a5d93e2e8aafad0f211a7, and beyond that building/running fails to random, unrelated errors. This is the time where main devs were having fun bringing Malmo to MineRL and such, judging by the commits. 0.3.6/0.3.7 versions work as expected.
Oh, good to know. Sorry you had to do manual patching along the way
@Miffyli @shwang Just wanna checkin whether it's gonna be fixed before the round 2 deadline or whether we have to count with it in our codes?
Just wanna checkin whether it's gonna be fixed before the round 2 deadline or whether we have to count with it in our codes?
We will work on getting it fixed for Round 2 but can not give 100% guarantee we will succeed (other competition deadlines occupying us).
@Miffyli Ok, we'll implement some workarounds for this issue. Would we have an option to select the bugged version for our submission if it gets fixed before the deadline?
If this bug is fixed we will apply it on all submissions. There is little time to setup a system that allows picking the MineRL version.
After a nice, comfortable and simple evening with my good friends, MineRL and pdb, I have figured out the puzzle. We have two creatures in play here:
1) In 0.3.7, the default equipped item was "none" (here). In 0.4.x, this was changed to "air". This makes sense, but was a breaking change since obfuscations were still in 0.3.7 format (game starts with "air" equipped, but dataset has "none") 2) The obfuscator does not decode encoding for "air" (as an equipped item) correctly. It instead decodes it to "iron_axe". This happens on both 0.3.7 and 0.4.x. I assume this is because it was not trained much with "air" data (?).
Proper fix for this is to re-compile the obfuscated datasets to use new obfuscated vectors.
A hotpatch to match dataset's obfuscations can be done by changing the default equipped item back to "none" by replacing this line with
handlers.EquippedItemObservation(items=[
none, 'air', 'wooden_axe', 'wooden_pickaxe', 'stone_axe', 'stone_pickaxe', 'iron_axe', 'iron_pickaxe'
# TODO (R): REMOVE NONE FOR MINERL-v1
other
], _default='none', _other=other),
After this the starting obfuscated vector matches ones in the dataset. Note that I have not tested this hotpatch properly and it might break some other features (it changes "none" to be first item in the list)!
@Miffyli Planned to start a new issue but then I found that the right place to ask is here. So here's my question: Why do we need to distinguish "none" and "air"? For me they share the same syntax meaning.
@sjn4048:
"none" is used as a no-op equip action, when the player in our dataset doesn't change the item currently in hand. "air" is used when the player changes the in-hand item.
https://minerl.readthedocs.io/en/latest/environments/handlers.html#tool-control-equip-and-use
I believe this could be the reason for 0.3/0.4 obf inconsistency.
How to reproduce: