lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License
282 stars, 29 forks

Missing key(s) in state_dict #6

Closed: epetros closed this issue 4 years ago

epetros commented 4 years ago

Greetings,

Previously I was able to save and load checkpoints, but today I get:

RuntimeError: Error(s) in loading state_dict for AutoregressiveWrapper:
Missing key(s) in state_dict:
"net.net.routing_transformer.layers.blocks.0.f.net.fn.local_attn.rel_pos.weights",
"net.net.routing_transformer.layers.blocks.1.f.net.fn.local_attn.rel_pos.weights",
"net.net.routing_transformer.layers.blocks.2.f.net.fn.local_attn.rel_pos.weights",
"net.net.routing_transformer.layers.blocks.3.f.net.fn.local_attn.rel_pos.weights",
"net.net.routing_transformer.layers.blocks.4.f.net.fn.local_attn.rel_pos.weights",
"net.net.routing_transformer.layers.blocks.5.f.net.fn.local_attn.rel_pos.weights",
"net.net.routing_transformer.layers.blocks.6.f.net.fn.local_attn.rel_pos.weights",
"net.net.routing_transformer.layers.blocks.7.f.net.fn.local_attn.rel_pos.weights".

Help please, Thanks

lucidrains commented 4 years ago

@epetros oops, my bad, it's been fixed in the latest: https://github.com/lucidrains/routing-transformer/commit/f6ce3bbcd07add17e9bb0be281850ee8a6045646 How is Routing Transformer working for you? Well?

epetros commented 4 years ago

Thank you, but I still face the same issue; loading works only with v0.8.4. I am using enwik8_deepspeed with my own dataset. It seems to be very fast and the loss is decreasing, but I guess it's still too early (loss ~0.9) to get nice generation results.

lucidrains commented 4 years ago

@epetros fixed for real now lol https://github.com/lucidrains/routing-transformer/commit/1341f6987028f5bebca009ee4925baffd0572653

epetros commented 4 years ago

It's fixed, thanks.
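
For anyone hitting a similar "Missing key(s) in state_dict" error: it generally means the model being constructed expects parameters that the saved checkpoint does not contain. One way to see exactly which keys differ is to compare the two key sets before loading. The following is a minimal sketch in plain Python, not code from this repository; the helper `diff_state_dict_keys` and the shortened key names are hypothetical and used only for illustration.

```python
from typing import Iterable, List, NamedTuple


class KeyDiff(NamedTuple):
    missing: List[str]     # expected by the model, absent from the checkpoint
    unexpected: List[str]  # present in the checkpoint, unknown to the model


def diff_state_dict_keys(model_keys: Iterable[str],
                         checkpoint_keys: Iterable[str]) -> KeyDiff:
    """Compare parameter names of a model against those of a checkpoint."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return KeyDiff(
        missing=sorted(model_set - ckpt_set),
        unexpected=sorted(ckpt_set - model_set),
    )


# Hypothetical key names, shortened from the error message in this issue.
model_keys = [
    "net.layers.blocks.0.f.net.fn.local_attn.rel_pos.weights",
    "net.layers.blocks.0.f.net.fn.to_out.weight",
]
checkpoint_keys = [
    "net.layers.blocks.0.f.net.fn.to_out.weight",
]

diff = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", diff.missing)
print("unexpected:", diff.unexpected)
```

With PyTorch, the two key sets would come from `model.state_dict().keys()` and from the keys of the loaded checkpoint dictionary. Calling `model.load_state_dict(checkpoint, strict=False)` also returns the missing and unexpected keys instead of raising, but any missing parameters then keep their freshly initialized values, so whether that is acceptable depends on the model.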