takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License
1.29k stars 230 forks

FQE support .d3 file model? #381

Open CastleImitation opened 6 months ago

CastleImitation commented 6 months ago

Dear Sir, does the FQE evaluation method support models saved as .d3 files? When I try to run my trained DDQN .d3 model through FQE, an error occurs saying "RuntimeError: Invalid magic number; corrupt file?" It seems that FQE does not support .d3 model files. Or should I save the model as a .pt file instead?

Looking forward to your kind reply!
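
For reference, this kind of "Invalid magic number" message typically comes from a loader being handed a file in a different serialization format than it expects (modern `torch.save()` checkpoints, for instance, are zip archives). A minimal stdlib-only sketch of that diagnosis, with made-up file names, is:

```python
import os
import tempfile
import zipfile

def looks_like_torch_zip(path):
    """Default torch.save() checkpoints are zip archives, so a file that
    is not a zip will make zip-based loaders fail with a format error."""
    return zipfile.is_zipfile(path)

tmp = tempfile.mkdtemp()
pt_path = os.path.join(tmp, "checkpoint.pt")  # hypothetical file names
d3_path = os.path.join(tmp, "model.d3")

# Create a zip-based stand-in for a .pt checkpoint.
with zipfile.ZipFile(pt_path, "w") as z:
    z.writestr("data.pkl", b"stand-in for pickled weights")

# Create a non-zip stand-in for a file in some other format.
with open(d3_path, "wb") as f:
    f.write(b"some non-zip payload")

print(looks_like_torch_zip(pt_path))  # True
print(looks_like_torch_zip(d3_path))  # False
```

Checking the actual format of the file you pass to each loader is a quick way to tell whether the file is corrupt or simply being opened with the wrong API.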

takuseno commented 6 months ago

@CastleImitation Hi, thanks for the issue. FQE does support .d3 file as well. Could you share the minimal code that I can reproduce your issue?

CastleImitation commented 6 months ago

Dear Takuseno, many thanks for your kind reply.

Since I was not sure whether FQE supports .d3, I saved my model as .pt with fqe.save_model(f"{label}fqe.pt") after reading the official documentation at https://d3rlpy.readthedocs.io/en/v2.3.0.

haha

CastleImitation commented 6 months ago

Dear Takuseno

Thanks for your kind reply.

I have checked the official document of d3rlpy and found the way to save the model as .pt.

Regards!

Felix Li


CastleImitation commented 6 months ago

Dear Takuseno,

By the way, could I have one additional question?

I have trained my model, but when I tried to load it and build the test dataset for it, it reported this error:

```
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ModuleList:
    size mismatch for 0._fc.weight: copying a param with shape torch.Size([343, 256]) from checkpoint, the shape in current model is torch.Size([340, 256]).
    size mismatch for 0._fc.bias: copying a param with shape torch.Size([343]) from checkpoint, the shape in current model is torch.Size([340]).
```
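
A mismatch like this can be diagnosed before calling load_state_dict by comparing parameter shapes between the checkpoint and the freshly built model. A stdlib-only sketch, using plain dicts as hypothetical stand-ins for the two state_dicts:

```python
def find_shape_mismatches(checkpoint, model):
    """Compare two {param_name: shape} mappings and list the parameters
    whose shapes differ, mimicking the check behind the error above."""
    mismatches = []
    for name, ckpt_shape in checkpoint.items():
        model_shape = model.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches

# Shapes taken from the error message: the checkpoint's output head has
# 343 units, while the freshly built model's head has only 340.
checkpoint = {"0._fc.weight": (343, 256), "0._fc.bias": (343,)}
model = {"0._fc.weight": (340, 256), "0._fc.bias": (340,)}

for name, ckpt, cur in find_shape_mismatches(checkpoint, model):
    print(f"{name}: checkpoint {ckpt} vs model {cur}")
```

The mismatch is confined to the final layer, which points at the action dimension (343 vs. 340) rather than at a corrupted file.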

When I build the model with the training dataset instead, no error is reported.

Here is my code:

```python
config_BCQ = DiscreteBCQConfig()
config_SAC = DiscreteSACConfig()
config_DDQN = DoubleDQNConfig()

# model_BCQ = DiscreteBCQ(algo=self.policy, config=fqe_config, device="cuda:0")  # leftover from an earlier attempt; overwritten below

device = "cuda"

model_BCQ = config_BCQ.create(device=device)
model_SAC = config_SAC.create(device=device)
model_DDQN = config_DDQN.create(device=device)

POLICIES = {
    'BCQ': [
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/BCQ/run_0/DiscreteBCQ_train.pt', model_BCQ],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/BCQ/run_1/DiscreteBCQ_train.pt', model_BCQ],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/BCQ/run_2/DiscreteBCQ_train.pt', model_BCQ],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/BCQ/run_3/DiscreteBCQ_train.pt', model_BCQ],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/BCQ/run_4/DiscreteBCQ_train.pt', model_BCQ],
    ],
    'SAC': [
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/SAC/run_0/DiscreteSAC_train.pt', model_SAC],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/SAC/run_1/DiscreteSAC_train.pt', model_SAC],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/SAC/run_2/DiscreteSAC_train.pt', model_SAC],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/SAC/run_3/DiscreteSAC_train.pt', model_SAC],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/SAC/run_4/DiscreteSAC_train.pt', model_SAC],
    ],
    'DDQN': [
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/DQN/run_0/DoubleDQN_train.pt', model_DDQN],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/DQN/run_1/DoubleDQN_train.pt', model_DDQN],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/DQN/run_2/DoubleDQN_train.pt', model_DDQN],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/DQN/run_3/DoubleDQN_train.pt', model_DDQN],
        [f'{FINAL_POLICIES_PATH}/raw_intermediate/DQN/run_4/DoubleDQN_train.pt', model_DDQN],
    ],
}

for P in POLICIES.keys():
    for i in range(len(POLICIES[P])):  # 5 runs per algorithm
        # Test-split data; on the last iteration this holds the last dataset.
        data = load_data(states='raw', rewards='intermediate', index_of_split=i)[1]
        # The algos were already instantiated when the dict was built,
        # even without an explicit variable name for each entry.
        POLICIES[P][i][1].build_with_dataset(data)
        POLICIES[P][i][1].load_model(POLICIES[P][i][0])
```

data = load_data(states='raw', rewards='intermediate', index_of_split=i)[1] is my dataset; [1] selects the test split. In my experiment there are 7×7×7 = 343 possible actions in total.
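
One plausible cause of the 343-vs-340 mismatch: if the library infers the discrete action size from whichever dataset the model is built with (for example, as the largest observed action id plus one), a test split that happens to miss the highest-id actions produces a smaller Q-function head than the checkpoint was trained with. A minimal sketch of that inference, under that assumption:

```python
def inferred_action_size(actions):
    """Infer a discrete action-space size from observed action ids,
    assuming the size is taken to be max observed id + 1."""
    return max(actions) + 1

full_action_space = list(range(343))  # training split saw ids 0..342
test_split = list(range(340))         # test split happens to stop at id 339

print(inferred_action_size(full_action_space))  # 343: checkpoint head size
print(inferred_action_size(test_split))         # 340: freshly built head size
```

If this is the cause, building the model with the same dataset it was trained on (or otherwise fixing the action size explicitly) before calling load_model should make the shapes agree.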

By the way, since you have kindly helped me a lot, would you mind if I add your name to the author list when I publish my paper?

Awaiting your kind reply.

Felix Li
