takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License
1.33k stars 244 forks

Feature Order Leads to Different Results #345

Closed gawainsss closed 1 year ago

gawainsss commented 1 year ago

Hello everyone,

I'm working on a debt collection problem using reinforcement learning, where I aim to determine the optimal collection strategy (frequency, intensity, and collection methods) to maximize the repayment amount from delinquent clients. I've modelled this as a Markov Decision Process (MDP) based on the different types of debt.

Specifically, the state space is a four-dimensional vector corresponding to the debt types [Credit Card, Micro-loans, Personal Loans, Mortgage]. A state like [1,0,0,1] denotes that the repayment percentage for credit card and mortgage is 0%, while there is no overdue balance for micro-loans and personal loans.

However, I've noticed a peculiar behavior: When I reshuffle the order of the debt types in the state representation, say moving 'Mortgage' to the second position leading to a state like [1,1,0,0], the RL results differ.

Why does changing the order of dimensions/state representation lead to different outcomes? Is there an underlying assumption or behavior in the algorithm that is sensitive to the ordering of state features?

Any insights or advice would be greatly appreciated. Thank you!

takuseno commented 1 year ago

@gawainsss Thanks for the issue. I believe this is a general RL question, not specific to d3rlpy, so I removed the `bug` label since this is not a bug report.

From what you describe, there are two possible cases:

  1. Train an agent and re-order features at test-time.

Once you train an agent, it can only make meaningful decisions when the features arrive in the same order used during training.

  2. An agent trained with one feature order behaves differently from another agent trained with a different order.

Deep learning training is stochastic (strictly speaking, it can be made deterministic, but shuffling the features still makes a difference). Thus you might get an agent with different behavior every time you train a new one.
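The first case can be demonstrated without any RL machinery. Below is a minimal numpy sketch (not d3rlpy code; the linear layer stands in for a trained network, and its weights are random purely for illustration): permuting the input features of a fixed model changes its outputs, and therefore the greedy action.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "trained" linear layer standing in for the agent's network.
# Its weights are implicitly tied to the feature order seen in training.
W = rng.normal(size=(4, 2))

state = np.array([1.0, 0.0, 0.0, 1.0])  # training-time feature order
permuted = state[[0, 3, 1, 2]]          # 'mortgage' moved to the second slot

q_original = state @ W
q_permuted = permuted @ W

# The Q-values differ, because the network has no notion that
# slot 2 now means something else.
print(np.allclose(q_original, q_permuted))  # False
```

In other words, the network's weights encode "feature i means debt type X"; reordering the features at test time silently breaks that mapping.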

I would suggest learning more about neural networks before running further experiments, since this is a fairly elementary question.

takuseno commented 1 year ago

Let me close this issue since this is not about d3rlpy.

gawainsss commented 1 year ago


Thanks for your answer! Sorry, I made a mistake with the label T T