Closed: gawainsss closed this issue 1 year ago
@gawainsss Thanks for the issue. I believe this is a general RL question, not specific to d3rlpy. I removed the bug label since this is not a bug report.
From what you mentioned, there are two cases.
- Training an agent and then re-ordering the features at test time. Once you train an agent, it can only make decisions with the feature order used during training (see the sketch at the end of this comment).
- An agent behaving differently from another agent trained with a different feature order. Deep learning training is stochastic (strictly speaking, it can be made deterministic, but shuffling the features still makes a difference), so you may get an agent with different behavior every time you train a new one.
I would suggest learning more about neural networks before running further experiments, since this is a fairly elementary question.
Let me close this issue since it is not about d3rlpy.
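To illustrate the first point, here is a minimal sketch in plain PyTorch (not d3rlpy): the untrained network below stands in for a learned policy, and the permutation indices are just one example. Because a network's weights are bound to input positions, the same state with re-ordered columns generally produces different outputs.

```python
import torch

torch.manual_seed(0)  # runs are only repeatable with fixed seeds

# Toy stand-in for a trained policy network: its weights are tied to
# input positions, so the column order is part of the learned function.
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)

state = torch.tensor([[1.0, 0.0, 0.0, 1.0]])  # training-time feature order
permuted = state[:, [0, 3, 1, 2]]             # 'Mortgage' moved to 2nd slot -> [1, 1, 0, 0]

with torch.no_grad():
    print(policy(state))     # action values under the learned order
    print(policy(permuted))  # generally different, for the same underlying client
```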
Thanks for your answer! Sorry, I made a mistake with the label T_T
Hello everyone,
I'm working on a debt collection problem using Reinforcement Learning, where I aim to determine the optimal collection strategy, including frequency, intensity, and collection methods, to maximize the repayment amount from delinquent clients. I've modelled this as a Markov Decision Process (MDP) based on the different debt types.
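Schematically, the logged episodes are packed for d3rlpy roughly like this (the arrays below are random placeholders rather than my real data, and the three actions are only examples; the constructor is d3rlpy's MDPDataset, whose optional arguments differ slightly between 1.x and 2.x):

```python
import numpy as np
from d3rlpy.dataset import MDPDataset

N = 1000  # number of logged transitions (placeholder data below)
observations = np.random.randint(0, 2, size=(N, 4)).astype(np.float32)  # one binary flag per debt type
actions = np.random.randint(0, 3, size=N)       # e.g. 0=wait, 1=reminder, 2=call (illustrative)
rewards = np.random.rand(N).astype(np.float32)  # repayment collected at each step
terminals = np.zeros(N, dtype=np.float32)
terminals[99::100] = 1.0                        # mark an episode boundary every 100 steps

dataset = MDPDataset(observations, actions, rewards, terminals)
```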
Specifically, the state space is represented as a four-dimensional vector corresponding to the debt types: [Credit Card, Micro-loans, Personal Loans, Mortgage]. A sample state like [1,0,0,1] denotes that the repayment percentage for the credit card and the mortgage is 0%, while there is no overdue balance for micro-loans and personal loans.
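Concretely, the encoding can be pinned to a fixed, named column order, something like this (the identifiers here are illustrative, not from my actual pipeline):

```python
DEBT_TYPES = ["credit_card", "micro_loan", "personal_loan", "mortgage"]

def encode_state(overdue):
    # 1 = repayment percentage is 0% (fully overdue), 0 = no overdue balance
    return [1 if overdue[d] else 0 for d in DEBT_TYPES]

encode_state({"credit_card": True, "micro_loan": False,
              "personal_loan": False, "mortgage": True})
# -> [1, 0, 0, 1]
```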
However, I've noticed a peculiar behavior: when I reshuffle the order of the debt types in the state representation, say moving 'Mortgage' to the second position so that the same state becomes [1,1,0,0], the RL results differ.
Why does changing the order of dimensions/state representation lead to different outcomes? Is there an underlying assumption or behavior in the algorithm that is sensitive to the ordering of state features?
Any insights or advice would be greatly appreciated. Thank you!