Stochasticity of Rewards

hpi-sam / rl-4-self-repair

Reinforcement Learning Models for Online Learning of Self-Repair and Self-Optimization

MIT License


Stochasticity of Rewards #9

Open: christianadriano opened this issue 4 years ago

christianadriano commented 4 years ago

@brrrachel Please confirm with Sona whether we should build the distribution of rewards by <component, failure> pairs only, or whether we should add one more parameter (e.g., shop).

christianadriano commented 4 years ago

I don't believe we need to rebuild these distributions by shop. Each shop has different parameters, but the utility_increase for a <component, failure> pair is computed with the same utility function in every shop.
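A minimal sketch of what "one distribution per <component, failure> pair, pooled across shops" could look like. The record layout (`Observation` with `shop`, `component`, `failure`, `utility_increase`) and the helper names are hypothetical and not taken from the repository. The empirical distribution here is simply the list of observed utility increases, and a stochastic reward is drawn uniformly from it.

```python
import random
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Observation(NamedTuple):
    # Hypothetical record layout; field names are illustrative only.
    shop: str
    component: str
    failure: str
    utility_increase: float


def build_reward_distributions(
    observations: List[Observation],
) -> Dict[Tuple[str, str], List[float]]:
    """Pool observed utility increases per <component, failure> pair.

    The shop is deliberately left out of the key, so rewards from all
    shops that share a pair end up in the same empirical distribution.
    """
    distributions: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for obs in observations:
        distributions[(obs.component, obs.failure)].append(obs.utility_increase)
    return dict(distributions)


def sample_reward(
    distributions: Dict[Tuple[str, str], List[float]],
    component: str,
    failure: str,
    rng: random.Random,
) -> float:
    """Draw one stochastic reward uniformly from the pair's observed values."""
    return rng.choice(distributions[(component, failure)])


# Usage with made-up data: two shops contribute to the same pair.
data = [
    Observation("shop-1", "comp-A", "fail-1", 10.0),
    Observation("shop-2", "comp-A", "fail-1", 12.0),
    Observation("shop-1", "comp-B", "fail-2", 5.0),
]
dists = build_reward_distributions(data)
# dists[("comp-A", "fail-1")] == [10.0, 12.0]
```

Adding the shop as a third key component would only require changing the tuple to `(obs.shop, obs.component, obs.failure)`, at the cost of splitting the same data into more, smaller distributions.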

brrrachel commented 4 years ago

@christianadriano I have already changed the component selection to use the component id instead of the name. @2start will investigate how this changes the results.

brrrachel commented 4 years ago

Using the id instead of the name increases the number of <component, failure> pairs from 72 to 1434. We have 10928 data points in total.

christianadriano commented 4 years ago

This means that we would probably have fewer than 10 reward values for each <component_id, failure> pair (10928/1434 ≈ 7.6 on average)...
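The arithmetic behind this estimate, as a small sketch. The mean only says how many samples a pair gets on average; the actual counts per pair may be very uneven, so the second (hypothetical) helper counts how many keys fall below a chosen minimum sample size, given the list of keys of all data points.

```python
from collections import Counter
from typing import Hashable, Iterable


def mean_samples_per_pair(total_points: int, num_pairs: int) -> float:
    """Average number of reward values per <component, failure> pair."""
    return total_points / num_pairs


def pairs_below(keys: Iterable[Hashable], min_samples: int) -> int:
    """Count distinct keys that have fewer than min_samples data points."""
    counts = Counter(keys)
    return sum(1 for c in counts.values() if c < min_samples)


print(f"{mean_samples_per_pair(10928, 72):.2f}")    # 151.78 (keyed by name)
print(f"{mean_samples_per_pair(10928, 1434):.2f}")  # 7.62 (keyed by id)
```

Switching from names to ids therefore drops the average from roughly 152 to under 8 reward values per pair.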