hpi-epic / BP2021

Working repository in context of the bachelor project "Online Marketplace Simulation: A Testbed for Self-Learning Agents" at the research group Enterprise Platform and Integration Concepts
MIT License

market state during simulation step leaks previous vendor actions #543

Closed davidhennemann closed 1 year ago

davidhennemann commented 1 year ago

Assume a linear market with two vendors (duopoly, RL vs. rule-based). During reset(), the vendor_actions of the market default to:

$$[a_0^{t=0}, a_1^{t=0}] = [prod_{price} + 1, prod_{price} + 1]$$

In the linear market the state simply holds the qualities of both vendors $[x,y]$.

Now, in the first step of the first episode, the RL agent receives its observation of the market, i.e. $[x, a_1^{t=0}, y]$. The agent picks an action $a_0^{t=1}$ according to its policy. Now the market is simulated. As expected, the customers are split into groups, one per vendor:

    customers_per_vendor_iteration = self.config.number_of_customers // self._number_of_vendors

First, the probability distribution that defines the purchase behaviour is generated with prices/actions $[a_0^{t=1}, a_1^{t=0}]$ and qualities $[x,y]$. This iteration simulates the effects of the RL agent's action.

In the second iteration the rule-based agent chooses its action. To do so, one would expect it to get the observation $[y, a_0^{t=0}, x]$. But it actually receives the observation $[y, a_0^{t=1}, x]$.
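
To make the difference concrete, here is a minimal sketch of the observation layout as I understand it from the description above: own quality, competitor's action, competitor's quality. The helper `observation` and all numbers are made up for illustration; this is not the repository's `_observation` method.

```python
from typing import List


def observation(vendor: int, qualities: List[int], actions: List[int]) -> List[int]:
    """Hypothetical helper: the duopoly market as seen by `vendor` (0 or 1).

    Assumed layout: [own quality, competitor's action, competitor's quality].
    """
    other = 1 - vendor
    return [qualities[vendor], actions[other], qualities[other]]


x, y = 7, 5            # qualities of vendor 0 (RL) and vendor 1 (rule-based), made up
a0_t0, a1_t0 = 4, 4    # reset defaults, e.g. a production price of 3 plus 1
a0_t1 = 6              # action picked by the RL agent in the first step, made up

# What I expected the rule-based vendor to see: the RL agent's previous action.
expected = observation(1, [x, y], [a0_t0, a1_t0])
# What it actually sees: the RL agent's action from the current step.
actual = observation(1, [x, y], [a0_t1, a1_t0])

print(expected)
print(actual)
```

The two lists differ only in the middle entry, which is exactly the competitor action in question.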

    customers_per_vendor_iteration = self.config.number_of_customers // self._number_of_vendors
    for i in range(self._number_of_vendors):
        self._simulate_customers(profits, customers_per_vendor_iteration)
        if i < len(self.competitors):
            # this observation already leaks information
            action_competitor_i = self.competitors[i].policy(self._observation(i + 1))
            # during the next iteration we would now simulate customer behaviour a second time with the action from vendor 0...
            self.vendor_actions[i + 1] = action_competitor_i
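
The ordering in the loop above can be reproduced with a small toy model. This is only a sketch under assumptions: `ToyLinearMarket`, `undercut` and the recorded lists are hypothetical names, the customer simulation is replaced by simply recording the prices customers would face, and the observation layout is the one described in the text. It is not the BP2021 implementation.

```python
from typing import Callable, List


class ToyLinearMarket:
    """Toy duopoly that mimics only the ordering of the step loop (assumption)."""

    def __init__(self, production_price: int, qualities: List[int],
                 competitor_policy: Callable[[List[int]], int]) -> None:
        self.qualities = qualities
        self.competitor_policy = competitor_policy
        # reset(): every vendor starts at production price + 1
        self.vendor_actions = [production_price + 1] * len(qualities)
        self.seen_by_competitor: List[List[int]] = []
        self.prices_per_iteration: List[List[int]] = []

    def _observation(self, vendor: int) -> List[int]:
        # Assumed layout: [own quality, competitor's action, competitor's quality]
        other = 1 - vendor
        return [self.qualities[vendor], self.vendor_actions[other], self.qualities[other]]

    def step(self, agent_action: int) -> None:
        self.vendor_actions[0] = agent_action
        number_of_vendors = len(self.vendor_actions)
        for i in range(number_of_vendors):
            # Stand-in for the customer simulation: record the prices in effect.
            self.prices_per_iteration.append(list(self.vendor_actions))
            if i < number_of_vendors - 1:  # one competitor in a duopoly
                obs = self._observation(i + 1)
                self.seen_by_competitor.append(obs)
                self.vendor_actions[i + 1] = self.competitor_policy(obs)


def undercut(obs: List[int]) -> int:
    """Made-up rule-based policy: price one below the competitor's price."""
    return obs[1] - 1


market = ToyLinearMarket(production_price=3, qualities=[7, 5], competitor_policy=undercut)
market.step(agent_action=6)
print(market.seen_by_competitor)    # observation handed to the rule-based vendor
print(market.prices_per_iteration)  # prices in effect during each customer iteration
```

In this toy run the rule-based vendor's observation contains the RL agent's current action (6), not the reset default (4). Its reaction only affects the second customer iteration.
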
davidhennemann commented 1 year ago

Ok this can be closed. It seems to be the desired behaviour! I just misunderstood it.