This pull request is the first stage of implementing the cost-constrained POMCP (CC-POMCP) algorithm. This algorithm cannot be added to the repository directly because it has additional variables and operations not present in the PO-UCT and POMCP algorithms. One example is cost constraints and their corresponding operations.
To accommodate costs and other future variables, I propose using a generic model, called a response model, and a corresponding output, called a response. The name "response" comes from the notion of independent and dependent variables: a response (reward, cost, etc.) depends on the interaction with the real or simulated environment. A response model is therefore a wrapper for more specific models, such as reward and cost models (and any others added in the future). By extension, a response is a wrapper for the reward, cost, etc.
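The wrapper idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this pull request: the field names `reward` and `cost`, the `sample` signature, and the callables `reward_fn` and `cost_fn` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """Hypothetical bundle of per-step outputs (not the PR's actual class)."""
    reward: float = 0.0
    cost: float = 0.0

    def __add__(self, other):
        # Component-wise sum, so totals can be accumulated over a rollout.
        if not isinstance(other, Response):
            return NotImplemented
        return Response(self.reward + other.reward, self.cost + other.cost)

    def __mul__(self, scalar):
        # Scaling by a number, e.g. a discount factor.
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Response(self.reward * scalar, self.cost * scalar)

    __rmul__ = __mul__


class ResponseModel:
    """Hypothetical wrapper that delegates to a reward and an optional cost function."""

    def __init__(self, reward_fn, cost_fn=None):
        self._reward_fn = reward_fn
        self._cost_fn = cost_fn

    def sample(self, state, action, next_state):
        reward = self._reward_fn(state, action, next_state)
        cost = self._cost_fn(state, action, next_state) if self._cost_fn else 0.0
        return Response(reward, cost)


# Usage: accumulate a discounted response over three illustrative steps.
model = ResponseModel(
    reward_fn=lambda s, a, sp: 10.0 if a == "open" else -1.0,  # assumed toy rewards
    cost_fn=lambda s, a, sp: 1.0,                               # assumed unit cost
)
discount = 0.5
total = Response()
for t, action in enumerate(["listen", "listen", "open"]):
    total = total + (discount ** t) * model.sample("s", action, "s")
```

With a design along these lines, planner code that previously accumulated a scalar reward can accumulate a response instead, and an algorithm that does not use costs simply ignores that field.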
The pull request includes the following:
Implementation of the ResponseModel and Response classes in the basics files,
Updates to classes in basics to use ResponseModel instead of RewardModel directly,
Updates to the affected pre-existing algorithms, including PO-UCT and POMCP,
Updates to the code for the tiger, rocksample, mos, load_unload, and tag problems,
A test script for response arithmetic operations in test_response.py,
A passing run of the test_all.py script,
Passing runs of the tiger, rocksample, mos, load_unload, and tag problems,
Comments for the ResponseModel and Response classes.