This PR adds proper backward sampling routines, specifically to sample from parameterized backward policies or to sample uniformly (from masks).
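As a rough illustration of the uniform case, here is a minimal NumPy sketch of sampling one backward action per state uniformly from a boolean validity mask. This is not the code in this PR: the function name `sample_uniform_from_masks`, the 2-D mask layout, and the returned log-probabilities are assumptions made for the example.

```python
import numpy as np


def sample_uniform_from_masks(masks, rng):
    """Hypothetical helper: pick one valid action per row, uniformly.

    masks: (n_states, n_actions) boolean array, True where an action is valid.
    Returns the sampled action indices and their log-probabilities.
    """
    masks = np.asarray(masks, dtype=bool)
    counts = masks.sum(axis=1)
    if (counts == 0).any():
        raise ValueError("every state needs at least one valid backward action")
    # Uniform over valid actions: 1 / (number of valid actions), 0 elsewhere.
    probs = masks / counts[:, None]
    actions = np.array([rng.choice(masks.shape[1], p=p) for p in probs])
    log_probs = -np.log(counts)
    return actions, log_probs


rng = np.random.default_rng(0)
example_masks = np.array([[True, False, True],
                          [False, True, False]])
actions, log_probs = sample_uniform_from_masks(example_masks, rng)
```

A sampled action always falls on a `True` entry, and its log-probability depends only on how many actions are valid in that state.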
This PR also:
- Changes how stem attachments are defined in the fragment environment, using a saner "src"/"dst" format
- Includes #96 to showcase offline training
- Adds an extra `Queue` object for in-main-process calls to an MP-wrapped model
- Changes `GraphActionCategorical._compute_batchwise_max` to use `finfo().min` instead of the batch min to compute the argmax, which avoids some edge-case errors
- Converts `graph_to_Data` in the fragment environment to use NumPy
Includes @diamondspark as an author since the PR is based on his implementation.
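
To illustrate the kind of edge case the `finfo().min` change guards against, here is a simplified NumPy sketch of a masked argmax. It is not the actual `_compute_batchwise_max` implementation; `masked_argmax` and the mask layout are hypothetical, and the failure shown is one plausible instance rather than the exact bug.

```python
import numpy as np


def masked_argmax(logits, mask, fill=None):
    """Hypothetical helper: row-wise argmax over entries where mask is True.

    Invalid entries are replaced by `fill` before taking the argmax.
    By default `fill` is the smallest finite value of the dtype.
    """
    if fill is None:
        fill = np.finfo(logits.dtype).min
    filled = np.where(mask, logits, fill)
    return filled.argmax(axis=1)


logits = np.array([[0.5, -3.0, 2.0],
                   [1.0, 4.0, -3.0]], dtype=np.float32)
mask = np.array([[False, True, False],
                 [True, True, False]])

# Filling with the batch min ties the only valid entry of row 0 (-3.0)
# with the masked entries, so argmax returns a masked index.
with_batch_min = masked_argmax(logits, mask, fill=logits.min())

# Filling with finfo().min keeps every valid entry strictly larger.
with_finfo_min = masked_argmax(logits, mask)
```

In row 0 the only valid logit happens to equal the batch minimum, so filling with the batch min selects index 0 (masked), while filling with `finfo().min` selects index 1 (valid).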