kakaoenterprise / JORLDY

Repository for Open Source Reinforcement Learning Framework JORLDY
Apache License 2.0

[#130]Debug/fix step calculation in async_distributed_train #134

Closed kan-s0 closed 2 years ago

kan-s0 commented 2 years ago

Hello! Thanks for contributing to JORLDY!

Types of changes

Please describe the types of changes! (ex. Bugfix, New feature, Documentation, ...)

Bugfix

Description

Please describe the details of your contribution

In each iteration of the asynchronous loop, a different number of steps comes in, and heap["print_stamp"] is currently reset to 0 whenever it reaches the print period, as in the following code:

if heap["print_stamp"] >= config.train.print_period or is_over:
    print_signal = True
    heap["print_stamp"] = 0 # e.g. 50100.25 -> 0: the overshoot of 100.25 steps is lost

...

if print_signal: # in the last loop print_stamp is 49899.75, which is below 50000
    try:
        manage_sync_queue.get_nowait()
    except:
        pass
    manage_sync_queue.put(agent.sync_out()) # not executed in the last loop

In fact, if an update fires at heap["print_stamp"] = 50100 with a period of 50000, the next update should fire after an additional 49900 steps. Currently, because heap["print_stamp"] is reset to 0, all remaining steps are consumed after those last 49900 steps while heap["print_stamp"] is still below the period, so the final update is not performed and the loop does not end.

So, fix it like this:

if heap["print_stamp"] >= config.train.print_period or is_over:
    print_signal = True
    heap["print_stamp"] -= config.train.print_period # fix: keep the overshoot instead of discarding it
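
The difference between the two update rules can be illustrated with a small standalone simulation. This is only a sketch and not code from JORLDY: the helper `count_triggers`, the flag `carry_overshoot`, and the step increments are hypothetical, chosen to match the 50100 / 49900 example above.

```python
def count_triggers(increments, period, carry_overshoot):
    """Count how often the print/sync condition fires for a stream of step increments.

    carry_overshoot=False mimics the old rule (stamp = 0);
    carry_overshoot=True mimics the fixed rule (stamp -= period).
    """
    stamp = 0
    triggers = 0
    for inc in increments:
        stamp += inc
        if stamp >= period:
            triggers += 1
            if carry_overshoot:
                stamp -= period  # keep the steps beyond the period
            else:
                stamp = 0  # discard the steps beyond the period
    return triggers


period = 50000
increments = [50100, 49900]  # hypothetical arrivals; 100000 steps = 2 full periods

reset_triggers = count_triggers(increments, period, carry_overshoot=False)
carry_triggers = count_triggers(increments, period, carry_overshoot=True)
print(reset_triggers, carry_triggers)  # 1 2
```

With the reset rule, the 100 overshoot steps are dropped and the second period is never reached within the total step budget; subtracting the period keeps the counter aligned with the total number of steps.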