Open HowardZJU opened 2 years ago
Hi, thanks for your interest in our work.
Thanks for your answer! The solution can be found in the MRDR paper from ICML 2018, in the area of offline evaluation for RL, which uses the same name and the same mathematical formula as this paper: http://proceedings.mlr.press/v80/farajtabar18a/farajtabar18a.pdf#:~:text=The%20main%20idea%20of%20our%20estimator%2C%20called%20more,samples%20are%20randomly%20missing%20%28Cao%20et%20al.%2C%202009%29. My mentee pointed this out, and formally citing it in Sec 3.2 might avoid misunderstanding.
Hi, I am an engineer at Alibaba. We have recently been reproducing the MRDR-DL paper in our deployment environment, but the results are not competitive. We have read and discussed the paper carefully and have the following questions:
Overall, this is a fairly abstract paper. We just want to clarify the details and explore its application in industrial scenarios.