stephenkgu opened this issue 4 years ago (status: Open)
Hey,
You're absolutely right. I'm not sure why, but this seems to be yet another discrepancy between my private development repo and this public one - as in your issue from before (#9).
Rather than do a small patch like last time, I will go through all the code this weekend and make sure everything is in sync. Thanks for bringing this to my attention and sorry for the hassle.
Regards, Tom
Thanks. I tried multithreaded training, but it failed; the data in the agent just crashed.
Is the multithreading feature actually workable? @tspooner
Honestly, I stopped using multi-threaded training quite some time before the main results of the paper were found. It doesn't surprise me much that it is broken.
I realise that's not ideal, but you will probably have to make some modifications yourself. This would include using mutexes to ensure that write access is exclusive at any one time, etc.
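A minimal sketch of the kind of change I mean, i.e. serialising writes to shared agent state with a mutex; the names (`SharedWeights`, `apply_update`) are illustrative, not from agent.cpp:

```cpp
// Minimal sketch of serialising writes to shared agent state with a mutex.
// The names (SharedWeights, apply_update) are illustrative, not from agent.cpp.
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct SharedWeights {
    std::vector<double> w;
    std::mutex mtx;  // guards all writes to w

    // Apply a TD update to one feature index under the lock so that
    // only one worker thread can modify the weights at a time.
    void apply_update(std::size_t idx, double delta) {
        std::lock_guard<std::mutex> lock(mtx);
        w[idx] += delta;
    }
};

int main() {
    SharedWeights shared;
    shared.w.assign(1024, 0.0);

    // Two worker threads writing concurrently; the mutex keeps their
    // updates from racing with each other.
    auto worker = [&shared](std::size_t offset) {
        for (int step = 0; step < 1000; ++step)
            shared.apply_update((offset + step) % shared.w.size(), 0.01);
    };

    std::thread t1(worker, 0), t2(worker, 512);
    t1.join();
    t2.join();
    return 0;
}
```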
I noted that on-policy R-learning seems to perform better than its counterparts in the paper, so is OnlineRLearn the on-policy R-learning implementation?
Which learning method do you recommend with market making? @tspooner
Thanks
Yeah, OnlineRLearn is the on-policy R-learning algorithm that was introduced by Sutton. It's the equivalent of Q-learning for continuing tasks - i.e. it solves for a different objective: the expected average return as opposed to the expected discounted return.
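For intuition, here is a minimal sketch of an on-policy, average-reward update of this kind; the names and exact form are illustrative, not the OnlineRLearn code itself:

```cpp
// Minimal sketch of an on-policy, average-reward (R-learning style) update.
// Names are illustrative; this is not the actual OnlineRLearn code.
#include <unordered_map>

struct RLearner {
    double alpha = 0.1;   // step size for the action values
    double beta  = 0.01;  // step size for the average-reward estimate
    double rho   = 0.0;   // running estimate of the average reward
    std::unordered_map<long long, double> q;  // Q(s, a) keyed by a packed (state, action) id

    static long long key(int s, int a) {
        return (static_cast<long long>(s) << 32) | static_cast<unsigned>(a);
    }

    // SARSA-style update: the next action a2 is the one the behaviour policy
    // actually takes, and the TD error uses (r - rho) rather than a discounted return.
    void update(int s, int a, double r, int s2, int a2) {
        double delta = r - rho + q[key(s2, a2)] - q[key(s, a)];
        q[key(s, a)] += alpha * delta;
        rho          += beta * delta;  // track the average reward via the same TD error
    }
};
```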
In general, there is no "right" algorithm for market making. It really depends on what type of solutions you want and what assumptions you want to make about the setting; one can formulate market making as an episodic task (i.e. day-to-day trading), or as one with no terminal time. These yield different results, but it's not clear if one is necessarily better than the other. We certainly found in our experiments that it performed well, but that could also be said for Expected SARSA.
It also depends on whether you intend to use a discretised action-space or not...
Given all this, I would strongly suggest you start with Q-learning and SARSA and branch out from there. Until you try it for yourself and see what the results are, where the limitations of on- vs off-policy methods lie, etc., it's hard to gain proper insight. This makes for more effective solution development, in my experience.
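To make the on- vs off-policy distinction concrete, here is a minimal tabular sketch of the two targets (illustrative names only, not code from this repo):

```cpp
// Minimal sketch contrasting the Q-learning and SARSA targets (tabular case).
// Q is assumed to be indexed as Q[state][action]; names are illustrative.
#include <algorithm>
#include <vector>

double q_learning_target(const std::vector<std::vector<double>>& Q,
                         double r, double gamma, int s_next) {
    // Off-policy: bootstrap from the greedy action in the next state.
    double max_q = *std::max_element(Q[s_next].begin(), Q[s_next].end());
    return r + gamma * max_q;
}

double sarsa_target(const std::vector<std::vector<double>>& Q,
                    double r, double gamma, int s_next, int a_next) {
    // On-policy: bootstrap from the action the behaviour policy actually took.
    return r + gamma * Q[s_next][a_next];
}
```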
I tried on-policy R-learning with a dataset, and the resulting PnL curve just goes downwards, i.e. it's unprofitable. I guess that on-policy methods only learn a near-optimal policy, which in this case is unprofitable.
Well, the variance seems really small, which is nice for market making, but being unprofitable still makes it unusable for market making. ;P
@tspooner
In the last for loop in getQ and getQb of agent.cpp, i runs from N_TILINGS to 3*N_TILINGS; I think it should run from 2*N_TILINGS to 3*N_TILINGS.
Is this a bug, or am I just missing something?
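For context, here is a minimal sketch of what I mean by three tiling groups; the layout and names are my assumptions from reading the code, not the actual agent.cpp:

```cpp
// Minimal sketch of three groups of tilings whose active-tile values are
// summed over disjoint index ranges. The layout and names are assumptions
// based on the description above, not the actual agent.cpp.
#include <cstddef>
#include <vector>

constexpr std::size_t N_TILINGS = 32;

double sum_group_values(const std::vector<double>& phi) {
    double value = 0.0;

    // Group 1: tilings [0, N_TILINGS)
    for (std::size_t i = 0; i < N_TILINGS; ++i) value += phi[i];

    // Group 2: tilings [N_TILINGS, 2*N_TILINGS)
    for (std::size_t i = N_TILINGS; i < 2 * N_TILINGS; ++i) value += phi[i];

    // Group 3: tilings [2*N_TILINGS, 3*N_TILINGS). If this loop instead
    // started at N_TILINGS, the second group would be double-counted,
    // which is the suspected bug.
    for (std::size_t i = 2 * N_TILINGS; i < 3 * N_TILINGS; ++i) value += phi[i];

    return value;
}
```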
Thanks. @tspooner