softwarespartan / IB4m

Interactive Brokers API for Matlab
GNU General Public License v2.0

Order of data in buffers #139

Open giovannetti87 opened 3 years ago

giovannetti87 commented 3 years ago

Dear all,

Suppose I have a circular buffer of size 20, call it Buf, that collects market data. I would like to confirm the chronological structure of the data: is it always the case that Buf(end) contains more recent (or at least no older) data than Buf(1)? Thanks.

Despair2000 commented 3 years ago

If you use reqMarketData you can be sure that the events arrive in chronological order, and these are FIFO buffers. So the first event is always the most recent and the last the oldest. Whether this also holds when requesting historical data, I'm not sure. Abel sorts the events in his example, so he may have seen cases where they were not in the right order.

giovannetti87 commented 3 years ago

Thanks Despair!

softwarespartan commented 3 years ago

Yes, these are FIFO buffers. Again, keep in mind that you don't have to use buffers per se. IB4m hands the events off to MATLAB in exactly the order in which TWS provides them.

There are cases where you might want to be defensive. For example, with historical data requests it is good practice to sort the data (it can't hurt, though it may not be required).

If you are processing data in real time, you can simply keep track of the most recent datetime. If any event has a datetime earlier than your latest datetime, you can reject that data point.

Keep in mind that there are two times here. One is the event time provided by the IB servers; the other is the time TWS assigns when the event is received and processed (does that make sense?). The distinction is very clear for historical data, but can be hard to see for real-time market data, since the two times are very close. Accordingly, event.data.datetime is the datetime associated with the data, and event.datetime is the datetime associated with the event itself.
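
The two defensive measures above (sorting historical data, rejecting stale real-time events) could look roughly like the following. This is only a sketch in Python, not MATLAB, and it calls no IB4m function; `make_stale_filter`, `accept` and the sample timestamps are hypothetical names and values chosen for illustration.

```python
from datetime import datetime

def make_stale_filter():
    """Return a function that accepts a timestamp unless it is older than the latest accepted one."""
    latest = None  # most recent timestamp accepted so far

    def accept(event_time):
        nonlocal latest
        if latest is not None and event_time < latest:
            return False  # strictly older than the latest data point: reject
        latest = event_time
        return True

    return accept

# Hypothetical real-time timestamps; the third one arrives out of order.
times = [
    datetime(2021, 9, 3, 9, 30, 0),
    datetime(2021, 9, 3, 9, 30, 2),
    datetime(2021, 9, 3, 9, 30, 1),
    datetime(2021, 9, 3, 9, 30, 2),
]
accept = make_stale_filter()
kept = [t for t in times if accept(t)]

# For a historical request, sorting once after the data arrives is enough.
historical_sorted = sorted(times)
```

Here `kept` drops only the out-of-order 09:30:01 timestamp; an event whose time equals the latest one is kept, matching the "less than" rule described above.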

giovannetti87 commented 3 years ago

Hi @softwarespartan, thanks for the reply and for your generous work on this great project. I spent some time reading your earlier replies to a user in a similar situation, and I will likely remove the buffers from my current script, since it is based entirely on a real-time data feed. Hopefully this will also solve the problem of MATLAB freezing when I terminate the script (ctrl+C) while a large number of buffers are deployed. This behaviour still puzzles me.

I would like to reiterate my gratitude to both of you for your support. I literally went from zero to having an almost fully operational script with data collection and order placement capabilities.

Despair2000 commented 3 years ago

Abel alone deserves the gratitude, which I share. I'm just a user like yourself.

How many is "a large number of buffers"? I've been using 80 and more simultaneously without problems.

giovannetti87 commented 3 years ago

@Despair2000 Hi Despair, unfortunately I didn't see your message earlier. My data flow seems to get increasingly clogged once I go beyond 30-40 streams during the US trading session. At earlier or later times it is usually fine and I can work up to saturation. I am trying to track down the problem, which might be related to Java.

softwarespartan commented 3 years ago

Be sure to increase your JVM heap size and memory limits when you launch MATLAB.

giovannetti87 commented 3 years ago

Hi @softwarespartan, sounds great, I'll try it right now. Thanks!

giovannetti87 commented 3 years ago

Hi @softwarespartan Abel, unfortunately it seems the clogging is still happening. When I try to terminate the script with ctrl+c, this is what MATLAB prints. Sometimes I get the command line back after a long wait; most of the time I don't.

Thank you!

Operation terminated by user during TWS.processNotification (line 7)
com.mathworks.jmi.MatlabException: InterruptException(ctrl-c)
    at com.mathworks.jmi.NativeMatlab.SendMatlabMessage(Native Method)
    at com.mathworks.jmi.NativeMatlab.sendMatlabMessage(NativeMatlab.java:273)
    at com.mathworks.jmi.MatlabLooper.sendMatlabMessage(MatlabLooper.java:120)
    at com.mathworks.jmi.Matlab.mtFevalConsoleOutput(Matlab.java:1835)
    at com.proxy.ProxyBuilder$Handler$2.run(ProxyBuilder.java:125)

Despair2000 commented 3 years ago

For me it can also take some time for MATLAB to return to the command line after I press CTRL-C, but we are talking about a few seconds. What kind of machine are you on?

giovannetti87 commented 3 years ago

Hi @Despair2000, I'm using an Aero 15 with 16 GB of RAM and an i7; usually it doesn't give me much trouble.

giovannetti87 commented 3 years ago

Here is a screenshot of what I see. You can see a bunch of errors every time I try to terminate execution. I have to press CTRL+C several times before the script accepts the halt, and even then the command window never returns.

Despair2000 commented 3 years ago

I also get several of these errors before a script exits, and I have also had scripts that did not return to the command window, but whenever that happened it was usually something in my own code. I usually press CTRL-C several times to break a script, then wait a moment for the command line to return. If it doesn't, I restart MATLAB. You don't have any FIFO buffers in a cell array, do you? I ask because that caused really weird errors for me.

giovannetti87 commented 3 years ago

Glad to hear that you're experiencing the same phenomenon, if to a lesser extent. In my case, MATLAB is still workable if I keep the flow to 20-40 simultaneous streams. Anything north of this forces a reboot. I do have several FIFO buffers in struct variables, but I don't think I have any in cell arrays.

softwarespartan commented 3 years ago

@giovannetti87 can you pull the latest IB4m update from GitHub? There was a bug in the previous JAR file that has been patched in the recent update. This update will resolve the issue with the proxy builder.

softwarespartan commented 3 years ago

The ProxyBuilder stack trace you posted is a separate issue from memory. Pulling the latest updates from the repo should resolve it. Regarding memory, make your Java heap size pretty large. Streaming market data is quite memory intensive. The circular buffers are a good fit for this use case since they help manage memory automatically. If you're storing market data in memory it can grow very fast, so be careful to manage system memory accordingly.

giovannetti87 commented 3 years ago

Hi Abel, confirming that I pulled the latest version from this repository, but with no apparent change. I am using TWS973.jar. I set the Java heap size to 12 GB. However, when I move to 60+ streams I end up in a situation similar to the screenshot above. Sometimes I also get the following error when I load TWS973.jar:

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: \Users\abelbrown\Dropbox\finance\matlab\IB4m\logs\log.txt (The system cannot find the path specified)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
    at org.apache.log4j.FileAppender.setFile(FileAppender.java:290)
    at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:164)
    at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:216)
    at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:257)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:133)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:97)
    at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:689)
    at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
    at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:334)
    at org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:342)
log4j:ERROR Either File or DatePattern options are not set for appender [FILE].

Thanks!

softwarespartan commented 3 years ago

OK, this is different from the proxy builder bug or the memory issue :)

Here you want to set the logdir in the file 'IB4m/log4j.conf' to wherever you want the log files to go.

Often you can just make a directory called logs in your IB4m folder and set the logdir to './logs'.

If you call which('log4j.conf') in the MATLAB command window, you'll see where your log4j config file is. It should be located in the IB4m folder.

BTW, this error is not fatal; it's just saying it couldn't initialize the logger.

Regarding the ctrl-c discussion, I'm not sure where "ctrl-c" is being used. With event-based listeners there should be no loop or continuously executing code, so it's not clear to me what ctrl-c is interrupting.

Here is my guess: you are streaming a lot of market data. All that market data gets stuck in memory. Memory fills up and starts to thrash. Market streams also have a very high data rate. If your callback is not highly optimized, it is very easy for the system to get backed up waiting for the processing of previous events to finish before it can issue the callback for the next event. You should do a little back-of-the-envelope calculation to understand how many events per second you're trying to stream, and multiply that by the callback latency.
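
The back-of-the-envelope calculation could be sketched as follows (in Python, purely as arithmetic; nothing here is measured from IB4m or TWS). The stream count, per-stream event rate and callback latency are made-up illustrative numbers, and `callback_load` is a hypothetical helper name.

```python
def callback_load(streams, events_per_stream_per_sec, callback_latency_sec):
    """Seconds of callback work generated per second of wall-clock time."""
    events_per_sec = streams * events_per_stream_per_sec
    return events_per_sec * callback_latency_sec

# Illustrative assumptions: 40 streams, 5 events per second each, 10 ms per callback.
streams = 40
rate = 5
latency = 0.01

load = callback_load(streams, rate, latency)

# Events the callback can handle per second if it runs strictly one at a time.
capacity = 1.0 / latency

# Events per second that pile up unprocessed when arrivals exceed capacity.
backlog_growth = max(0.0, streams * rate - capacity)
```

A load above 1.0 means the callbacks need more than one second of work for every second of data, so under these assumptions the backlog grows without bound until the rate drops or the callback gets faster.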

Again, I'll stress that circular buffers prevent a lot of this. They allow the system and the algorithm to simply retrieve the latest N events. You're often much better off simply subscribing to the 1-second feed or the 5-second feed. If you really want to process market data "for real", then you need to be in C++.
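
To illustrate why a fixed-size circular buffer bounds memory, here is a minimal sketch using Python's `collections.deque` with `maxlen`. It is not IB4m's buffer class, and its index order (oldest retained item first) is simply the deque convention, not a statement about how IB4m buffers are indexed; the prices are made up.

```python
from collections import deque

# A buffer that keeps only the 3 most recent items; older ones are dropped automatically.
buf = deque(maxlen=3)

for price in [10.0, 10.1, 10.2, 10.3, 10.4]:
    buf.append(price)  # once full, each append evicts the oldest item

latest = list(buf)   # the latest N items, oldest retained item first
newest = buf[-1]     # the most recent item in this convention
```

However many events arrive, the buffer never holds more than `maxlen` items, so memory use stays flat while the algorithm always sees the latest N events.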

softwarespartan commented 3 years ago

I wanted to make one more comment on the ctrl-c stack trace. This message is actually correct behavior. It is saying that 'ctrl-c' interrupted 'TWS.processNotification' and that the proxy builder call to the MATLAB JMI (which invokes TWS.processNotification) was clobbered.

Something to keep in mind is that an "error" is not the same as a "crash". Processing high-volume live streaming data will always be tricky. There will always be unexpected behavior due to the very complex interaction of market servers, which talk to broker servers, which service client connections, which service API calls, which are bridged by libraries like IB4m, which interface with the MATLAB runtime, and so on. Things will error here and there. However, that does not mean the system has crashed or become somehow "unstable".

giovannetti87 commented 3 years ago

Confirming: the log4j error is fixed and everything is working nicely. (And yes, I also agree that someone who asks such questions should probably refrain not only from using a trading API and wasting others' valuable time, but from trading in general.)

giovannetti87 commented 3 years ago

That makes absolute sense to me. I can clearly see that whatever is engulfing MATLAB must be related to a quantity per second, since if I dilute the load in either dimension (by pulling data less frequently or by requesting fewer streams) I get MATLAB back under control. I can also confirm that performing in-script operations such as computing a Johansen test or estimating a VAR while data is being pulled can dramatically worsen my chances of keeping MATLAB under control.

I see that I can pull about 40 streams every 0.30 seconds, more or less. I was wondering whether there is a way to flush out the unprocessed stream data (pardon my barbarian language and simplistic mindset), for example by killing and restoring the connections every now and then?

Thanks so much, Abel. Your API is likely the best thing I have laid eyes on this entire year.