joeldg / bowhead

PHP trading bot framework
Apache License 2.0
799 stars 289 forks

30m OHLC data not actually 30m #22

Open cybersolutionsllc opened 7 years ago

cybersolutionsllc commented 7 years ago

Not sure if this is an actual problem or not (since I'm not sure exactly how the data is processed when running strategies), but I've noticed that the 30m table data is not recorded in anything close to 30m increments; it seems a bit off.

http://storage7.static.itmages.com/i/17/0807/h_1502079128_1323211_07ba2568fa.png

You can see that the times are a bit off: 01:25:00, 01:55:03, 02:00:02, 02:15:03

So data seems to be coming in at 5-minute increments sometimes, 15-minute increments other times; it's all over the place. Shouldn't the 30m table only contain entries roughly every 30 minutes?

rxmg-joeldg commented 7 years ago

Oh.. That might be an issue, I'm doing some updates to the data collection this week.

If you find anything specific let me know.


cybersolutionsllc commented 7 years ago

I was able to get to this sooner ;)

https://github.com/joeldg/bowhead/pull/23/files

cybersolutionsllc commented 7 years ago

I'd recommend storing tick data as well, just in case; it may come in useful for some strategies or later on.

So we would need a new table, bowhead_ohlc_tick, updated on every piece of OHLC data received. I'll try to get to that shortly.
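A minimal sketch of what such a tick table could look like (column names here are assumptions on my part, not a final schema):

```sql
-- Hypothetical sketch of a tick table; the actual schema may differ.
CREATE TABLE bowhead_ohlc_tick (
    id         INT UNSIGNED  NOT NULL AUTO_INCREMENT,
    symbol     VARCHAR(20)   NOT NULL,
    price      DECIMAL(20,8) NOT NULL,
    volume     DECIMAL(20,8) NOT NULL,
    created_at TIMESTAMP     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY idx_symbol_time (symbol, created_at)  -- period queries filter on symbol + time
) ENGINE=InnoDB;
```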

lploumen commented 7 years ago

Hi, how is your PR supposed to work? If I understand correctly, it just updates the table every 5 min (in my case I use a 5m period) with the last data. That means the OHLC values are just the last received values. If I use your file and query the database, I get:

[image: screenshot of query results]

I think if we want to continue using Bitfinex, we should use the REST API, which allows requests like https://api.bitfinex.com/v2/candles/trade:5m:tETHUSD/hist, instead of the websocket API.
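As a rough PHP sketch of that REST approach (error handling omitted; Bitfinex v2 returns each candle as an array in [MTS, OPEN, CLOSE, HIGH, LOW, VOLUME] order):

```php
<?php
// Sketch: fetch candles over Bitfinex's v2 REST API instead of the websocket.
// Each raw candle is [MTS, OPEN, CLOSE, HIGH, LOW, VOLUME].
function bitfinex_candles_url(string $timeframe, string $symbol): string
{
    return "https://api.bitfinex.com/v2/candles/trade:{$timeframe}:{$symbol}/hist";
}

function parse_candle(array $raw): array
{
    return [
        'timestamp' => (int)($raw[0] / 1000), // MTS is in milliseconds
        'open'      => $raw[1],
        'close'     => $raw[2],
        'high'      => $raw[3],
        'low'       => $raw[4],
        'volume'    => $raw[5],
    ];
}

// Usage (live network call):
// $rows = json_decode(file_get_contents(bitfinex_candles_url('5m', 'tETHUSD')), true);
// $candles = array_map('parse_candle', $rows);
```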

cybersolutionsllc commented 7 years ago

You make a good point here. I only modified the file to wait to update the database until the period for that table has elapsed; I did not make any changes to the original method of data storage.

But I think you are correct that a true 5m period would not just save the last data, but would include the open, high, low, close, and volume from throughout that 5m period. So in reality the data should either be calculated from already-stored tick data, or come from a source which already provides the accumulated 5m data, like the endpoint you provided.

IMO it may be easier to store tick data in another database table and then calculate the different time periods off the existing tick data than to rewrite to accept different time periods from the websocket connector. We also have to take into account that the OHLC database functions work with other connectors and data as well.

@rxmg-joeldg if you have no objections I'd like to change this as per lploumen's comments in the following way:

  1. Add a database table for tick data, meaning any/all data that comes across from the Bitfinex websocket
  2. For the time period tables, calculate the period data "on the fly" using the tick data and a MySQL query that grabs only the preceding time period, then use the correct open, correct close, highest high, lowest low, and correct total volume for the period

Let me know any comments / additional suggestions
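A hedged sketch of what the aggregation query in step 2 could look like, assuming a tick table named bowhead_ohlc_tick with symbol/price/volume/created_at columns (names are illustrative):

```sql
-- Sketch: build one 5m OHLC row from raw ticks (table/column names assumed).
-- Open/close come from SUBSTRING_INDEX over a time-ordered GROUP_CONCAT of prices.
SELECT
    SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY created_at ASC),  ',', 1) AS open,
    MAX(price)                                                            AS high,
    MIN(price)                                                            AS low,
    SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY created_at DESC), ',', 1) AS close,
    SUM(volume)                                                           AS volume
FROM bowhead_ohlc_tick
WHERE symbol = 'BTCUSD'
  AND created_at >= NOW() - INTERVAL 5 MINUTE;
```

One caveat with this style of query: GROUP_CONCAT is limited by group_concat_max_len, so on a busy pair the session limit may need to be raised.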

rxmg-joeldg commented 7 years ago

Go for it man! I am pretty swamped with work at the moment so am trying to squeeze out time to do anything with this. That won't be happening for a while.

-Joel


cybersolutionsllc commented 7 years ago

@rxmg-joeldg Actually already done :) Took a bit of elbow grease, but I think I got it taken care of and updating correctly for the other time periods. Might want to clean it up or re-format a bit later, but it's working well and providing the data correctly IMO.

@lploumen Pull request is here: https://github.com/joeldg/bowhead/pull/25 If you want to also test it out and offer feedback, we can all discuss. Be aware Joel has not approved the pull request as of yet either, so definitely test on a test server or without a live account connected first to confirm everything is working for you.

You will need to import app/Scripts/DBdump-0.1.sql to create the tick table for storing tick data from the WebSocket

cybersolutionsllc commented 7 years ago

Additional comment - I did not do anything with the volume information. I'd imagine that needs to be addressed properly, since some strategies will use volume. I can address it; I just need to know what time period the volume is presented in. For example, as it comes across the websocket (tick data), it is being added as-is to the database.

For example, for 5m or other increments it is still just being added from that particular last volume tick, because I wasn't sure how to present it correctly. Do I just add them together? For example, for 5m, do I just sum the immediately preceding five 1m volume entries, and that's correct?

I didn't want to change that without knowing the proper way to do so.

lploumen commented 7 years ago

Hi, now do we still need those bowhead_ohlc* tables, since we can compute the requested period in the getRecentData method from your new table? Also, would it be possible for the periods to start at "clock" time: the 30 min period starting at minute 0, and the 5 min periods all on multiples of 5 minutes (00:05, 00:10, ...)? Right now the 5 min period table looks like this:

[image: screenshot of the 5m table timestamps] Which is maybe not a problem, but as I'm trying to understand some trading concepts, I like to compare the data to the charts, where candles always start at the same minute.

cybersolutionsllc commented 7 years ago

We don't necessarily need the tables, I don't think. However, IMO it's better to have done the calculation once and added it to the database, since the data might be used for other purposes later.

I get what you are saying about the "clock" time, but part of the problem is that the data comes from the websocket at unknown intervals and without a timestamp. So the process of adding the data to the database occurs as soon as data is received. For example, in the tick data we might receive information at 3 seconds, 7 seconds, 15 seconds, 20 seconds, 24 seconds, etc.; it's always different. The process to save the data is triggered at that point.

That is why I check the database timestamp at that point to see whether more than the period time (i.e. 1m or 5m) has elapsed, and only if it has been 1m or 5m does it record the data.

So I don't have any immediate way to correct this problem. However, there is a way you can get it better synced on your end. I'd recommend that you stop the Bitfinex connector and then modify the last entry in the tables so that the time is X:00:00 (in each table where you want exact times). Then start the Bitfinex connector at exactly :00 or :30, i.e. 2:00PM or 2:30PM. It always measures the time passed from the last entry's time, so performing these actions should get the data entries closer to "clock" time.
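The elapsed-time check described above amounts to something like this sketch (the period map and helper names are illustrative, not Bowhead's actual API):

```php
<?php
// Sketch of the elapsed-time check: on every tick, only write a row into a
// period table once that full period has elapsed since the table's last row.
$periods = ['1m' => 60, '5m' => 300, '15m' => 900, '30m' => 1800];

function should_record(int $lastEntryTime, int $periodSeconds, int $now): bool
{
    return ($now - $lastEntryTime) >= $periodSeconds;
}

// On each websocket tick (helpers below are hypothetical):
// foreach ($periods as $label => $seconds) {
//     if (should_record(last_entry_time($label), $seconds, time())) {
//         insert_ohlc_row($label);
//     }
// }
```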

connors511 commented 7 years ago

I'm using the following for calculating the $timeidb, which bases the ids on the "clock" time; i.e. 5m gets 00, 05, 10, 15, ... minutes and 15m gets 00, 15, 30, 45

$now = \Carbon\Carbon::now();
$timeid = (int)$now->format('YmdHi'); // e.g. 201708081235

$minutes = ['1m', '5m', '15m', '30m'];
foreach ($minutes as $minute) {
    // (int)'15m' === 15, so this rounds the id down to the period boundary
    $timeidb = $timeid - ($now->minute % ((int)$minute));
    \DB::insert('...');
}

As for volume, I believe it's the daily volume, so you'd have to calculate it throughout the day. Might be easier to implement the v2 (beta) version of bitfinex's websocket API for candles; https://docs.bitfinex.com/v2/reference#ws-public-candle

cybersolutionsllc commented 7 years ago

@connors511

I'm using the following for calculating the $timeidb, which bases the ids on the "clock" time; i.e. 5m gets 00, 05, 10, 15, ... minutes and 15m gets 00, 15, 30, 45

Thanks for the input; however, once the new pull request is approved and merged, $timeidb is not a variable that exists anymore. Additionally, the modifications you propose only modify the timeid, which doesn't really insert accurate data for that time (it just "fakes" the time to look OK). The bigger problem, as mentioned previously, is that we can't control when the markOHLC function is called, because it is triggered by the external Bitfinex connector script at inconsistent timings (whenever the websocket gets data).

Take a look at https://github.com/joeldg/bowhead/pull/25/files to see what may be coming down the pipeline soon.

Your mention of code did help, however, as I think we may be able to "sleep" or skip creating some entries until the appropriate minute mark occurs. For example, right now it creates the first 5m, 15m, 30m entry as soon as the script runs; subsequent entries are then based on being 5m, 15m, etc. past that initial entry. So if we can hold off the initial entry until the clock reaches a specific minute (:05 for 5m, :15 for 15m, etc.), then we'd be in good shape. The only issue then is that the initial build-up of data will take a little longer, because for example 1h data may need to wait until :00 to insert its first row (i.e. if the Bitfinex script was started at 2 minutes after the hour, it would need to wait 58 minutes to insert the first 1h entry).
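That hold-off could be sketched roughly as (names illustrative):

```php
<?php
// Sketch: skip a period's first entry until the wall clock hits a minute
// that is a multiple of the period, so candles align to "clock" time.
function aligned_to_period(int $minuteOfHour, int $periodMinutes): bool
{
    return $minuteOfHour % $periodMinutes === 0;
}

// e.g. for the 5m table, start recording only at :00, :05, :10, ...
// if (!$firstEntryWritten && aligned_to_period((int)date('i'), 5)) { ... }
```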

As for volume, I believe it's the daily volume, so you'd have to calculate it throughout the day. Might be easier to implement the v2 (beta) version of bitfinex's websocket API for candles; https://docs.bitfinex.com/v2/reference#ws-public-candle

That's what I thought as well (24-hour volume). I'm not aware of which strategies might use it, or how, so that's probably important to understand before deciding how the volume data should be stored or presented. I'll likely leave that for someone who has a better idea of what to do with volume.

Noted on the suggestion of a different Bitfinex API; at the moment, though, in terms of exchange integrations I am focusing on a Coinigy integration, since that will open Bowhead up to an insane number of exchanges and markets, all using the same API connection.

rxmg-joeldg commented 7 years ago

I approved the PR


lploumen commented 7 years ago

In my case, I ended up creating a .NET Core app (I'm not so familiar with PHP) which takes data from https://api.kraken.com/0/public/OHLC?pair=ethEUR&interval=5 and fills the 5m database. This way I get exactly the same data used to generate the charts on the Kraken website, and can easily check Bowhead's results.
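For reference, a rough PHP equivalent of that approach (Kraken's public OHLC endpoint returns each row as [time, open, high, low, close, vwap, volume, count], with prices as strings; helper names here are illustrative):

```php
<?php
// Sketch: pull 5m OHLC rows from Kraken's public REST API and normalize them.
// Each raw row is [time, open, high, low, close, vwap, volume, count].
function kraken_ohlc_url(string $pair, int $intervalMinutes): string
{
    return "https://api.kraken.com/0/public/OHLC?pair={$pair}&interval={$intervalMinutes}";
}

function parse_kraken_row(array $row): array
{
    return [
        'timestamp' => (int)$row[0],
        'open'      => (float)$row[1],
        'high'      => (float)$row[2],
        'low'       => (float)$row[3],
        'close'     => (float)$row[4],
        'volume'    => (float)$row[6],
    ];
}

// Usage (live network call); 'result' holds the pair's rows plus a 'last' cursor:
// $json = json_decode(file_get_contents(kraken_ohlc_url('ethEUR', 5)), true);
// $rows = array_map('parse_kraken_row', reset($json['result']));
```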