matthewgilbert / pdblp

pandas wrapper for Bloomberg Open API
MIT License

bdib is returning a spurious value #58

Closed gflores87 closed 5 years ago

gflores87 commented 5 years ago

Code Sample, a copy-pastable example if possible

# Your code here, this should be a minimal reproducible example, see 
import pdblp
import pandas
import datetime

con = pdblp.BCon(debug=False, port=8194, timeout=10000)
con.start()
df = con.bdib('IBOXUMAE MKIT Curncy', '2018-12-27T01:00:00', datetime.datetime.today() + datetime.timedelta(hours=15), 'TRADE',  1)[['close']]

This code returns a last tick that makes no sense. The market is currently trading at 67.55 as of 15:43 GMT, but the most recent tick returned by the query is timestamped in the future, at 19:22 GMT, with a spread value of 62.

Expected Output

The most recent tick in the expected output should have a timestamp earlier than the current time in GMT, not one from the future.
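The expectation above can be written as a small check. This is only a sketch: `future_bars` is a hypothetical helper, not part of pdblp, the 15:40 bar is an illustrative value, and the date of the report is assumed to be 2019-02-11. Timestamps are treated as naive GMT.

```python
from datetime import datetime


def future_bars(bar_times, now):
    """Return the bar timestamps that lie strictly after `now`.

    Both arguments must use the same timezone convention (here: naive GMT).
    """
    return [t for t in bar_times if t > now]


# One illustrative bar before the report time and the bar from the report.
bar_times = [
    datetime(2019, 2, 11, 15, 40),  # hypothetical, already observed
    datetime(2019, 2, 11, 19, 22),  # the tick described as "from the future"
]

# The report quotes the market as of 15:43 GMT (date assumed).
now_gmt = datetime(2019, 2, 11, 15, 43)

print(future_bars(bar_times, now_gmt))
```

An empty list would mean every returned bar had already been observed at the time of the query.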

Version Information

0.1.8

matthewgilbert commented 5 years ago

I am not sure what the issue is, but in bdib the end_datetime is a str and you are passing a datetime in your example. In addition, posting an example with datetime.datetime.today() makes it challenging to reproduce, since the result depends on the specific time a user runs the code. You have also not defined freq in your snippet.

It would be helpful if you displayed only the minimal amount of information required to reproduce the issue, i.e. for intraday data only the most recent few bars instead of additional data back to 2018-12-27. It would also help to show the small amount of data in df that looks spurious to you. Setting debug=True can also be helpful to get some idea of what the underlying blpapi messages look like, to ascertain whether this looks like an issue with blpapi or with pdblp.
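As a rough illustration of the first point, a datetime can be turned into a fixed string before it is passed on. The format below is inferred from the strings used in this thread (for example '2018-12-27T01:00:00'). `to_bdib_str` is a hypothetical helper and the 05:00 starting time is made up for the example.

```python
from datetime import datetime, timedelta

# Matches strings such as '2018-12-27T01:00:00' used elsewhere in this thread.
BDIB_FORMAT = "%Y-%m-%dT%H:%M:%S"


def to_bdib_str(dt):
    """Format a datetime as the string form used for the bdib arguments here."""
    return dt.strftime(BDIB_FORMAT)


# A fixed moment instead of datetime.datetime.today(), so reruns are identical.
fixed_end = datetime(2019, 2, 11, 5, 0) + timedelta(hours=15)
end_datetime = to_bdib_str(fixed_end)

print(end_datetime)  # 2019-02-11T20:00:00
```

The resulting string can then be written directly into the example so that it no longer depends on when it is run.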

gflores87 commented 5 years ago

Ok, so I incorporated all the remarks and ran the code. I know it looks awful but this is the output for the last day. Please look at the very last tick, the 62.912 makes no sense

C:\Users\GFloresGonzalez\AppData\Local\Continuum\anaconda3\pythonw.exe C:/Users/GFloresGonzalez/.PyCharm2018.3/config/scratches/scratch.py
pdblp.pdblp:INFO:Event Type: 'SESSION_STATUS'
pdblp.pdblp:INFO:Message Received:
SessionConnectionUp = {
    server = "localhost:8194"
    encryptionStatus = "Clear"
}

pdblp.pdblp:INFO:Event Type: 'SESSION_STATUS'
pdblp.pdblp:INFO:Message Received:
SessionStarted = {
    initialEndpoints[] = {
        initialEndpoints = {
            address = "localhost:8194"
        }
    }
}

pdblp.pdblp:INFO:Event Type: 'SERVICE_STATUS'
pdblp.pdblp:INFO:Message Received:
ServiceOpened = {
    serviceName = "//blp/refdata"
}

pdblp.pdblp:INFO:Event Type: 'SERVICE_STATUS'
pdblp.pdblp:INFO:Message Received:
ServiceOpened = {
    serviceName = "//blp/exrsvc"
}

pdblp.pdblp:INFO:Sending Request:
IntradayBarRequest = {
    security = "IBOXUMAE MKIT Curncy"
    eventType = TRADE
    interval = 1
    startDateTime = 2019-02-10T01:00:00
    endDateTime = 2019-02-11T20:00:00
}

pdblp.pdblp:INFO:Event Type: 'RESPONSE'
pdblp.pdblp:INFO:Message Received:
IntradayBarResponse = {
    barData = {
        eidData[] = {
        }
        barTickData[] = {
            ...
            barTickData = {
                time = 2019-02-11T19:11:00.000
                open = 67.693000
                high = 67.693000
                low = 67.686000
                close = 67.686000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:12:00.000
                open = 67.674000
                high = 67.674000
                low = 67.661000
                close = 67.661000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:14:00.000
                open = 67.655000
                high = 67.655000
                low = 67.609000
                close = 67.609000
                volume = 0
                numEvents = 9
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:16:00.000
                open = 67.611000
                high = 67.611000
                low = 67.608000
                close = 67.608000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:20:00.000
                open = 67.618000
                high = 67.628000
                low = 67.618000
                close = 67.628000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:22:00.000
                open = 62.912000
                high = 62.912000
                low = 62.912000
                close = 62.912000
                volume = 0
                numEvents = 2
                value = 0.000000
            }
        }
    }
}

                      close
time                       
2019-02-11 14:11:00  67.686
2019-02-11 14:12:00  67.661
2019-02-11 14:14:00  67.609
2019-02-11 14:16:00  67.608
2019-02-11 14:20:00  67.628
2019-02-11 14:22:00  62.912

2019-02-11T20:00:00

Process finished with exit code 0

matthewgilbert commented 5 years ago

Okay, this seems like a lot of data. Again, is it not possible to give minimal data and point to the exact piece of data you think is problematic? For example, if you have 2 minute bars and you think the data is bad at 2019-02-11T14:22:00, then maybe give a 10 minute interval around that time, which would lead to 5 data points instead of 228 rows.
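One way to build such a narrow window is sketched below. `window_around` is a hypothetical helper, not a pdblp function, and the suspect time is given in the same convention as the request (19:22 in the blpapi messages above).

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S"


def window_around(suspect, minutes_each_side=5):
    """Return (start, end) strings spanning a few minutes around `suspect`."""
    center = datetime.strptime(suspect, FMT)
    pad = timedelta(minutes=minutes_each_side)
    return (center - pad).strftime(FMT), (center + pad).strftime(FMT)


# A 10 minute interval centred on the bar that looks wrong.
start_date, end_date = window_around("2019-02-11T19:22:00")

print(start_date, end_date)
```

The two strings can then be used as the start and end arguments of the request, which keeps the debug output down to a handful of bars.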

gflores87 commented 5 years ago

The interesting thing here is that the output is in GMT and the troublesome tick was at 19:22 (it was included even though that time had not yet been observed). If I run the code now, the bad timestamp falls inside the observed ticks and I can't replicate the issue. I will remove the unnecessary output from my previous post.

Update: the bad tick is still present!

Code used to generate outputs:

import pdblp
import datetime

con = pdblp.BCon(debug=True)
con.start()

start_date = '2019-02-11T19:20:00'
end_date =  '2019-02-11T19:40:00'

df = con.bdib('IBOXUMAE MKIT Curncy', start_date, end_date, 'TRADE', 1)[['close']]

print(df)
print(end_date)

Output

C:\Users\GFloresGonzalez\AppData\Local\Continuum\anaconda3\pythonw.exe C:/Users/GFloresGonzalez/.PyCharm2018.3/config/scratches/scratch.py
pdblp.pdblp:INFO:Event Type: 'SESSION_STATUS'
pdblp.pdblp:INFO:Message Received:
SessionConnectionUp = {
    server = "localhost:8194"
    encryptionStatus = "Clear"
}

pdblp.pdblp:INFO:Event Type: 'SESSION_STATUS'
pdblp.pdblp:INFO:Message Received:
SessionStarted = {
    initialEndpoints[] = {
        initialEndpoints = {
            address = "localhost:8194"
        }
    }
}

pdblp.pdblp:INFO:Event Type: 'SERVICE_STATUS'
pdblp.pdblp:INFO:Message Received:
ServiceOpened = {
    serviceName = "//blp/refdata"
}

pdblp.pdblp:INFO:Event Type: 'SERVICE_STATUS'
pdblp.pdblp:INFO:Message Received:
ServiceOpened = {
    serviceName = "//blp/exrsvc"
}

pdblp.pdblp:INFO:Sending Request:
IntradayBarRequest = {
    security = "IBOXUMAE MKIT Curncy"
    eventType = TRADE
    interval = 1
    startDateTime = 2019-02-11T19:20:00
    endDateTime = 2019-02-11T19:40:00
}

pdblp.pdblp:INFO:Event Type: 'RESPONSE'
pdblp.pdblp:INFO:Message Received:
IntradayBarResponse = {
    barData = {
        eidData[] = {
        }
        barTickData[] = {
            barTickData = {
                time = 2019-02-11T19:20:00.000
                open = 67.618000
                high = 67.640000
                low = 67.618000
                close = 67.640000
                volume = 0
                numEvents = 6
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:21:00.000
                open = 67.646000
                high = 67.652000
                low = 67.646000
                close = 67.652000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:22:00.000
                open = 67.657000
                high = 67.662000
                low = 62.912000
                close = 62.912000
                volume = 0
                numEvents = 5
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:24:00.000
                open = 67.672000
                high = 67.682000
                low = 67.672000
                close = 67.682000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:26:00.000
                open = 67.687000
                high = 67.812000
                low = 67.687000
                close = 67.812000
                volume = 0
                numEvents = 12
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:28:00.000
                open = 67.822000
                high = 67.832000
                low = 67.822000
                close = 67.832000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:30:00.000
                open = 67.827000
                high = 67.836000
                low = 67.822000
                close = 67.831000
                volume = 0
                numEvents = 6
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:32:00.000
                open = 67.853000
                high = 67.874000
                low = 67.853000
                close = 67.874000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:33:00.000
                open = 67.873000
                high = 67.873000
                low = 67.872000
                close = 67.872000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:38:00.000
                open = 67.877000
                high = 67.896000
                low = 67.877000
                close = 67.896000
                volume = 0
                numEvents = 6
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:39:00.000
                open = 67.903000
                high = 67.910000
                low = 67.903000
                close = 67.910000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
        }
    }
}

                      close
time                       
2019-02-11 14:20:00  67.640
2019-02-11 14:21:00  67.652
2019-02-11 14:22:00  62.912
2019-02-11 14:24:00  67.682
2019-02-11 14:26:00  67.812
2019-02-11 14:28:00  67.832
2019-02-11 14:30:00  67.831
2019-02-11 14:32:00  67.874
2019-02-11 14:33:00  67.872
2019-02-11 14:38:00  67.896
2019-02-11 14:39:00  67.910
2019-02-11T19:40:00

Process finished with exit code 0

matthewgilbert commented 5 years ago

It would be good to also edit your initial post to show the code that produces this (and make it reproducible by using a fixed string datetime instead of the present moment).

gflores87 commented 5 years ago

It would be good to also edit your initial post to show the code that produces this (and make it reproducible by using a fixed string datetime instead of the present moment).

I added code that always produces the same results regardless of the current time. The third tick is the spurious one.

Also, I talked to BBG and they couldn't replicate the issue using a standard Request/Response, so it looks like the issue is with pdblp itself.

matthewgilbert commented 5 years ago

I'm not sure I follow. The response from Bloomberg is

        barTickData[] = {
            barTickData = {
                time = 2019-02-11T19:20:00.000
                open = 67.618000
                high = 67.640000
                low = 67.618000
                close = 67.640000
                volume = 0
                numEvents = 6
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:21:00.000
                open = 67.646000
                high = 67.652000
                low = 67.646000
                close = 67.652000
                volume = 0
                numEvents = 3
                value = 0.000000
            }
            barTickData = {
                time = 2019-02-11T19:22:00.000
                open = 67.657000
                high = 67.662000
                low = 62.912000
                close = 62.912000
                volume = 0
                numEvents = 5
                value = 0.000000
            }
...

The third tick here is close = 62.912000. This corresponds to what is in the dataframe. What is the problem with this?
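To make that comparison explicit, the close values can be pulled out of the logged message text and compared with the DataFrame column. This is only a sketch that parses the printed log with a regular expression; it does not use blpapi or pdblp, and `closes_from_log` is a hypothetical helper. The log lines and DataFrame values are copied from the output above.

```python
import re

# Excerpt of the logged response (volume, numEvents and value lines omitted).
RAW_LOG = """
barTickData = {
    time = 2019-02-11T19:20:00.000
    open = 67.618000
    high = 67.640000
    low = 67.618000
    close = 67.640000
}
barTickData = {
    time = 2019-02-11T19:21:00.000
    open = 67.646000
    high = 67.652000
    low = 67.646000
    close = 67.652000
}
barTickData = {
    time = 2019-02-11T19:22:00.000
    open = 67.657000
    high = 67.662000
    low = 62.912000
    close = 62.912000
}
"""

# Each match pairs a bar's time with the next "close = ..." line.
BAR_PATTERN = re.compile(r"time = (\S+).*?close = ([\d.]+)", re.DOTALL)


def closes_from_log(text):
    """Map each logged bar time to its close value."""
    return {time: float(close) for time, close in BAR_PATTERN.findall(text)}


raw = closes_from_log(RAW_LOG)
frame_closes = [67.640, 67.652, 62.912]  # first three rows of the printed df

print(list(raw.values()) == frame_closes)
```

If the two sides agree, the value was already present in the message received from Bloomberg before the DataFrame was built.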

In addition, when I say minimal reproducible example, here is what I mean:

import pdblp

con = pdblp.BCon(debug=True)
con.start()

start_date = '2019-02-11T19:20:00'
end_date =  '2019-02-11T19:40:00'

df = con.bdib('IBOXUMAE MKIT Curncy', start_date, end_date,  'TRADE', 1)
print(df)

Changing the index data as done above is irrelevant and should not be included; it just complicates the matter. In addition, using datetime.datetime.today() makes this example dependent on the day it was run.

gflores87 commented 5 years ago

OK, I agree that the index part is unnecessary, and I will fix the call so that it does not depend on today. However, all of that is incidental to the point at hand. As you said, the close for the 14:22 EST timestamp matches what's in the DataFrame. But look at the open: it's 67, and it's impossible for it to move 5 bps in 1 minute. Let me also try to give more detail. I called this code in the morning, when it was 13:00 GMT. If I did this:

start_date = '2019-02-11T01:00:00'
end_date =  '2019-02-11T19:40:00'

df = con.bdib('IBOXUMAE MKIT Curncy', start_date, end_date,  'TRADE', 1)

Then I would have received as the last entry:

barTickData = {
    time = 2019-02-11T19:22:00.000
    open = 62.912000
    high = 62.912000
    low = 62.912000
    close = 62.912000
    volume = 0
    numEvents = 2
    value = 0.000000
}

So it would have 'looked 6 hours into the future'. I kept observing this behavior throughout the day: if I asked for an end_date > 2019-02-11T19:22:00 I would recover this spurious 62.91 tick. Even now that the 19:22 GMT is part of the observed timestamps, it is still showing the 62.91 as the close, but it is clear it did not trade at that level whatsoever.
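A rough way to flag a bar like this automatically is to compare each bar's low and high with the median close of the surrounding bars. The sketch below uses the first four bars from the 19:20 to 19:40 response. `suspicious_bars` and the threshold of 1.0 are assumptions chosen for illustration, not a recommended setting.

```python
from statistics import median

# First four bars of the 19:20-19:40 response shown earlier in this thread.
bars = [
    {"time": "2019-02-11T19:20:00", "high": 67.640, "low": 67.618, "close": 67.640},
    {"time": "2019-02-11T19:21:00", "high": 67.652, "low": 67.646, "close": 67.652},
    {"time": "2019-02-11T19:22:00", "high": 67.662, "low": 62.912, "close": 62.912},
    {"time": "2019-02-11T19:24:00", "high": 67.682, "low": 67.672, "close": 67.682},
]


def suspicious_bars(bars, max_deviation=1.0):
    """Return times of bars whose low or high is far from the median close."""
    typical = median(b["close"] for b in bars)
    return [
        b["time"]
        for b in bars
        if abs(b["low"] - typical) > max_deviation
        or abs(b["high"] - typical) > max_deviation
    ]


print(suspicious_bars(bars))
```

Because the comparison is against the median rather than the bar's own range, it would also flag a bar whose open, high, low and close are all 62.912, as in the earlier response.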

matthewgilbert commented 5 years ago

It seems like there is some confusion. The underlying messages that are logged to output when debug=True are blpapi messages (blpapi being the Bloomberg library that I wrap).

I.e. requests of the form

IntradayBarRequest = {
    security = "IBOXUMAE MKIT Curncy"
    eventType = TRADE
    interval = 1
    startDateTime = 2019-02-10T01:00:00
    endDateTime = 2019-02-11T20:00:00
}

and responses of the form

        barTickData[] = {
            barTickData = {
                time = 2019-02-11T19:20:00.000
                open = 67.618000
                high = 67.640000
                low = 67.618000
                close = 67.640000
                volume = 0
                numEvents = 6
                value = 0.000000
            }
...

are coming from the Bloomberg library blpapi. If this data looks bad to you, contact Bloomberg for support with blpapi. In particular, how they handle timezones is often dependent on your terminal configuration, but they would be better able to assist you with that.
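On the timezone point, the message times in this thread read 19:22 while the printed DataFrame index reads 14:22, which is consistent with a five hour offset. The sketch below reproduces that shift with a fixed UTC-5 offset, assumed here to be EST. It does not show how blpapi or pdblp actually perform the conversion, and `gmt_to_est` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset assumed for illustration; a real setup may follow the terminal's
# configured timezone, including daylight saving changes.
EST = timezone(timedelta(hours=-5), "EST")


def gmt_to_est(stamp):
    """Convert a logged GMT timestamp string to an EST wall-clock string."""
    utc = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=timezone.utc)
    return utc.astimezone(EST).strftime("%Y-%m-%d %H:%M:%S")


print(gmt_to_est("2019-02-11T19:22:00.000"))  # 2019-02-11 14:22:00
```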

gflores87 commented 5 years ago

I went to them first and they were super convinced the issue was the wrapper. They gave me sample code and it 'proved' they were right. However, after your insistence I checked their code again and the 62.91 is actually there! So everything makes sense now: the issue is not with the wrapper but with the data response itself. I was racking my brain trying to convince myself of their explanation, but I'll go back to them.

Sorry for the trouble, and thank you so much for your time.