nautechsystems / nautilus_trader

A high-performance algorithmic trading platform and event-driven backtester
https://nautilustrader.io
GNU Lesser General Public License v3.0
1.95k stars 445 forks

Add Support for Databento Symbology in InteractiveBrokersInstrumentProvider #1790

Open rsmb7z opened 1 month ago

rsmb7z commented 1 month ago

Feature Request

Refactor InteractiveBrokersInstrumentProvider to accept Databento symbology as an option, while keeping the original Interactive Brokers symbology intact. This will enhance flexibility in symbol management.

Requirements

Optional Databento Symbology flag:
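
A minimal sketch of what such a flag could look like — all names below are hypothetical illustrations, not the current InteractiveBrokersInstrumentProviderConfig API:

```python
from dataclasses import dataclass

# Hypothetical sketch only -- these names are NOT the actual
# InteractiveBrokersInstrumentProviderConfig API.
@dataclass(frozen=True)
class IBInstrumentProviderConfigSketch:
    # "ib" (default): keep IB conventions, e.g. AAPL.NASDAQ
    # "databento": use Databento conventions, e.g. ESM4.GLBX; the provider
    # would translate to IB contracts under the hood.
    symbology: str = "ib"

config = IBInstrumentProviderConfigSketch(symbology="databento")
```

Keeping `"ib"` as the default would preserve existing behavior for users who only use Interactive Brokers.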

rterbush commented 1 month ago

I've had this discussion quite a bit with @rsmb7z ...

Is it possible for us to standardize on the exchange symbology and map to the various data and execution providers' symbology under the covers?

For example, I'd much prefer to standardize on CME symbology for working with futures contracts, mapping to whatever symbol IB needs to execute trades, or whatever symbol Databento needs to pull data. That seems to stay true to the goal of the same code running in backtest and live trading.

Or am I misunderstanding this issue?

rsmb7z commented 1 month ago

Hi @rterbush

The current plan is to ensure that the translation happens seamlessly under the hood, with the IB adapter respecting the Databento symbology. This means the symbology used for the historical dataset provided by Databento will be utilized during backtesting and other processes. Once the use case is implemented, there will be room for further refinement and consolidation as needed.

cjdsellers commented 1 month ago

Some additional background: I had originally implemented the Databento client to use the individual CME venues instead of the umbrella GLBX venue which Databento are using.

IIRC this resulted in a sharp increase in complexity: any subscription would first require instrument definitions to be available or requested so we could get at the exchange field, and this translation between GLBX and the individual venues was needed in a few places. So I ended up walking that back, which has now pushed the complexity out to the Interactive Brokers adapter.

I agree with @rterbush that we should avoid layering on even more complexity with additional configuration settings users have to be concerned with. The initial Databento adapter implementation was probably heading along the right lines, where proper MIC codes are used for the venues -- which would then only need a simple XCME -> CME type mapping for Interactive Brokers.

There's some additional context with IB I have to catch up on, but do we at least agree that for traditional assets we should use the official ISO 10383 MIC codes as the venue identifier? (e.g., XNAS rather than NASDAQ).

[edit] @rsmb7z and I did have several conversations about this months ago. I think this is when we settled on that initial Databento implementation. But I'm not sure we've covered this since I walked that back? My intuition is that the Interactive Brokers adapter probably shouldn't be responsible for the translation from GLBX -> CME?
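
The "simple XCME -> CME type mapping" could be as small as a lookup table. A sketch, assuming ISO 10383 MIC codes are used as the Nautilus venue identifiers (the table entries are illustrative, not an exhaustive or confirmed mapping):

```python
# Sketch: map ISO 10383 MIC venue codes to IB exchange names.
# Entries are illustrative only.
MIC_TO_IB_EXCHANGE = {
    "XCME": "CME",     # Chicago Mercantile Exchange (Globex)
    "XCBT": "CBOT",    # Chicago Board of Trade
    "XNYM": "NYMEX",   # New York Mercantile Exchange
    "XNAS": "NASDAQ",  # Nasdaq
}

def ib_exchange_for_venue(mic: str) -> str:
    # Fall back to the MIC itself when no IB-specific name is known.
    return MIC_TO_IB_EXCHANGE.get(mic, mic)
```

With a table like this the Interactive Brokers adapter could resolve venues locally, without needing the GLBX -> CME translation or a populated catalog.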

rsmb7z commented 1 month ago

There's some additional context with IB I have to catch up on, but do we at least agree that for traditional assets we should use the official ISO 10383 MIC codes as the venue identifier? (e.g., XNAS rather than NASDAQ).

Yes, I agree, especially when using Databento+IB together. However, if someone is using only IB, they can continue to use IB symbology, i.e. AAPL.NASDAQ.

[edit] @rsmb7z and I did have several conversations about this months ago. I think this is when we settled on that initial Databento implementation. But I'm not sure we've covered this since I walked that back? My intuition is that the Interactive Brokers adapter probably shouldn't be responsible for the translation from GLBX -> CME?

Yes, the background is well covered. I think the adapter should handle symbols where there is no ambiguity and can resolve a single unique instrument. Let's include this in the initial draft and get community feedback. Since this will be optional, it shouldn't impact any existing functionality, and users can still have their own translation for InstrumentId within their strategy.

faysou commented 1 month ago

@cjdsellers from my short experience as a user of Databento and Nautilus, I think the definitions have to be downloaded anyway for the system to work properly, especially when using options. So I would assume that someone using Databento would also have access to the definition file.

I've worked on some helper functions that make it easy to download data and definitions from Databento and to interact with Nautilus by saving them to a catalog as well. Here's the code below; maybe it could be included in Nautilus at some point, as it makes handling Databento data quite easy.

from datetime import datetime, timedelta
from pathlib import Path

import databento as db
from nautilus_trader.adapters.databento.loaders import DatabentoDataLoader
from nautilus_trader.persistence.catalog import ParquetDataCatalog

DATA_PATH = Path("~/databento_data").expanduser()

databento_api_key = "db-xxxx"
client = db.Historical(key=databento_api_key)

def get_next_day(date_str):
    date_format = "%Y-%m-%d"
    date = datetime.strptime(date_str, date_format)
    next_day = date + timedelta(days=1)

    return next_day.strftime(date_format)

def get_databento_data(symbols, start, end, schema='ohlcv-1m', subfolder='', file_prefix='', dataset='GLBX.MDP3',
                       path=DATA_PATH, save_to_catalog=True):
    used_path = path / subfolder

    if not used_path.exists():
        used_path.mkdir(parents=True, exist_ok=True)

    # download and save the instrument definition
    definition_date = start.split('T')[0]
    end_date = end.split('T')[0]
    used_end_date = end_date if definition_date != end_date else get_next_day(definition_date)

    used_file_prefix = file_prefix + ('_' if file_prefix != '' else '')
    definition_file_name = used_file_prefix + "definition.dbn.zst"
    definition_file = used_path / definition_file_name

    if not definition_file.exists():
        definition = client.timeseries.get_range(
            dataset=dataset,
            schema='definition',
            symbols=symbols,
            start=definition_date,
            end=used_end_date,
            path=definition_file
        )
    else:
        definition = load_databento_data(definition_file)

    # download and save the market data
    data_file_name = f"{used_file_prefix}{schema}_{start}_{end}.dbn.zst"
    data_file = used_path / data_file_name

    if not data_file.exists():
        data = client.timeseries.get_range(
            dataset=dataset,
            schema=schema,
            symbols=symbols,
            start=start,
            end=end,
            path=data_file
        )
    else:
        data = load_databento_data(data_file)

    result = dict(symbols=symbols, dataset=dataset, schema=schema,
                  start=start, end=end, path=used_path, file_prefix=file_prefix,
                  definition_file=definition_file, data_file=data_file,
                  definition=definition, data=data)

    if save_to_catalog:
        catalog_data = save_data_to_catalog(definition_file, data_file, subfolder, path)
        result = {**result, **catalog_data}

    return result

def save_data_to_catalog(definition_file, data_file, subfolder='', path=DATA_PATH):
    catalog = load_catalog(subfolder, path)

    loader = DatabentoDataLoader()
    nautilus_definition = loader.from_dbn_file(definition_file, as_legacy_cython=True)
    nautilus_data = loader.from_dbn_file(data_file, as_legacy_cython=False)

    catalog.write_data(nautilus_definition + nautilus_data)

    return dict(catalog=catalog, nautilus_definition=nautilus_definition, nautilus_data=nautilus_data)

def load_catalog(subfolder='', path=DATA_PATH):
    used_path = path / subfolder

    if not used_path.exists():
        used_path.mkdir(parents=True, exist_ok=True)

    return ParquetDataCatalog(used_path)

def query_catalog(catalog, data_type='bars', **kwargs):
    if data_type == 'bars':
        return catalog.bars(**kwargs)
    elif data_type == 'ticks':
        return catalog.quote_ticks(**kwargs)
    elif data_type == 'instruments':
        return catalog.instruments(**kwargs)
    elif data_type == 'custom':
        return catalog.custom_data(**kwargs)
    else:
        raise ValueError(f"unknown data_type: {data_type}")

def save_databento_data(data, file):
    return data.to_file(file)

def load_databento_data(file):
    return db.DBNStore.from_file(file)

And an example:

start = '2024-05-09T10:00'
end = '2024-05-09T10:05'

test_folder = '20240720_ES_Test'

# Note: file_prefix avoids file conflicts between similar data requests;
# if a file already exists, no data request is made.
option_symbols = ['ESM4 P5230', 'ESM4 P5250']
symbols_data1 = get_databento_data(option_symbols, start, end, schema='mbp-1', subfolder=test_folder, file_prefix='options')

future_symbols = ['ESM4']
symbols_data2 = get_databento_data(future_symbols, start, end, schema='mbp-1', subfolder=test_folder, file_prefix='futures')

future_symbols = ['ESM4']
symbols_data3 = get_databento_data(future_symbols, start, end, schema='ohlcv-1m', subfolder=test_folder, file_prefix='futures')

catalog = load_catalog(test_folder)
query_catalog(catalog, 'ticks', instrument_ids=['ESM4.GLBX'])
catalog.instruments(instrument_ids=['ESM4 P5250.GLBX'])

cjdsellers commented 1 month ago

Yes, I agree, especially when using Databento+IB together. However, if someone is using only IB, they can continue to use IB symbology, i.e. AAPL.NASDAQ.

@rsmb7z that's a good point I missed, some users will still want to use Interactive Brokers naming conventions - and since this is already working now with great effort, then it should continue to work and be an option.

@faysou thanks for the suggested solution including code. I think ideally we'd want a solution which didn't always require a data catalog -- so that a live trading node didn't always need access to a populated catalog, and BacktestEngine users aren't forced into needing a catalog for GLBX venue translations to work.

Is someone able to point me in the direction of the IB docs for the CME venues? Would be appreciated :pray:.

faysou commented 1 month ago

It seems that supporting universal symbols across data providers should work, for example using the convention that @cjdsellers mentioned for venues. Each market adapter would then be responsible for translating to its own specifics.

For Databento, having higher venue granularity than the umbrella GLBX would also allow finer-grained portfolio functions related to exposures.

rsmb7z commented 1 month ago

Is someone able to point me in the direction of the IB docs for the CME venues?

@cjdsellers, here you can find the list of exchanges covered by IB worldwide. https://www.interactivebrokers.com/en/trading/products-exchanges.php

faysou commented 1 month ago

As an example, the symbols for options currently differ quite a bit between Databento and IB, e.g. 'ESM4 P5230.GLBX' in Databento versus 'ESU24P5550.CME' in IB. So a universal Nautilus convention needs to be chosen.
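
To illustrate the gap, a Databento-style CME option raw symbol such as 'ESM4 P5230' can be split into its components with a small parser. This is a sketch that assumes the `ROOT + month code + year digit + space + right + strike` layout seen in the examples above:

```python
import re
from typing import NamedTuple

class CmeOption(NamedTuple):
    root: str        # e.g. "ES"
    month_code: str  # futures month code, e.g. "M" = June
    year_digit: str  # single year digit, e.g. "4" = 2024
    right: str       # "C" or "P"
    strike: int

# Assumed layout, inferred from the symbols quoted in this thread.
PATTERN = re.compile(r"^([A-Z]+)([FGHJKMNQUVXZ])(\d) ([CP])(\d+)$")

def parse_cme_option(symbol: str) -> CmeOption:
    m = PATTERN.match(symbol)
    if m is None:
        raise ValueError(f"unrecognized symbol: {symbol}")
    root, month, year, right, strike = m.groups()
    return CmeOption(root, month, year, right, int(strike))
```

Whatever universal convention is chosen, each adapter would need a formatter from these components to its own wire symbol (and back).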

anegrean commented 1 month ago

Hi! I'm very much interested in solving this issue as well. Please let me know if I can help in any way.