Open rsmb7z opened 1 month ago
I've had this discussion quite a bit with @rsmb7z ...
Is it possible for us to standardize on the exchange symbology and map to the various data and execution providers symbology under the covers?
For example, I'd much prefer to standardize on CME symbology for working with futures contracts, mapping to whatever symbol needed for IB to execute trades, or whatever symbol Databento needs to pull data. Seems that would stay true to the goal of same code running in backtest or live trading.
Or am I misunderstanding this issue?
Hi @rterbush
The current plan is to ensure that the translation happens seamlessly under the hood, with the IB adapter respecting the Databento symbology. This means the symbology used for the historical dataset provided by Databento will be utilized during backtesting and other processes. Once the use case is implemented, there will be room for further refinement and consolidation as needed.
Some additional background: I had originally implemented the Databento client to use the individual CME venues instead of the umbrella GLBX venue which Databento uses.
IIRC this resulted in a sharp increase in complexity: any subscription would first require instrument definitions to be available or requested so we could get at the exchange field, and then the translation between GLBX and the individual venues was needed in a few places. So I ended up walking that back, which has now pushed the complexity out to the Interactive Brokers adapter.
I agree with @rterbush that we should avoid layering on even more complexity with additional configuration settings users have to be concerned with. The way the initial Databento adapter implementation was heading was probably along the right lines, where proper MIC codes are used for the venues -- which would then only need a simple XCME -> CME type mapping for Interactive Brokers.
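A minimal sketch of what such a venue translation could look like. The mapping table and function name here are hypothetical illustrations, not the actual adapter API, and the table is not exhaustive:

```python
# Hypothetical sketch: translating ISO 10383 MIC venue codes to the
# exchange codes Interactive Brokers expects. Entries are illustrative,
# not an official or complete table.
MIC_TO_IB_EXCHANGE = {
    "XCME": "CME",    # Chicago Mercantile Exchange
    "XCBT": "CBOT",   # Chicago Board of Trade
    "XNYM": "NYMEX",  # New York Mercantile Exchange
    "XCEC": "COMEX",  # Commodity Exchange
    "XNAS": "NASDAQ", # Nasdaq
}

def mic_to_ib_exchange(mic: str) -> str:
    # Fall back to the input itself when no mapping is known
    return MIC_TO_IB_EXCHANGE.get(mic, mic)
```

The point being that with proper MIC codes on the Nautilus side, each adapter only needs a small lookup like this rather than catalog-dependent resolution.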
There's some additional context with IB I have to catch up on, but do we at least agree that for traditional assets we should use the official ISO 10383 MIC codes as the venue identifier? (e.g., XNAS rather than NASDAQ).
[edit] @rsmb7z and I did have several conversations about this months ago. I think this is when we settled on that initial Databento implementation. But I'm not sure we've covered this since I walked that back? My intuition is that the Interactive Brokers adapter probably shouldn't be responsible for the translation from GLBX -> CME?
There's some additional context with IB I have to catch up on, but do we at least agree that for traditional assets we should use the official ISO 10383 MIC codes as the venue identifier? (e.g., XNAS rather than NASDAQ).
Yes, I agree, especially when using Databento+IB together. However, if someone is using only IB, they can continue to use IB symbology, i.e. AAPL.NASDAQ.
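A sketch of how this optional behavior could work: MIC-based identifiers are translated for IB, while existing IB-style identifiers pass through unchanged. The function and mapping names are hypothetical, and the mapping entries are illustrative assumptions:

```python
# Hypothetical sketch: IB-only users keep IB symbology ("AAPL.NASDAQ"),
# while a MIC-based instrument ID ("AAPL.XNAS") gets translated.
MIC_TO_IB_VENUE = {"XNAS": "NASDAQ", "XNYS": "NYSE", "XCME": "CME"}

def to_ib_instrument_id(instrument_id: str) -> str:
    symbol, venue = instrument_id.rsplit(".", 1)
    # Translate only when the venue is a known MIC; otherwise pass through,
    # so existing IB-style identifiers keep working unchanged
    return f"{symbol}.{MIC_TO_IB_VENUE.get(venue, venue)}"
```

With pass-through as the default, the optional translation would not break anyone already using IB symbology.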
[edit] @rsmb7z and I did have several conversations about this months ago. I think this is when we settled on that initial Databento implementation. But I'm not sure we've covered this since I walked that back? My intuition is that the Interactive Brokers adapter probably shouldn't be responsible for the translation from GLBX -> CME?
Yes, the background is well covered. I think the adapter should handle symbols where there is no ambiguity and can resolve a single unique instrument. Let's include this in the initial draft and get community feedback. Since this will be optional, it shouldn't impact any existing functionality, and users can still have their own translation for InstrumentId within their strategy.
@cjdsellers from my short experience as a user of Databento and Nautilus, I think the definitions have to be downloaded anyway for the system to work properly, especially when using options. So I would assume that someone using Databento would also have access to the definition file.
I've worked on some helper functions that make it easy to download data and definitions from Databento and interact with Nautilus by saving them to a catalog as well. Here's the code below; maybe it could be included in Nautilus at some point, as it makes handling Databento data quite easy.
from datetime import datetime, timedelta
from pathlib import Path

import databento as db

from nautilus_trader.adapters.databento.loaders import DatabentoDataLoader
from nautilus_trader.persistence.catalog import ParquetDataCatalog

DATA_PATH = Path("~/databento_data").expanduser()

databento_api_key = "db-xxxx"
client = db.Historical(key=databento_api_key)


def get_next_day(date_str):
    date_format = "%Y-%m-%d"
    date = datetime.strptime(date_str, date_format)
    next_day = date + timedelta(days=1)
    return next_day.strftime(date_format)


def get_databento_data(symbols, start, end, schema='ohlcv-1m', subfolder='', file_prefix='', dataset='GLBX.MDP3',
                       path=DATA_PATH, save_to_catalog=True):
    used_path = path / subfolder
    used_path.mkdir(parents=True, exist_ok=True)

    # downloading and saving the definition
    definition_date = start.split('T')[0]
    end_date = end.split('T')[0]
    used_end_date = end_date if definition_date != end_date else get_next_day(definition_date)
    used_file_prefix = file_prefix + ('_' if file_prefix != '' else '')
    definition_file_name = used_file_prefix + "definition.dbn.zst"
    definition_file = used_path / definition_file_name

    if not definition_file.exists():
        definition = client.timeseries.get_range(
            dataset=dataset,
            schema='definition',
            symbols=symbols,
            start=definition_date,
            end=used_end_date,
            path=definition_file,
        )
    else:
        definition = load_databento_data(definition_file)

    # downloading and saving the data
    data_file_name = f"{used_file_prefix}{schema}_{start}_{end}.dbn.zst"
    data_file = used_path / data_file_name

    if not data_file.exists():
        data = client.timeseries.get_range(
            dataset=dataset,
            schema=schema,
            symbols=symbols,
            start=start,
            end=end,
            path=data_file,
        )
    else:
        data = load_databento_data(data_file)

    result = dict(symbols=symbols, dataset=dataset, schema=schema,
                  start=start, end=end, path=used_path, file_prefix=file_prefix,
                  definition_file=definition_file, data_file=data_file,
                  definition=definition, data=data)

    if save_to_catalog:
        catalog_data = save_data_to_catalog(definition_file, data_file, subfolder, path)
        result = {**result, **catalog_data}

    return result


def save_data_to_catalog(definition_file, data_file, subfolder='', path=DATA_PATH):
    catalog = load_catalog(subfolder, path)
    loader = DatabentoDataLoader()
    nautilus_definition = loader.from_dbn_file(definition_file, as_legacy_cython=True)
    nautilus_data = loader.from_dbn_file(data_file, as_legacy_cython=False)
    catalog.write_data(nautilus_definition + nautilus_data)
    return dict(catalog=catalog, nautilus_definition=nautilus_definition, nautilus_data=nautilus_data)


def load_catalog(subfolder='', path=DATA_PATH):
    used_path = path / subfolder
    used_path.mkdir(parents=True, exist_ok=True)
    return ParquetDataCatalog(used_path)


def query_catalog(catalog, data_type='bars', **kwargs):
    if data_type == 'bars':
        return catalog.bars(**kwargs)
    elif data_type == 'ticks':
        return catalog.quote_ticks(**kwargs)
    elif data_type == 'instruments':
        return catalog.instruments(**kwargs)
    elif data_type == 'custom':
        return catalog.custom_data(**kwargs)
    raise ValueError(f"Unknown data_type: {data_type!r}")


def save_databento_data(data, file):
    return data.to_file(file)


def load_databento_data(file):
    return db.DBNStore.from_file(file)
And an example:
start = '2024-05-09T10:00'
end = '2024-05-09T10:05'
test_folder = '20240720_ES_Test'
# Note: the file_prefix allows similar data requests without file conflicts, since no data request is made if a file already exists
option_symbols = ['ESM4 P5230', 'ESM4 P5250']
symbols_data1 = get_databento_data(option_symbols, start, end, schema='mbp-1', subfolder=test_folder, file_prefix='options')
future_symbols = ['ESM4']
symbols_data2 = get_databento_data(future_symbols, start, end, schema='mbp-1', subfolder=test_folder, file_prefix='futures')
future_symbols = ['ESM4']
symbols_data3 = get_databento_data(future_symbols, start, end, schema='ohlcv-1m', subfolder=test_folder, file_prefix='futures')
catalog = load_catalog(test_folder)
query_catalog(catalog, 'ticks', instrument_ids=['ESM4.GLBX'])
catalog.instruments(instrument_ids=['ESM4 P5250.GLBX'])
Yes, I agree, especially when using Databento+IB together. However, if someone is using only IB, they can continue to use IB symbology, i.e. AAPL.NASDAQ.
@rsmb7z that's a good point I missed; some users will still want to use Interactive Brokers naming conventions - and since this is already working now after great effort, it should continue to work and remain an option.
@faysou thanks for the suggested solution including code. I think ideally we'd want a solution which didn't always require a data catalog -- so that a live trading node didn't always need access to a populated catalog, and BacktestEngine users aren't forced into needing a catalog for GLBX venue translations to work.
Is someone able to point me in the direction of the IB docs for the CME venues? Would be appreciated :pray:.
It seems that supporting universal symbols across data providers should work, for example using the convention that @cjdsellers mentioned for venues. Each market adapter is then responsible for translating to its own specifics.
For Databento, having higher venue granularity than GLBX would also allow more granular portfolio functions related to exposures.
Is someone able to point me in the direction of the IB docs for the CME venues?
@cjdsellers, here you can find the list of exchanges covered by IB worldwide. https://www.interactivebrokers.com/en/trading/products-exchanges.php
As an example, option symbols currently differ considerably between Databento and IB: 'ESM4 P5230.GLBX' in Databento versus 'ESU24P5550.CME' in IB. So there needs to be a choice of a universal Nautilus convention.
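To make the translation problem concrete, here is a sketch of parsing a Databento GLBX option raw symbol like 'ESM4 P5230' into its parts, which any target convention would need as a first step. The regex and function are hypothetical illustrations based only on the symbol shown in this thread, not the full Databento symbology specification:

```python
import re

# Hypothetical sketch: split a Databento GLBX option raw symbol such as
# "ESM4 P5230" into underlying future, right (C/P), and strike. Based on
# the example symbols in this discussion, not a complete grammar.
GLBX_OPTION = re.compile(r"^(?P<underlying>\w+) (?P<right>[CP])(?P<strike>\d+)$")

def parse_glbx_option(raw_symbol: str) -> dict:
    m = GLBX_OPTION.match(raw_symbol)
    if m is None:
        raise ValueError(f"Not a GLBX option symbol: {raw_symbol!r}")
    return m.groupdict()

parse_glbx_option("ESM4 P5230")
# {'underlying': 'ESM4', 'right': 'P', 'strike': '5230'}
```

From these parts, an adapter could then format whatever the IB side requires; the IB-specific contract month and strike formatting would still need the actual IB rules.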
Hi! I'm very much interested in solving this issue as well. Please let me know if I can help in any way.
Feature Request
Refactor InteractiveBrokersInstrumentProvider to accept Databento symbology as an option, while keeping the original Interactive Brokers symbology intact. This will enhance flexibility in symbol management.
Requirements
Optional Databento Symbology flag:
Symbology Conversion:
Configuration and Validation:
Testing and Documentation:
Backward Compatibility: