nkaz001 / hftbacktest

A high-frequency trading and market-making backtesting and trading bot in Python and Rust, which accounts for limit orders, queue positions, and latencies, utilizing full tick data for trades and order books, with real-world crypto market-making examples for Binance Futures
MIT License

Tardis conversion wrong schema #160

Closed: volemont closed this issue 4 days ago

volemont commented 1 week ago

I got the following error while converting Tardis data:

````
Reading binance-futures_incremental_book_L2_YFIUSDT_20230523.csv.gz
could not parse `56.7` as dtype `i64` at column 'price' (column number 7)

The current offset in the file is 134208 bytes.

You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying correct dtype with the `dtypes` argument
- setting `ignore_errors` to `True`,
- adding `56.7` to the `null_values` list.

Original error: ```remaining bytes non-empty```
````

To work around the issue, I defined explicit schemas at https://github.com/nkaz001/hftbacktest/blob/415f81b3c910572ce8de0934b24e73a52daf9884/py-hftbacktest/hftbacktest/data/utils/tardis.py#L85-L87:

```python
import polars as pl

# `file` is the path of the Tardis CSV currently being converted.
# Pin every column's dtype so polars does not infer them from the first rows.
if "trades" in file:
    schema = {
        'exchange': pl.String,
        'symbol': pl.String,
        'timestamp': pl.Int64,
        'local_timestamp': pl.Int64,
        'id': pl.UInt64,
        'side': pl.String,
        'price': pl.Float64,
        'amount': pl.Float64,
    }
elif "incremental_book_L2" in file:
    schema = {
        'exchange': pl.String,
        'symbol': pl.String,
        'timestamp': pl.Int64,
        'local_timestamp': pl.Int64,
        'is_snapshot': pl.Boolean,
        'side': pl.String,
        'price': pl.Float64,
        'amount': pl.Float64,
    }
else:
    raise ValueError(f"Unknown file type: {file}")
print('Reading %s' % file)
df = pl.read_csv(file, schema=schema)
```

Maybe there is a more elegant way to solve the problem.

volemont commented 4 days ago

Thanks for fixing!