nyurik / timeseriesdb

Automatically exported from code.google.com/p/timeseriesdb
GNU General Public License v3.0

BinCompressedIndexFile #2

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am playing with your latest timeseriesdb and am thinking about how to 
implement a BinCompressedIndexFile. Right now I use BinIndexedFile and 
sequentially store the ticks for all CME contracts, sorted first by contract 
name and then by tick in sequential time order. I maintain a separate index 
file with the offset and count for each contract name, which makes lookups 
of a single contract's tick data by day very fast.
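The layout described above can be sketched roughly as follows. This is a minimal illustration only, not the timeseriesdb API: the tick tuples, the `index` dictionary, and the `ticks_for` helper are all hypothetical stand-ins for the flat tick store and the separate offset/count index file.

```python
# Hypothetical sketch of the layout described above: one flat tick stream,
# sorted first by contract and then by time, plus a side index mapping
# contract name -> (offset, count). Names are illustrative only.

# Flat store: (contract, timestamp, price) sorted by contract, then time.
ticks = [
    ("ESA", 1, 100.0), ("ESA", 2, 100.5), ("ESA", 3, 101.0),
    ("NQA", 1, 200.0), ("NQA", 2, 199.5),
]

# Side index built once: contract -> (first offset, tick count).
index = {}
for i, (contract, _, _) in enumerate(ticks):
    off, cnt = index.get(contract, (i, 0))
    index[contract] = (off, cnt + 1)

def ticks_for(contract):
    """O(1) lookup via the side index, then a sequential slice."""
    off, cnt = index[contract]
    return ticks[off:off + cnt]
```

With an uncompressed, fixed-size-record file this works because the offset translates directly into a file position; the question below is how to keep that property once the records are compressed.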

To translate this to a compressed type, it would obviously be best to have a 
separate file per contract, as your layout implies, but given the number of 
contracts this is not practical. For the compressed files, it would be nice 
if I could seek to an offset within a compressed file and stream a specific 
number of ticks out. Right now, the only way to do this seems to be storing 
the index value inside each tick, which is obviously a waste of disk space. 
Is there an easier way to set up the BinCompressedSeriesFile to allow this 
sort of indexing?

Original issue reported on code.google.com by kar...@gmail.com on 3 Jan 2012 at 7:04

GoogleCodeExporter commented 9 years ago
You can simply use BinCompressedSeriesFile: you don't really need an index 
file. Instead, make a complex key (date+contract), use binary search to find 
it quickly, and start streaming from that point on.
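The complex-key suggestion can be sketched with Python's `bisect` module. This is only an illustration of the idea, assuming the ticks are stored sorted by a (contract, timestamp) composite key; the data and the `stream_from` helper are hypothetical, not part of the library.

```python
# Sketch of the complex-key idea: ticks sorted by a (contract, timestamp)
# composite key, so one binary search locates the start of any contract/time
# range and the rest is sequential streaming. Names are illustrative only.
from bisect import bisect_left

ticks = [
    ("ESA", 1, 100.0), ("ESA", 2, 100.5), ("ESA", 3, 101.0),
    ("NQA", 1, 200.0), ("NQA", 2, 199.5),
]
keys = [(c, t) for c, t, _ in ticks]  # the composite search key

def stream_from(contract, timestamp):
    """Binary-search to the first tick >= (contract, timestamp),
    then stream sequentially while the contract still matches."""
    i = bisect_left(keys, (contract, timestamp))
    while i < len(ticks) and ticks[i][0] == contract:
        yield ticks[i]
        i += 1
```

The trade-off, raised in the next comment, is that this turns a known-offset O(1) lookup into an O(log n) search.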

Original comment by yuriastrakhan on 3 Jan 2012 at 7:12

GoogleCodeExporter commented 9 years ago
That would work, but it would still be an O(log n) lookup, when in fact I 
already know the index of the ticks I want and how many of them I want. It 
would be easiest if I could query the table by index while still getting the 
benefit of storing the ticks in a compressed format.

Original comment by kar...@gmail.com on 3 Jan 2012 at 7:14

GoogleCodeExporter commented 9 years ago
If you keep the file open, it caches the searches, effectively building an 
on-the-fly index. I would try this approach and measure the speed against 
your current solution; I wouldn't be surprised if the I/O savings from 
compressed data outweigh the cost of maintaining a separate index with an 
O(1) lookup plus a read.

Original comment by yuriastrakhan on 3 Jan 2012 at 11:23

GoogleCodeExporter commented 9 years ago
Marking this issue WontFix for now.

Original comment by yuriastrakhan on 23 Feb 2012 at 6:26