Closed: zqhxuyuan closed this issue 8 years ago.
Sparkey uses two files, a log file and an index file. The log file can be reopened and appended to. The index file can not be appended but it can be rebuilt using a log file.
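The two-file design described above can be illustrated with a small self-contained sketch (plain Java, not Sparkey's actual API or on-disk format, which you should check in the sparkey-java README): an append-only log file that can be reopened at any time, plus a lookup index that is rebuilt by scanning the log.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class LogPlusIndex {
    // Append a key/value record to the log; the file can be reopened
    // and appended to across sessions, like Sparkey's log file.
    static void append(Path log, String key, String value) throws IOException {
        Files.write(log, List.of(key + "\t" + value),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Rebuild the lookup structure from scratch by scanning the log;
    // later entries win, giving last-write-wins semantics.
    static Map<String, String> rebuildIndex(Path log) throws IOException {
        Map<String, String> index = new HashMap<>();
        for (String line : Files.readAllLines(log)) {
            int tab = line.indexOf('\t');
            index.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("demo", ".log");
        append(log, "ab0001", "v1");              // first writing session
        append(log, "cd0002", "v2");              // "reopen" later and append more
        Map<String, String> index = rebuildIndex(log);
        System.out.println(index.get("ab0001"));  // v1
        System.out.println(index.get("cd0002"));  // v2
    }
}
```

The point of the pattern is that the index is disposable: as long as the log survives, appending more data and then rebuilding the index is always safe.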
I don't see any problems with reading through your input once and maintaining a set of active writers. Each active writer requires a small amount of memory and a file descriptor. If you only need around 100 or even 1000 writers, this should not be a problem at all.
You could also open, append and close a writer for each entry. It will work fine, but it might be less efficient.
You might not need to split the file at all. The limiting factor is the memory required to build the index, and the index requires about 20 bytes per entry. For quick random queries you would want the index to fit in RAM, which would likely not be possible if everything is stored in a single file.
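Using the numbers above, a back-of-the-envelope check of the index size for the 100 billion entries mentioned later in the thread (the 20 bytes/entry figure is the rough cost quoted above, not an exact measurement):

```java
public class IndexMemory {
    public static void main(String[] args) {
        long entries = 100_000_000_000L; // 100 billion entries, from the question
        long bytesPerEntry = 20L;        // approximate index cost per entry, from the answer
        long totalBytes = entries * bytesPerEntry;
        System.out.println(totalBytes / 1e12 + " TB"); // 2.0 TB
    }
}
```

Roughly 2 TB of index clearly does not fit in RAM on a single machine, which is why splitting the data across several smaller stores (each with an index that does fit) is worth considering.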
After creating an index file with SparkeyWriter and closing it, am I able to reopen that same index file and append new data to it? My use case: the source data comes from many large files. I read the large files one by one and write to several different index files, and each new large file should append data to the existing index files.
For example, file1:

file2:

In the end there are 3 index files: ab.spi, ac.spi, cd.spi, because when I query `abssss` I only need to look in ab.spi, and when I query `cdssss` I only need to look in cd.spi. One way is to keep the SparkeyWriters in memory in a map like

and read all the original files just once: read a line, take the first two characters as a substring, get the corresponding SparkeyWriter from the map, and write the data to that index. But having to read every file while keeping all the SparkeyWriters open until the work is done seems too slow, and memory may be insufficient. So I want to know: does Sparkey support opening a store for writing a subsequent time?
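The routing step described above (first two characters of the key pick the target store) can be sketched as follows; the per-prefix list here is a hypothetical stand-in for a SparkeyWriter, since the real writer type depends on the sparkey-java API:

```java
import java.util.*;

public class PrefixRouter {
    // One "writer" per two-character prefix. In real code the map value
    // would be a SparkeyWriter opened for ab.spi, ac.spi, cd.spi, etc.
    static Map<String, List<String>> writers = new HashMap<>();

    static void route(String line) {
        String prefix = line.substring(0, 2); // first two chars pick the store
        writers.computeIfAbsent(prefix, p -> new ArrayList<>()).add(line);
    }

    public static void main(String[] args) {
        for (String line : List.of("abssss", "acxxxx", "cdssss", "abtttt")) {
            route(line);
        }
        System.out.println(writers.size()); // 3 stores: ab, ac, cd
    }
}
```

With only a few distinct prefixes, the set of open writers stays small, which matches the answer's point that 100 or even 1000 concurrent writers is not a problem.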
When I checked the PalDB project by LinkedIn, it says:

So I want to know: does Sparkey support this feature?
Or maybe I don't need to split the index file at all: would putting all 100 billion entries into a single store still give fast enough queries with Sparkey?