dReamix opened 6 months ago
I think your CSV files need some preprocessing before they can be converted to bin files. Keep the following in mind:
First, split the data by stock code, with one file per stock named after its code, e.g. SH600000.csv.
The time column needs to be converted from 12-hour to 24-hour format, e.g. 2010-12-01 14:34:00.
When running dump_bin, use --date_field_name to specify the time column, --symbol_field_name to specify the stock code column, and --exclude_fields to exclude the stock code and time columns, because qlib stores them in its own way.
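A minimal sketch of the preprocessing described above, using only the Python standard library so the 500 MB files are streamed rather than loaded whole. It assumes the column names and the '1/2/2019 9:53:00 AM' (month/day/year, 12-hour) time format from the question below, and that the values in the instrument column are already the codes you want as file names (e.g. SH600000). The names to_24h, split_by_instrument and chunk_rows are illustrative only; they are not part of qlib.

```python
import csv
import os
from collections import defaultdict
from datetime import datetime

# Assumed source format from the question: '1/2/2019 9:53:00 AM'
# (month/day/year, 12-hour clock; %p matches AM/PM in the C/English locale).
SRC_TIME_FMT = "%m/%d/%Y %I:%M:%S %p"
# Target 24-hour format, e.g. '2010-12-01 14:34:00'.
DST_TIME_FMT = "%Y-%m-%d %H:%M:%S"


def to_24h(value: str) -> str:
    """Convert e.g. '1/2/2019 9:53:00 AM' to '2019-01-02 09:53:00'."""
    return datetime.strptime(value.strip(), SRC_TIME_FMT).strftime(DST_TIME_FMT)


def _flush(buffers, header, out_dir):
    """Append buffered rows to <out_dir>/<code>.csv; write the header only for new files."""
    for code, rows in buffers.items():
        path = os.path.join(out_dir, f"{code}.csv")
        is_new = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            if is_new:
                writer.writerow(header)
            writer.writerows(rows)
    buffers.clear()


def split_by_instrument(csv_paths, out_dir, symbol_col="instrument",
                        time_col="time", chunk_rows=1_000_000):
    """Stream combined CSVs into one CSV per instrument with 24-hour timestamps.

    At most chunk_rows rows are held in memory at a time. Output files are
    appended to, so start from an empty out_dir (re-running duplicates rows).
    Rows keep their input order; nothing is sorted here.
    """
    os.makedirs(out_dir, exist_ok=True)
    buffers = defaultdict(list)
    header = None
    buffered = 0
    for path in csv_paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:  # skip empty files
                continue
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path}: header differs from the first file")
            sym_idx = header.index(symbol_col)
            time_idx = header.index(time_col)
            for row in reader:
                if not row:  # skip blank lines
                    continue
                row[time_idx] = to_24h(row[time_idx])
                buffers[row[sym_idx]].append(row)
                buffered += 1
                if buffered >= chunk_rows:
                    _flush(buffers, header, out_dir)
                    buffered = 0
    _flush(buffers, header, out_dir)  # write whatever is left
```

Usage would look like split_by_instrument(sorted(glob.glob("raw/*.csv")), "csv_by_code"), after which dump_bin can be pointed at the csv_by_code folder. Buffering by row count and appending keeps both memory use and the number of simultaneously open files bounded, at the cost of reopening each per-stock file once per flush.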
Hi there,
I'm new to Qlib. I searched online for answers to my questions and also asked an LLM, but have found no solution so far.
Here is what I'm facing:
I have 1-minute trading data in more than 10 CSV files, each over 500 MB. All the files share the same format: [instrument, time, open, high, low, close, volume, turnover, is_paused]. The 'instrument' column holds the asset code, so a single file contains many different stock codes. The 'time' column holds the trading timestamp, e.g. '1/2/2019 9:53:00 AM'.
Problems:
1. All the CSV files are in one folder. I ran: python dump_bin.py dump_all --csv_path 'csv file folder path' --qlib_dir 'target file path' --symbol_field_name instrument --date_field_name time --include_fields open,high,low,close,volume,turnover,is_paused
The script then failed with: concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Is this caused by running out of memory because the files are too large? I ask because when I put only one CSV file in the folder, 'python dump_bin.py' did work, but only partially.
The calendar folder contains a 'day.txt' file, but it only stores dates, e.g. '2019-01-02', with no minute-level timestamps.
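For reference, a sketch that assembles the same dump_all command as a Python argument list. The --freq and --max_workers options are my assumption about dump_bin.py (they do not appear in the command above), so check 'python dump_bin.py dump_all --help' for your version before relying on them. If the script defaults to a daily frequency, that would be consistent with only 'day.txt' being written, and fewer worker processes may lower peak memory if the BrokenProcessPool error is memory-related. build_dump_all_cmd and the value 4 for max_workers are hypothetical, chosen only for illustration.

```python
import sys
from typing import List, Optional


def build_dump_all_cmd(csv_path: str, qlib_dir: str, freq: str = "1min",
                       max_workers: Optional[int] = 4) -> List[str]:
    """Return argv for 'dump_bin.py dump_all' (hypothetical helper).

    ASSUMPTION: --freq and --max_workers are assumed to be accepted by your
    dump_bin.py; check 'python dump_bin.py dump_all --help' first.
    """
    cmd = [
        sys.executable, "dump_bin.py", "dump_all",
        "--csv_path", csv_path,
        "--qlib_dir", qlib_dir,
        # Same field options as in the original command.
        "--symbol_field_name", "instrument",
        "--date_field_name", "time",
        "--include_fields", "open,high,low,close,volume,turnover,is_paused",
        # Assumed option: frequency label, e.g. '1min' instead of a daily default.
        "--freq", freq,
    ]
    if max_workers is not None:
        # Assumed option: fewer worker processes, lower peak memory.
        cmd += ["--max_workers", str(max_workers)]
    return cmd
```

It could be run with subprocess.run(build_dump_all_cmd("csv_by_code", "qlib_1min"), check=True) from the directory containing dump_bin.py, with csv_path pointing at a folder of per-stock files as suggested in the reply above.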
I'd appreciate any advice!