robcarver17 / pysystemtrade

Systematic Trading in python
GNU General Public License v3.0

Migrating capital data from arctic to parquet does not work #1364

Closed emretezel closed 4 weeks ago

emretezel commented 7 months ago

Hi Everyone,

I just pulled the latest master after a long time in order to switch to parquet.

When I run the transfer script I get the following error. Any help would be greatly appreciated. Do I need to run the update capital script before the transfer?

Traceback (most recent call last):
  File "/home/pysystemtrade/opt/pysystemtrade/sysdata/parquet/parquet_capital.py", line 36, in get_capital_pd_df_for_strategy
    pd_df = self.parquet.read_data_given_data_type_and_identifier(
  File "/home/pysystemtrade/opt/pysystemtrade/sysdata/parquet/parquet_access.py", line 54, in read_data_given_data_type_and_identifier
    return pd.read_parquet(filename)
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pandas/io/parquet.py", line 503, in read_parquet
    return impl.read(
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pandas/io/parquet.py", line 244, in read
    path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pandas/io/parquet.py", line 102, in _get_path_or_handle
    handles = get_handle(
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pandas/io/common.py", line 865, in get_handle
    handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/pysystemtrade/data/parquet/capital/__global_capital.parquet'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 548, in <module>
    backup_arctic_to_parquet()
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 95, in backup_arctic_to_parquet
    backup_capital(backup_data)
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 451, in backup_capital
    parquet_data = data.parquet_capital.get_capital_pd_df_for_strategy(
  File "/home/pysystemtrade/opt/pysystemtrade/sysdata/parquet/parquet_capital.py", line 40, in get_capital_pd_df_for_strategy
    raise missingData(
syscore.exceptions.missingData: Unable to get capital data from parquet for strategy __global_capital
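
The traceback suggests the transfer reads the parquet target before writing to it, and that the read raises when the file has never been created. Below is a minimal sketch of a comparison that treats a missing target as "nothing written yet". `MissingData`, `read_target` and `should_transfer` are hypothetical stand-ins, not the pysystemtrade API, and the sketch assumes the length check exists to avoid overwriting a longer target. It is not necessarily how the issue was eventually fixed.

```python
class MissingData(Exception):
    """Hypothetical stand-in for syscore.exceptions.missingData."""


def read_target(store, key):
    """Return the rows stored under `key`, or raise MissingData if absent."""
    try:
        return store[key]
    except KeyError as err:
        raise MissingData(f"Unable to get capital data for {key}") from err


def should_transfer(store, key, source_rows):
    """Decide whether to copy `source_rows` into the target store."""
    try:
        target_rows = read_target(store, key)
    except MissingData:
        # First migration: the target does not exist yet, so copy.
        return True
    # Assumption: skip only when the target already holds more rows.
    return not (len(target_rows) > len(source_rows))


# An empty dict plays the role of a parquet directory with no files in it.
print(should_transfer({}, "__global_capital", [1, 2, 3]))
```

With a guard like this the first run would not need a pre-existing `__global_capital.parquet` file.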

emretezel commented 7 months ago

I added an empty, zero-byte __global_capital.parquet file under data/parquet/capital, but I am now getting a different error:


  File "/home/pysystemtrade/opt/pysystemtrade/sysdata/parquet/parquet_capital.py", line 36, in get_capital_pd_df_for_strategy
    pd_df = self.parquet.read_data_given_data_type_and_identifier(
  File "/home/pysystemtrade/opt/pysystemtrade/sysdata/parquet/parquet_access.py", line 54, in read_data_given_data_type_and_identifier
    return pd.read_parquet(filename)
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pandas/io/parquet.py", line 670, in read_parquet
    return impl.read(
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pandas/io/parquet.py", line 272, in read
    pa_table = self.api.parquet.read_table(
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 1776, in read_table
    dataset = ParquetDataset(
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 1343, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 1367, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet file size is 0 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 548, in <module>
    backup_arctic_to_parquet()
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 95, in backup_arctic_to_parquet
    backup_capital(backup_data)
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 451, in backup_capital
    parquet_data = data.parquet_capital.get_capital_pd_df_for_strategy(
  File "/home/pysystemtrade/opt/pysystemtrade/sysdata/parquet/parquet_capital.py", line 40, in get_capital_pd_df_for_strategy
    raise missingData(
syscore.exceptions.missingData: Unable to get capital data from parquet for strategy __global_capital
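
A zero-byte file cannot stand in for an empty dataset: an unencrypted Parquet file starts and ends with the 4-byte magic `PAR1`, with the footer and its length stored before the trailing magic. The sketch below uses only the standard library to show why the placeholder is rejected. The helper name `has_parquet_magic` is hypothetical, and the check is deliberately shallow.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"
# 4-byte leading magic + 4-byte footer length + 4-byte trailing magic
MIN_PARQUET_SIZE = 12


def has_parquet_magic(path):
    """Shallow check: is the file big enough and framed by the magic bytes?"""
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) < MIN_PARQUET_SIZE:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Recreate the zero-byte placeholder in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "__global_capital.parquet")
    open(empty_path, "wb").close()
    empty_file_ok = has_parquet_magic(empty_path)

print(empty_file_ok)
```

Passing this check does not prove a file is readable; it only rules out obviously invalid placeholders like the one above.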

emretezel commented 7 months ago

After adding the empty file in place of the missing one, I ran interactive update capital manual and am now getting the following error:

(pysystemtrade-user) pysystemtrade@emre-OptiPlex-3080:~/opt/pysystemtrade/sysinit/transfer$ python backup_arctic_to_parquet.py 
Configuring sim logging
2024-03-29 16:17:10 DEBUG config {'type': 'config', 'stage': 'config'} Adding config defaults
2024-03-29 16:17:10 DEBUG backup_arctic_to_parquet Dumping from arctic, mongo to parquet files
Do futures contract prices?n
FX?n
Multiple prices?n
Adjusted prices?n
Strategy positions?n
Contract positions?n
Capital?y
Traceback (most recent call last):
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 548, in <module>
    backup_arctic_to_parquet()
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 95, in backup_arctic_to_parquet
    backup_capital(backup_data)
  File "/home/pysystemtrade/opt/pysystemtrade/sysinit/transfer/backup_arctic_to_parquet.py", line 454, in backup_capital
    if len(parquet_data) > strategy_capital_data:
  File "/home/pysystemtrade/anaconda3/envs/pysystemtrade-user/lib/python3.10/site-packages/pandas/core/generic.py", line 1519, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

PurpleHazeIan commented 7 months ago

I hit the same problem during my own migration. Line 454 needs to read 'if len(parquet_data) > len(strategy_capital_data):'
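
To see why the original line fails, here is a small reproduction with made-up data. The two frames and the helper `parquet_has_more_rows` are illustrative only; the real objects in `backup_capital` may differ in shape.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames compared in backup_capital.
parquet_data = pd.DataFrame({"capital": [100.0, 101.0]})
strategy_capital_data = pd.DataFrame({"capital": [100.0, 101.0, 102.0]})


def parquet_has_more_rows(parquet_df, source_df):
    """Compare row counts, as the corrected line does."""
    return len(parquet_df) > len(source_df)


try:
    # Original form: int > DataFrame gives an elementwise boolean DataFrame,
    # and `if` then asks pandas for a single truth value, which it refuses.
    if len(parquet_data) > strategy_capital_data:
        pass
    original_error = None
except ValueError as err:
    original_error = str(err)

print(original_error)
print(parquet_has_more_rows(parquet_data, strategy_capital_data))
```

Comparing the two lengths yields a plain bool, so the `if` statement behaves as intended.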

emretezel commented 7 months ago

Would it be possible for a patch to be committed to the master branch?

Thank you, Emre

emretezel commented 2 months ago

Can I please ask whether anyone has recently successfully managed to switch from arctic to parquet? Has the above error been fixed?

Is there actually a need to switch to parquet? Could the latest code still be used with arctic?

Thank you.

PurpleHazeIan commented 2 months ago

Hi,

I did make this transition successfully earlier this year. So it can be done. I didn't make detailed notes as it was a one-off process, but this is what I recall …

I used it as an opportunity to clean up the python environment and get rid of the down-level numpy/pandas/etc. which were required by arctic. I ran into two problems.

a/ There is a problem with one of the most recent commits, on Windows, because files are introduced with an asterisk in their names, not permitted on Windows. I worked round this by only fetching up to the commit with SHA=05184ea. (I am now effectively blocked there, but as there has been little activity for several months now this is not yet an actual problem. It is discussed on #1395. Other Operating Systems are available.)

b/ With the up-to-date code in a clean environment without arctic, I couldn't run the arctic-parquet migration code, because I didn't have arctic any more (it seems obvious when written down). Installing arctic into the clean environment made no sense, so I went back to my previous set-up, installed pyarrow, and ran the code there to create the parquet files.
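
Point b/ comes down to needing both libraries importable in the environment that runs the migration. A quick pre-flight check along these lines could confirm that before starting. The function name `missing_packages` is hypothetical, and the package names `arctic` and `pyarrow` are assumed to be the import names in use.

```python
from importlib.util import find_spec


def missing_packages(required=("arctic", "pyarrow")):
    """Return the names in `required` that cannot be imported here."""
    return [name for name in required if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Cannot run the arctic to parquet transfer here; missing:", missing)
else:
    print("Both arctic and pyarrow are importable in this environment.")
```

Running it in the old arctic environment after installing pyarrow should report nothing missing.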

In creating the parquet files, a couple of code errors occurred, as you have found. I fixed these manually, taking the view that, as a one-off process, it didn't matter. I didn't submit them to git, because I am not a git-competent person.

Since then the code has just worked fine for me. I have a few local mods, and use Spread Bets so different sysbrokers code (but have never actually turned on automatic trading, just placing trades manually).

I think the code for arctic remains and should in principle work, with configuration changes to select the arctic classes. See Rob's instructions in #1290 "IF YOU DO NOT WANT TO USE PARQUET BUT WANT TO GET THE LATEST COMMIT".

bug-or-feature commented 4 weeks ago

Fixed by #1423