wrighter / ib-scripts

Python scripts that use the Interactive Brokers TWS API
MIT License

Need to consider timezones in API requests #4

Closed wrighter closed 7 months ago

wrighter commented 1 year ago

16:28:01,328 ibapi.wrapper ERROR ERROR 2 2174 Warning: You submitted request with date-time attributes without explicit time zone. Please switch to use yyyymmdd-hh:mm:ss in UTC or use instrument time zone, like US/Eastern. Implied time zone functionality will be removed in the next API release.
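For illustration, the UTC form named in that warning can be produced from a timezone-aware `datetime` with the standard library alone. This is a minimal sketch, not code from the repository: the helper name `ib_utc_datetime` is hypothetical, and only the `yyyymmdd-hh:mm:ss` layout is taken from the warning text.

```python
from datetime import datetime, timedelta, timezone


def ib_utc_datetime(dt: datetime) -> str:
    """Format an aware datetime as yyyymmdd-hh:mm:ss in UTC (hypothetical helper)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Refuse naive values instead of guessing which zone they belong to.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y%m%d-%H:%M:%S")


# Example: 06:02:00 at a fixed UTC+05:30 offset is 00:32:00 UTC on the same day.
ist = timezone(timedelta(hours=5, minutes=30))
print(ib_utc_datetime(datetime(2023, 5, 18, 6, 2, 0, tzinfo=ist)))
```

Rejecting naive values up front is a design choice of this sketch: it avoids silently attaching whatever timezone the local machine happens to use.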

wrighter commented 1 year ago

This is dealt with in the fix for #6, but I am not sure if it's a complete solution. I did limited testing with 'America/Chicago' as my timezone parameter, and the bars looked ok.

pranavlal commented 1 year ago

Hi, the problem persists. When I run the command below, the final error is `TypeError: Cannot compare tz-naive and tz-aware timestamps`.

The command I am using is: `python src/download_bars.py -p 4002 --exchange="NSE" --currency="INR" --size "1 hour" --timezone "Asia/Kolkata" --start-date 20100101 --end-date 20230430 WEBELSOLA`

The script is unusable.

wrighter commented 1 year ago

Can you give your IB api version? I don’t have permissions on this product, but could try to replicate it with the same api version. Of course you could also try to debug it yourself, it’s open source and pull requests are always welcome.

pranavlal commented 1 year ago

Hi, the API version I am using is 10.22.1. I think this is the latest stable API, but if it is not, I am happy to upgrade if that will help.

pranavlal commented 1 year ago

Hi, I ran the script with the -d option to generate a debug log. The download of the data works; the error occurs after that, so I am looking into it. I have a huge run log containing successful and failed runs, but I am not sure how to get it to you. I am also reading the code to try to determine what is going wrong.

wrighter commented 1 year ago

Could you just attach the part of the log to this ticket where the error occurs with maybe 30 or so lines around it? You can remove any personal info from the file if you want.

It could be that it's failing when trying to write out the data. One technique for debugging might be to run smaller chunks of data (say a year or a month at a time) to see if those work. It could just be some bad data or an error condition that happens just once in the history.
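As a sketch of the chunking idea, the date range can be split into one request per calendar year. The helper `yearly_chunks` is hypothetical; the `--start-date` and `--end-date` options and their `YYYYMMDD` values are the ones used in the command earlier in this thread.

```python
from datetime import date
from typing import Iterator, Tuple


def yearly_chunks(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (chunk_start, chunk_end) pairs, one per calendar year, covering [start, end]."""
    current = start
    while current <= end:
        # Each chunk stops at Dec 31 of its year, or at the overall end date.
        chunk_end = min(date(current.year, 12, 31), end)
        yield current, chunk_end
        current = date(current.year + 1, 1, 1)


# Print the date options for each smaller run.
for chunk_start, chunk_end in yearly_chunks(date(2010, 1, 1), date(2023, 4, 30)):
    print(f"--start-date {chunk_start:%Y%m%d} --end-date {chunk_end:%Y%m%d}")
```

If one chunk fails while the others succeed, that narrows the problem to a specific stretch of history.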

pranavlal commented 1 year ago

Hi Mat, here is the log. Mind you, the debug log itself contains no errors. I generated the output below by running the `script` command and capturing the terminal output. I am using pandas 2.0.1. I am also trying to go back to the versions of the components listed in the original requirements.txt file. If that works, I'll report back.


Script started on 2023-05-18 06:02:00+05:30 [TERM="xterm-256color" TTY="/dev/pts/2" COLUMNS="120" LINES="30"]
egrep: warning: egrep is obsolescent; using grep -E
[pranav@archlinux ib-scripts]$ python src/download_bars.py -p 4002 --exchange="NSE" --currency="INR" --size "1 hour" --timezone "Asia/Kolkata" --start-date 20100101 --end-date 20230430 -d --logfile debug.log SBIN
Exception in thread Thread-2 (run):
Traceback (most recent call last):
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1542, in safe_sort
    sorter = values.argsort()
             ^^^^^^^^^^^^^^^^
  File "pandas/_libs/tslibs/timestamps.pyx", line 383, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
TypeError: Cannot compare tz-naive and tz-aware timestamps

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/tslibs/timestamps.pyx", line 383, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
TypeError: Cannot compare tz-naive and tz-aware timestamps

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/ibapi-10.22.1-py3.11.egg/ibapi/client.py", line 263, in run
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/ibapi-10.22.1-py3.11.egg/ibapi/decoder.py", line 1387, in interpret
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/ibapi-10.22.1-py3.11.egg/ibapi/decoder.py", line 537, in processHistoricalDataMsg
  File "/home/pranav/ib-scripts/src/download_bars.py", line 223, in historicalDataEnd
    self.handle_end(reqId, ts)
  File "/home/pranav/ib-scripts/src/download_bars.py", line 230, in handle_end
    self.save_data(self.requests[rid], bars)
  File "/home/pranav/ib-scripts/src/download_bars.py", line 145, in save_data
    new_bars = combined.groupby("date").last().reset_index()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2443, in last
    return self._agg_general(
           ^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1422, in _agg_general
    result = self._cython_agg_general(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1507, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 1506, in grouped_reduce
    applied = blk.apply(func)
              ^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 329, in apply
    result = func(self.values, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1490, in array_func
    result = self.grouper._cython_operation(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 955, in _cython_operation
    cy_op = WrappedCythonOp(kind=kind, how=how, has_dropped_na=self.has_dropped_na)
                                                               ^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 869, in has_dropped_na
    return bool((self.group_info[0] < 0).any())
                 ^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 873, in group_info
    comp_ids, obs_group_ids = self._get_compressed_codes()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 897, in _get_compressed_codes
    return ping.codes, np.arange(len(ping.group_index), dtype=np.intp)
           ^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 671, in codes
    return self._codes_and_uniques[0]
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 780, in _codes_and_uniques
    codes, uniques = algorithms.factorize(  # type: ignore[assignment]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 786, in factorize
    uniques, codes = safe_sort(
                     ^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1551, in safe_sort
    ordered = _sort_mixed(values)
              ^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1607, in _sort_mixed
    num_argsort = np.argsort(values[num_pos])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 200, in argsort
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 1146, in argsort
    return _wrapfunc(a, 'argsort', axis=axis, kind=kind, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 66, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/tslibs/timestamps.pyx", line 383, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
TypeError: Cannot compare tz-naive and tz-aware timestamps
^CTraceback (most recent call last):
  File "/home/pranav/ib-scripts/src/download_bars.py", line 525, in <module>
    main()
  File "/home/pranav/ib-scripts/src/download_bars.py", line 518, in main
    code = app.wait_done()
           ^^^^^^^^^^^^^^^
  File "/home/pranav/ib-scripts/src/download_bars.py", line 73, in wait_done
    code = self.queue.get()
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/queue.py", line 171, in get
    self.not_empty.wait()
  File "/usr/lib/python3.11/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt

[pranav@archlinux ib-scripts]$ exit
exit

Script done on 2023-05-18 06:02:12+05:30 [COMMAND_EXIT_CODE="130"]

pranavlal commented 1 year ago

Hi,

Ok, so I found the problem part of the code but am not sure how to fix it.

See the first two or three lines.

new_bars = combined.groupby("date").last().reset_index()

Exception in thread Thread-2 (run):
Traceback (most recent call last):
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1739, in safe_sort
    sorter = values.argsort()
             ^^^^^^^^^^^^^^^^
  File "pandas/_libs/tslibs/timestamps.pyx", line 253, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
TypeError: Cannot compare tz-naive and tz-aware timestamps

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/ibapi-10.22.1-py3.11.egg/ibapi/client.py", line 263, in run
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/ibapi-10.22.1-py3.11.egg/ibapi/decoder.py", line 1387, in interpret
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/ibapi-10.22.1-py3.11.egg/ibapi/decoder.py", line 537, in processHistoricalDataMsg
  File "/home/pranav/ib-scripts/src/download_bars.py", line 223, in historicalDataEnd
    self.handle_end(reqId, ts)
  File "/home/pranav/ib-scripts/src/download_bars.py", line 230, in handle_end
    self.save_data(self.requests[rid], bars)
  File "/home/pranav/ib-scripts/src/download_bars.py", line 145, in save_data
    new_bars = combined.groupby("date").last().reset_index()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2275, in last
    return self._agg_general(
           ^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1515, in _agg_general
    result = self._cython_agg_general(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1606, in _cython_agg_general
    res = self._wrap_agged_manager(new_mgr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/generic.py", line 1434, in _wrap_agged_manager
    index = self.grouper.result_index
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 893, in result_index
    return self.groupings[0].result_index.rename(self.names[0])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 648, in result_index
    return self.group_index
           ^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 656, in group_index
    uniques = self._codes_and_uniques[1]
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/groupby/grouper.py", line 690, in _codes_and_uniques
    codes, uniques = algorithms.factorize(
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 768, in factorize
    uniques, codes = safe_sort(
                     ^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1748, in safe_sort
    ordered = _sort_mixed(values)
              ^^^^^^^^^^^^^^^^^^^
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1800, in _sort_mixed
    nums = np.sort(values[~str_pos])
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 180, in sort
  File "/home/pranav/.virtualenvs/ibapi/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 1004, in sort
    a.sort(axis=axis, kind=kind, order=order)
  File "pandas/_libs/tslibs/timestamps.pyx", line 253, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
TypeError: Cannot compare tz-naive and tz-aware timestamps

DewinGoh commented 1 year ago

Hi, solved this on my end by changing the --date-format default to a timezone aware format, from:

default="%Y-%m-%d %H:%M:%S"

 to

default="%Y-%m-%dT%H:%M:%SZ"

Not sure if this would help anyone else here. Thanks @wrighter for the convenient scripts!

pranavlal commented 1 year ago

Hi,

Where did you change the default date format?

wrighter commented 1 year ago

@pranavlal you can change it on the command line, it's the --date-format parameter.

I could change the default as well, but I probably won't have time to look at this for a few weeks. I'd like to find an example on my end to reproduce this behavior so I can also verify the fix. I hope the command line option takes care of it for you.

pranavlal commented 1 year ago

Hi,

This has solved my problem. My working command line is below.

python src/download_bars.py --exchange="NSE" --currency="INR" --max-days --port 4002 --date-format="%Y-%m-%dT%H:%M:%SZ" KOTAKBANK

wrighter commented 1 year ago

@pranavlal that's good to hear. I'll leave the issue open for now, I might either change the default or see if there's a better fix if I can get around to reproducing this.

Pl0414141 commented 1 year ago

Hi,

I'm using:

/home/trade/PycharmProjects/ibdata/venv/bin/python /home/trade/PycharmProjects/ibdata/ib-scripts/src/download_bars.py --max-days --date-format="%Y-%m-%dT%H:%M:%SZ" --size '1 hour' --security-type CONTFUT --exchange CME NQ

And the output error is the same:

  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/pandas/core/groupby/ops.py", line 897, in _get_compressed_codes
    return ping.codes, np.arange(len(ping.group_index), dtype=np.intp)
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/pandas/core/groupby/grouper.py", line 671, in codes
    return self._codes_and_uniques[0]
  File "pandas/_libs/properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/pandas/core/groupby/grouper.py", line 780, in _codes_and_uniques
    codes, uniques = algorithms.factorize(  # type: ignore[assignment]
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 787, in factorize
    uniques, codes = safe_sort(
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1552, in safe_sort
    ordered = _sort_mixed(values)
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1608, in _sort_mixed
    num_argsort = np.argsort(values[num_pos])
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 1133, in argsort
    return _wrapfunc(a, 'argsort', axis=axis, kind=kind, order=order)
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 68, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/home/trade/PycharmProjects/ibdata/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
  File "pandas/_libs/tslibs/timestamps.pyx", line 383, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
TypeError: Cannot compare tz-naive and tz-aware timestamps

roguespyder commented 11 months ago

I was able to get it to work on Python 3.11 by modifying the code a smidge. Not sure it's "correct", but it gets rid of the "Cannot compare tz-naive and tz-aware timestamps" error. See the tz_localize additions below.

# Excerpt of a method from src/download_bars.py; it relies on names defined
# elsewhere in that module (os, pd, make_download_path, Contract, BarDataList).
def save_data(self, contract: Contract, bars: BarDataList) -> None:
    data = [
        [b.date, b.open, b.high, b.low, b.close, b.volume, b.barCount, b.wap]
        for b in bars
    ]
    df = pd.DataFrame(
        data,
        columns=[
            "date",
            "open",
            "high",
            "low",
            "close",
            "volume",
            "barCount",
            "wap",
        ],
    )

    df["date"] = df["date"].apply(self._parse_timestamp)
    # Addition: drop the timezone (keeping local wall-clock time) so the new bars are tz-naive.
    df["date"] = df["date"].dt.tz_localize(None)

    if self.daily_files():
        # just overwrite whatever is there
        path = f"{make_download_path(self.args)}/{contract.symbol}.csv"
        df.to_csv(path, index=False, date_format=self.args.date_format)
    else:
        # depending on how things moved along, we'll have data
        # from different dates.
        for d in df["date"].dt.date.unique():
            path = os.path.sep.join(
                [
                    make_download_path(self.args, contract),
                    f"{d.strftime('%Y%m%d')}.csv",
                ]
            )
            new_bars = df.loc[df["date"].dt.date == d]
            # if a file exists, let's attempt to load it, merge our data in, and then save it
            if os.path.exists(path):
                existing_bars = pd.read_csv(path, parse_dates=["date"])
                # Addition: make the existing bars tz-naive too, so both frames compare cleanly.
                existing_bars["date"] = existing_bars["date"].dt.tz_localize(None)
                combined = pd.concat([existing_bars, new_bars])
                new_bars = combined.groupby("date").last().reset_index()

            new_bars.to_csv(path, index=False, date_format=self.args.date_format)

wrighter commented 7 months ago

The issue here arises when files are saved, then merged with the data from another run and saved again. If you write out data without a timezone in your date format (like the default format in the code), it will not be parsed back as a timezone-aware date. When the two data frames are then merged, one has tz-naive dates and the other tz-aware dates, and the groupby fails.

The reason this groupby exists in the first place is so that if you run a partial file you don't end up deleting other parts of the file.
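The failure mode can be sketched without IB or pandas. The example below uses only the standard library, whose `datetime` raises a similar (but differently worded) `TypeError` when naive and aware values are ordered. The bar times and the fixed UTC+05:30 offset are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # a date format with no timezone information
ist = timezone(timedelta(hours=5, minutes=30))

# A new bar carries a timezone, like freshly downloaded data.
new_bar = datetime(2023, 5, 18, 10, 15, 0, tzinfo=ist)

# An existing bar that was written with FMT and read back has lost its timezone.
existing_text = datetime(2023, 5, 18, 9, 15, 0, tzinfo=ist).strftime(FMT)
existing_bar = datetime.strptime(existing_text, FMT)

try:
    sorted([existing_bar, new_bar])  # ordering the merged values needs a comparison
    outcome = "sorted"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(existing_bar.tzinfo)
print(outcome)
```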

The way I was able to get it to run properly is to use the %z format in my date format. This is the utc offset for that timestamp. If you use %Z, it will work if there is a valid name for the timezone, but when the file is written out it's writing it with the offset, not the timezone name. When that file is parsed, it ends up not having a name, so the next write will put it out without a timezone.

If you use %z, then the offset gets carried along with multiple writes.
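A small standard-library sketch of that round trip, assuming a fixed UTC+05:30 offset and a made-up bar time (pandas is not involved, so this only illustrates the formatting behaviour, not the script itself):

```python
from datetime import datetime, timedelta, timezone

ist = timezone(timedelta(hours=5, minutes=30))
bar_time = datetime(2023, 5, 18, 9, 15, 0, tzinfo=ist)


def round_trip(dt: datetime, fmt: str) -> datetime:
    """Write dt with fmt, then parse the text back with the same fmt."""
    return datetime.strptime(dt.strftime(fmt), fmt)


with_offset = round_trip(bar_time, "%Y-%m-%d %H:%M:%S%z")
without_tz = round_trip(bar_time, "%Y-%m-%d %H:%M:%S")
literal_z = bar_time.strftime("%Y-%m-%dT%H:%M:%SZ")

print(with_offset == bar_time)  # the offset survives, so this is the same instant
print(without_tz.tzinfo)        # no timezone is left after the round trip
print(literal_z)                # local wall-clock time followed by a literal "Z"
```

Note that a literal `Z` in a format string, as in the workaround posted earlier in this thread, is copied into the output as-is by `strftime`. No conversion to UTC takes place, so it is worth checking that times written that way really are UTC.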

In reality, it's probably always best to store your files in UTC, but many end users probably just prefer files that are easy to inspect visually and don't want to deal with timezone conversions.

For now, I'll change the default date format to include '%z'. The side effect for some users is that if they have already downloaded lots of data, they still won't be able to merge the old and new results. I think the correct way to handle this long term is for users to store their data with a UTC timestamp and do their timezone conversions in their downstream apps, but for most users, that may be a heavy load to lift. Maybe a utility script to convert existing data that has no timezone and rewrite it with a UTC offset would be a nice addition.
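A rough sketch of what such a utility could look like, using only the standard library. Everything here is an assumption rather than part of the repository: the function name `add_utc_offset`, the two format constants, the `date` column name (taken from the `save_data` code posted earlier in this thread), and the premise that every naive timestamp in a file belongs to one known timezone.

```python
import csv
import io
from datetime import datetime, timedelta, timezone, tzinfo
from typing import TextIO

NAIVE_FORMAT = "%Y-%m-%d %H:%M:%S"    # assumed format of the existing files
AWARE_FORMAT = "%Y-%m-%d %H:%M:%S%z"  # same layout plus the UTC offset


def add_utc_offset(src: TextIO, dst: TextIO, tz: tzinfo, column: str = "date") -> None:
    """Copy CSV rows from src to dst, rewriting naive timestamps with tz's offset."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        naive = datetime.strptime(row[column], NAIVE_FORMAT)
        # replace() attaches the zone without shifting the wall-clock time; this is
        # fine for datetime.timezone and zoneinfo.ZoneInfo, but pytz zones need localize().
        row[column] = naive.replace(tzinfo=tz).strftime(AWARE_FORMAT)
        writer.writerow(row)


# Example with in-memory text instead of real files.
source = io.StringIO("date,open\n2023-05-18 09:15:00,100.5\n")
target = io.StringIO()
add_utc_offset(source, target, timezone(timedelta(hours=5, minutes=30)))
print(target.getvalue())
```

Writing to a separate output and only replacing the original after checking the result would be the cautious way to run something like this over an existing download directory.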