Closed: LazyTrader closed this issue 4 years ago
Please show only copy-pastable pandas code in the example, or report this to the other package.
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.
I have 2 dataframes containing financial tick data from 17/04/2020 16:00 - 17/07/2020 23:55. Calling print on them results in the outputs listed below:
print(Asset1Ticks)
print(Asset2Ticks)
bid ask mid
time_msc
2020-04-17 16:00:00.060 1.24828 1.24836 1.24832
2020-04-17 16:00:00.871 1.24828 1.24835 1.24832
2020-04-17 16:00:01.780 1.24827 1.24835 1.24831
2020-04-17 16:00:03.467 1.24825 1.24835 1.24830
2020-04-17 16:00:03.471 1.24826 1.24833 1.24830
... ... ...
2020-07-17 23:54:58.376 1.25669 1.25694 1.25682
2020-07-17 23:54:58.484 1.25666 1.25691 1.25678
2020-07-17 23:54:58.581 1.25666 1.25690 1.25678
2020-07-17 23:54:59.017 1.25670 1.25696 1.25683
2020-07-17 23:54:59.110 1.25671 1.25696 1.25684
[6278152 rows x 3 columns]
bid ask mid
time_msc
2020-04-17 16:00:00.148 1.52730 1.52755 1.52742
2020-04-17 16:00:00.334 1.52730 1.52758 1.52744
2020-04-17 16:00:00.540 1.52734 1.52760 1.52747
2020-04-17 16:00:00.845 1.52735 1.52760 1.52748
2020-04-17 16:00:00.936 1.52741 1.52760 1.52750
... ... ...
2020-07-17 23:54:58.437 1.55202 1.55241 1.55222
2020-07-17 23:54:58.546 1.55198 1.55237 1.55218
2020-07-17 23:54:58.643 1.55201 1.55239 1.55220
2020-07-17 23:54:58.749 1.55196 1.55234 1.55215
2020-07-17 23:54:59.939 1.55189 1.55233 1.55211
[6542489 rows x 3 columns]
However, after resampling with the "1Min" offset, the two have different numbers of NaN values:
print(Asset1Ticks["mid"].resample("1Min").ohlc().close.isna().sum())
print(Asset2Ticks["mid"].resample("1Min").ohlc().close.isna().sum())
37583
37576
Both DataFrames cover the same datetime range, yet resampling them produces a different number of NaN values for each.
This inconsistency is only observed with the "1Min" resampling offset. There are no inconsistencies with the other resampling offsets.
Since both financial instruments start and stop trading at the same time and their DataFrames cover the same datetime range, I would expect the number of NaN values after resampling to be the same.
pd.show_versions()
They may have a different number of NaN values if one stopped trading early on some days. In other words, there are some 1-minute intervals in which one asset has no trades reported while the other does have a trade.
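To make this concrete, here is a minimal sketch on made-up data (the timestamps and prices below are hypothetical, not taken from the reporter's files). Both series start and end in the same minute, but the second one has no ticks during 16:02, so only its resampled close has a NaN there. It uses the lowercase "1min" alias, which denotes the same one-minute frequency as "1Min", and assumes both resampled indexes cover the same minutes.

```python
import pandas as pd

# Hypothetical tick data: same first and last minute for both assets,
# but asset2 has no ticks at all during the 16:02 minute.
idx1 = pd.to_datetime([
    "2020-04-17 16:00:00.060",
    "2020-04-17 16:01:10.000",
    "2020-04-17 16:02:30.500",
    "2020-04-17 16:03:59.110",
])
idx2 = pd.to_datetime([
    "2020-04-17 16:00:00.148",
    "2020-04-17 16:01:20.000",
    "2020-04-17 16:03:59.939",
])
asset1 = pd.Series([1.24832, 1.24831, 1.24830, 1.24829], index=idx1, name="mid")
asset2 = pd.Series([1.52742, 1.52744, 1.52750], index=idx2, name="mid")

# Empty one-minute bins produce NaN in every OHLC column.
close1 = asset1.resample("1min").ohlc()["close"]
close2 = asset2.resample("1min").ohlc()["close"]

nan1 = close1.isna()
nan2 = close2.isna()

# Minutes that are empty for asset2 but not for asset1.
only_in_2 = nan2 & ~nan1
print(nan1.sum(), nan2.sum())
print(only_in_2[only_in_2].index)
```

Applying the same mask comparison to the real data should list the exact minutes in which one instrument traded and the other did not.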
True. Closing this issue.
Attached below is my code for extracting and processing financial tick data from the MetaTrader5 application:
Problem description
After resampling the collected tick data with the "1Min" offset, the resampled data for 2 different instruments have a different number of candles even when their time indexes cover the same date range.
Additional Observations
The other offsets do not show this inconsistent behaviour; the issue only happens with the "1Min" resample offset.
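One possible reason the coarser offsets look consistent is that a wider bin is less likely to be completely empty. The sketch below illustrates this on synthetic data (the series and the helper `nan_closes` are hypothetical, introduced only for this example): a one-minute gap shows up as a NaN close at "1min" but disappears at "5min".

```python
import pandas as pd

def nan_closes(mid, freq):
    """Count bars whose close is NaN, i.e. bins that received no ticks."""
    return int(mid.resample(freq).ohlc()["close"].isna().sum())

# Hypothetical data: one tick per minute for ten minutes.
minutes = pd.date_range("2020-04-17 16:00:30", periods=10, freq="1min")
asset1 = pd.Series(1.25, index=minutes)

# asset2 misses the ticks at 16:02 and 16:07 but keeps the same start and end.
asset2 = asset1.drop(minutes[[2, 7]])

counts = {
    freq: (nan_closes(asset1, freq), nan_closes(asset2, freq))
    for freq in ("1min", "5min")
}
print(counts)
```

On this toy input the counts differ only at the one-minute frequency, which matches the behaviour described above without implying a bug in resample.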
Output of
pd.show_versions()