microsoft / msticpy

Microsoft Threat Intelligence Security Tools
Other
1.72k stars 310 forks source link

Potential Logic Error #761

Closed pjain90 closed 1 month ago

pjain90 commented 3 months ago

https://github.com/microsoft/msticpy/blob/515b0050e5e771adb076dff4f0c94d5211a80bad/msticpy/data/core/query_provider_connections_mixin.py#L415

Issue 1:

 if (ranges[-1][1] - end) < (split_delta / 10):
        ranges[-1] = ranges[-1][0], end

Should be end - ranges[-1][1] as end will always be larger or equal to the last element in the range.

Ex. For a start time of 2024-03-24 00:00:00, end time of 2024-03-24 23:59:59 and a split of 6h the current implementation gives:

Timestamp('2024-03-24 00:00:00+0000', tz='UTC'), Timestamp('2024-03-24 05:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 06:00:00+0000', tz='UTC'), Timestamp('2024-03-24 11:59:59.999999999+0000', tz='UTC') 
Timestamp('2024-03-24 12:00:00+0000', tz='UTC'), Timestamp('2024-03-24 23:59:59.999999999+0000', tz='UTC')

Expected outcome:

Timestamp('2024-03-24 00:00:00+0000', tz='UTC'), Timestamp('2024-03-24 05:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 06:00:00+0000', tz='UTC'), Timestamp('2024-03-24 11:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 12:00:00+0000', tz='UTC'), Timestamp('2024-03-24 17:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 18:00:00+0000', tz='UTC'),  Timestamp('2024-03-24 23:59:59.999999999+0000', tz='UTC')

Issue 2:

ranges.append((ranges[-1][0] + pd.Timedelta("1ns"), end))

should be ranges[-1][1] instead of ranges[-1][0] as we are adding a nanosecond to end of last range instead of the beginning.

Ex. Once issue 1 is fixed, we get:

Timestamp('2024-03-24 00:00:00+0000', tz='UTC'), Timestamp('2024-03-24 05:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 06:00:00+0000', tz='UTC'), Timestamp('2024-03-24 11:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 12:00:00+0000', tz='UTC'), Timestamp('2024-03-24 17:59:59.999999999+0000', tz='UTC'), 
>> Timestamp('2024-03-24 12:00:00.000000001+0000', tz='UTC'), Timestamp('2024-03-24 23:59:59.999999999+0000', tz='UTC')

Expected outcome:

Timestamp('2024-03-24 00:00:00+0000', tz='UTC'), Timestamp('2024-03-24 05:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 06:00:00+0000', tz='UTC'), Timestamp('2024-03-24 11:59:59.999999999+0000', tz='UTC'), 
Timestamp('2024-03-24 12:00:00+0000', tz='UTC'), Timestamp('2024-03-24 17:59:59.999999999+0000', tz='UTC'), 
>> Timestamp('2024-03-24 18:00:00+0000', tz='UTC'),  Timestamp('2024-03-24 23:59:59.999999999+0000', tz='UTC')
ianhelle commented 3 months ago

Whoah - you are right.... arithmetic, esp time arithmetic was never a strong point of mine. The existing code works if you specify whole days (like 2024-01-01 00:00:00 - 2024-01-02 00:00:00).

On reflection, I think this if statement is also problematic since adding a little bit to the last slice might be enough to push it over the timeout or data limit - esp at 10% of a delta. I think I'll reduce the delta check to 0.1%, which should be safe but will also avoid querying for a few ns of data.

ianhelle commented 3 months ago

With the updated logic, you get this:

----------------------------------------
from 2021-01-01 00:00:00 to 2021-01-02 00:20:00 delta 0 days 06:00:00
(Timestamp('2021-01-01 00:00:00'), Timestamp('2021-01-01 05:59:59.999999999'))
(Timestamp('2021-01-01 06:00:00'), Timestamp('2021-01-01 11:59:59.999999999'))
(Timestamp('2021-01-01 12:00:00'), Timestamp('2021-01-01 17:59:59.999999999'))
(Timestamp('2021-01-01 18:00:00'), Timestamp('2021-01-01 23:59:59.999999999'))
(Timestamp('2021-01-02 00:00:00'), Timestamp('2021-01-02 00:20:00'))
----------------------------------------
from 2024-03-24 00:00:00 to 2024-03-24 23:59:59 delta 0 days 06:00:00
(Timestamp('2024-03-24 00:00:00'), Timestamp('2024-03-24 05:59:59.999999999'))
(Timestamp('2024-03-24 06:00:00'), Timestamp('2024-03-24 11:59:59.999999999'))
(Timestamp('2024-03-24 12:00:00'), Timestamp('2024-03-24 17:59:59.999999999'))
(Timestamp('2024-03-24 18:00:00'), Timestamp('2024-03-24 23:59:59'))

1000 thanks for spotting this

pjain90 commented 3 months ago

Wow! Yes, working with time is always hard. Has bitten me quite a few times. Thank you for the quick fix Ian! Really appreciate your efforts in maintaining this library and tools.