Currently, the result of the interval identifier distinguishes between valid and invalid intervals. All invalid intervals are assigned `0` by definition, whereas all valid intervals are enumerated starting with `1`. However, in some cases it is useful to enumerate all intervals regardless of whether they are valid or invalid.
The naive enumeration should also be less computationally intensive and could be added as an optional keyword argument, for example `enumeration="strict"` to distinguish invalid from valid intervals and `enumeration="simple"` to enumerate intervals regardless of validity. An alternative naming proposal is `mark_invalid=True/False`. The simple/naive enumeration does not need to increase in steps of one (1, 2, 3, ...). Any increasing sequence suffices (e.g. 1, 3, 4, 6, ...).
Another benefit of naive iids is better performance, because re-enumerating the valid intervals from 1 to n with invalid intervals assigned 0 is an expensive computation (especially in Spark, where it requires additional window functions).
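
To make the difference between the two modes concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the function name `interval_ids` and the inputs `starts` (row opens a new interval) and `valid` (row belongs to a valid interval) are hypothetical and only serve to illustrate the proposal. Only the `enumeration="strict"` / `enumeration="simple"` keyword is taken from the text above.

```python
from itertools import accumulate
from typing import List, Sequence


def interval_ids(
    starts: Sequence[bool],
    valid: Sequence[bool],
    enumeration: str = "strict",
) -> List[int]:
    """Hypothetical illustration of strict vs. simple interval ids.

    starts[i] is True when row i opens a new interval.
    valid[i] is True when row i belongs to a valid interval.
    """
    if enumeration not in ("strict", "simple"):
        raise ValueError("enumeration must be 'strict' or 'simple'")

    # Naive ids: a running count of start markers (one cumulative sum).
    raw = list(accumulate(int(s) for s in starts))
    if enumeration == "simple":
        return raw

    # Strict ids need an extra pass: invalid rows become 0 and the
    # remaining valid intervals are re-enumerated as 1..n.
    mapping = {}
    result = []
    for raw_id, is_valid in zip(raw, valid):
        if not is_valid:
            result.append(0)
            continue
        if raw_id not in mapping:
            mapping[raw_id] = len(mapping) + 1
        result.append(mapping[raw_id])
    return result


starts = [True, False, True, False, True, False]
valid = [True, True, False, False, True, True]

simple_ids = interval_ids(starts, valid, enumeration="simple")
strict_ids = interval_ids(starts, valid, enumeration="strict")
```

In this sketch the simple mode stops after the cumulative sum, while the strict mode needs the additional re-enumeration pass. That second step is the part that would presumably translate into extra window functions in Spark.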
Test data example