select_by_index() function implementation

henryennis commented 5 months ago

If I have a dataset of length 600 and I want to get the first 512 rows, I would set the start index to 0 and the end index to 511.

inference_data = select_by_index(
    data,
    id_columns=ID_COLUMNS,
    start_index=0,
    end_index=511,
)

The implementation of select_by_index returns a dataset of length 511 instead, for two reasons: the truthiness checks in the following code, which treat an index of 0 the same as None, and the fact that slicing excludes the end index ([:x])...

if not start_index:
    return group_df.iloc[:end_index,]

if not end_index:
    return group_df.iloc[start_index:,]
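
To make the truthiness issue concrete, here is a minimal sketch that applies the same checks to a plain Python list standing in for group_df. The function name slice_like_quoted and the final fall-through branch are assumptions made for illustration; only the two if checks come from the snippet above.

```python
def slice_like_quoted(rows, start_index=None, end_index=None):
    # Same truthiness checks as the quoted snippet, applied to a plain list.
    if not start_index:  # true for None and also for 0
        return rows[:end_index]
    if not end_index:  # true for None and also for 0
        return rows[start_index:]
    # Assumed fall-through; the quoted snippet does not show this branch.
    return rows[start_index:end_index]


rows = list(range(600))  # stand-in for a 600-row group_df

# The call from the report: the end index is excluded, so 511 rows come back.
first_rows = slice_like_quoted(rows, start_index=0, end_index=511)

# end_index=0 is falsy, so it is ignored and everything from index 10 is returned.
zero_end = slice_like_quoted(rows, start_index=10, end_index=0)

# A plain slice with the same bounds is empty.
plain_slice = rows[10:0]
```

In this sketch, the 511-row result comes purely from the exclusive end bound, while the truthiness checks only change the outcome when an index is exactly 0.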

wgifford commented 4 months ago

@henryennis The intent was for the indexing to behave like Python list indexing:

data = range(1000)
data[0:511]  # len = 511
data[:511] # len = 511

data[0:512] # len = 512
data[:512] # len = 512

henryennis commented 4 months ago

Yeah, but the only way to find out that it is implemented that way is to look at the source code. There's also some unexpected behaviour when passing None or 0 as either the start or the end index.
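
One way to keep 0 and None distinct would be to test for None explicitly. The following is only a sketch on a plain list, with slice_explicit as a hypothetical name; it is not the library's implementation.

```python
def slice_explicit(rows, start_index=None, end_index=None):
    # Hypothetical variant: only None means "no bound"; 0 is a real index.
    if start_index is None:
        start_index = 0
    if end_index is None:
        end_index = len(rows)
    # Half-open slice, like Python list indexing: end_index is excluded.
    return rows[start_index:end_index]


rows = list(range(600))  # stand-in for a 600-row dataset

first_512 = slice_explicit(rows, start_index=0, end_index=512)
empty = slice_explicit(rows, start_index=10, end_index=0)
everything = slice_explicit(rows)
```

With explicit None checks, end_index=0 yields an empty result, as it would for a list slice, and getting the first 512 rows still requires end_index=512.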

wgifford commented 2 months ago

I improved the documentation as part of https://github.com/ibm-granite/granite-tsfm/pull/95 to make the functionality clearer.

ibm-granite / granite-tsfm

select_by_index() function implementation #45