CV creates incorrect split of user-defined transforms.

When `split_start='after_transforms'` is passed to `CV.fit()`, the user-defined transforms are not split correctly. See the graph created by the `fit()` call in the code below.

It appears that when a user-defined transform has presteps, the split point ends up in the wrong place. This may also affect splitting the transforms when `split_start` is given an integer value.

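```python
from nimbusml import DataSchema, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import LightGbmRegressor
from nimbusml.model_selection import CV
from nimbusml.preprocessing.missing_values import Indicator, Handler

# Load the airquality sample dataset shipped with NimbusML.
path = get_dataset("airquality").as_filepath()
schema = DataSchema.read_schema(path)
data = FileDataStream(path, schema)

pipeline_steps = [
    # Add indicator columns flagging missing values.
    Indicator() << {
        'Ozone_ind': 'Ozone',
        'Solar_R_ind': 'Solar_R'},
    # Replace missing values with the column mean.
    Handler(
        replace_with='Mean') << {
        'Solar_R': 'Solar_R',
        'Ozone': 'Ozone'},
    # Learner trained on the original and derived columns.
    LightGbmRegressor(
        feature=['Ozone',
                 'Solar_R',
                 'Ozone_ind',
                 'Solar_R_ind',
                 'Temp'],
        label='Wind')]

# Ask CV to split the data only after all transforms have run;
# the generated graph shows the transforms split at the wrong point.
cv_results = CV(pipeline_steps).fit(data, split_start='after_transforms')
```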
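To make the suspected cause concrete, here is a small self-contained sketch in plain Python, not NimbusML's actual internals. It assumes that each user-defined transform expands into its presteps followed by the transform itself, and that the split index is computed over user-level steps. The helper names (`expand`, `naive_split`, `prestep_aware_split`) and the prestep counts are hypothetical and chosen only to show the offset. Using the user-level index directly as a graph-node index cuts in the middle of a transform's presteps, whereas counting every node contributed by the earlier transforms puts the cut where it belongs.

```python
from typing import List, Tuple

# Hypothetical model: each user-defined transform is (name, number of presteps)
# and expands into its presteps followed by the transform node itself.
Step = Tuple[str, int]


def expand(steps: List[Step]) -> List[str]:
    nodes: List[str] = []
    for name, n_presteps in steps:
        nodes.extend(f"{name}.prestep{i}" for i in range(n_presteps))
        nodes.append(name)
    return nodes


def naive_split(steps: List[Step], split_start: int):
    # Suspected bug: the user-level index is used as a graph-node index.
    nodes = expand(steps)
    return nodes[:split_start], nodes[split_start:]


def prestep_aware_split(steps: List[Step], split_start: int):
    # Translate the user-level index into a node index by counting every
    # node (presteps + the transform itself) emitted before the split.
    nodes = expand(steps)
    cut = sum(n_presteps + 1 for _, n_presteps in steps[:split_start])
    return nodes[:cut], nodes[cut:]


# Hypothetical prestep counts, for illustration only.
transforms = [("Indicator", 0), ("Handler", 2)]

before, after = naive_split(transforms, split_start=len(transforms))
print(before)  # ['Indicator', 'Handler.prestep0']
print(after)   # ['Handler.prestep1', 'Handler']

before, after = prestep_aware_split(transforms, split_start=len(transforms))
print(before)  # ['Indicator', 'Handler.prestep0', 'Handler.prestep1', 'Handler']
print(after)   # []
```

The same counting would apply to an integer `split_start`: whenever any transform before the split has presteps, the naive index lands too early in the graph.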

microsoft/NimbusML, issue #409