ML system testing concepts

yeomko22 commented 2 years ago

why test?

confidence of the system을 측정할 수 있어야 한다.
moitoring historic system behavior를 분석해라

predicticting reliability

대부분의 소프트웨어 시스템은 변화한다.
이 떄, 변화가 시스템에 어떠한 영향을 줄 지 예측을 할 수 있어야 한다.

functionality

confidence that functionality remains unchanged
즉, 테스트란 우리 시스템이 우리가 기대한 대로 동작하는 지를 보여주는 방식이다. 설령 시스템에 변화가 있다 하더라도.

yeomko22 commented 2 years ago

Testing ML Systems

ML system은 더 세심하게 테스트되어야 한다. 왜? rules들이 더 모호하게 defined 되었기 때문이다.

Key testing principles for ML: pre-deployemnt

use a schema for features
model specification test: model config 변경은 unit test가 필요하다.
validate model quality: sudden degradation, slow degradation 테스트를 해야함
test input feature code
training is reproducible: random seed 고정 등
integration test the pipeline

스크린샷 2021-09-21 오후 8 15 59

yeomko22 commented 2 years ago

testing theory

스크린샷 2021-09-21 오후 8 17 43

unit test: 단일 로직, 단일 클래스에 대한 테스트
integration test: assembled component에 대한 테스트
system test: end to end test

How much testing?

prioritise the code base?
what is mission critical?
test reduce uncertainty about your system?

yeomko22 commented 2 years ago

test data schema

iris_schema = {
    'sepal length': {
        'range': {
            'min': 4.0,  # determined by looking at the dataframe .describe() method
            'max': 8.0
        },
        'dtype': float,
    },
    'sepal width': {
        'range': {
            'min': 1.0,
            'max': 5.0
        },
        'dtype': float,
    },
    'petal length': {
        'range': {
            'min': 1.0,
            'max': 7.0
        },
        'dtype': float,
    },
    'petal width': {
        'range': {
            'min': 0.1,
            'max': 3.0
        },
        'dtype': float,
    }
}

import unittest
import sys

class TestIrisInputData(unittest.TestCase):
    def setUp(self):

        # `setUp` will be run before each test, ensuring that you
        # have a new pipeline to access in your tests. See the 
        # unittest docs if you are unfamiliar with unittest.
        # https://docs.python.org/3/library/unittest.html#unittest.TestCase.setUp
        self.pipeline = SimplePipeline()
        self.pipeline.run_pipeline()

    def test_input_data_ranges(self):
        # get df max and min values for each column
        max_values = self.pipeline.frame.max()
        min_values = self.pipeline.frame.min()

        # loop over each feature (i.e. all 4 column names)
        for feature in self.pipeline.feature_names:

            # use unittest assertions to ensure the max/min values found in the dataset
            # are less than/greater than those expected by the schema max/min.
            self.assertTrue(max_values[feature] <= iris_schema[feature]['range']['max'])
            self.assertTrue(min_values[feature] >= iris_schema[feature]['range']['min'])

    def test_input_data_types(self):
        data_types = self.pipeline.frame.dtypes  # pandas dtypes method

        for feature in self.pipeline.feature_names:
            self.assertEqual(data_types[feature], iris_schema[feature]['dtype'])

schema를 미리 정의해놓고 inpute data가 스키마에서 주어진 min, max를 만족하는지, data type은 만족하는지 테스트를 한다.

yeomko22 commented 2 years ago

testing data engineering

import unittest

class TestIrisDataEngineering(unittest.TestCase):
    def setUp(self):
        self.pipeline = PipelineWithDataEngineering()
        self.pipeline.load_dataset()

    def test_scaler_preprocessing_brings_x_train_mean_near_zero(self):
        # Given
        # convert the dataframe to be a single column with pandas stack
        original_mean = self.pipeline.X_train.stack().mean()

        # When
        self.pipeline.apply_scaler()

        # Then
        # The idea behind StandardScaler is that it will transform your data 
        # to center the distribution at 0 and scale the variance at 1.
        # Therefore we test that the mean has shifted to be less than the original
        # and close to 0 using assertAlmostEqual to check to 3 decimal places:
        # https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmostEqual
        self.assertTrue(original_mean > self.pipeline.X_train.mean())  # X_train is a numpy array at this point.
        self.assertAlmostEqual(self.pipeline.X_train.mean(), 0.0, places=3)
        print(f'Original X train mean: {original_mean}')
        print(f'Transformed X train mean: {self.pipeline.X_train.mean()}')

    def test_scaler_preprocessing_brings_x_train_std_near_one(self):
        # When
        self.pipeline.apply_scaler()

        # Then
        # We also check that the standard deviation is close to 1
        self.assertAlmostEqual(self.pipeline.X_train.std(), 1.0, places=3)
        print(f'Transformed X train standard deviation : {self.pipeline.X_train.std()}')

데이터에 StandardScaler를 적용하는 전처리를 하였다.
이 전처리가 우리가 의도한 대로 적용이 되었는지를 테스트한다.
데이터 전처리 로직을 테스트 하기에 쉽데록 잘 쪼개는 것이 중요하다.
좋은 아이디어는 파이프라인 자체를 클래스로 만든 뒤, load_data까지만 수행하고 그 다음 스텝들을 차례로 수행하면서 테스트를 진행한다.
즉, 데이터 전처리를 수행하는 함수들은 리턴 값을 주지 않고 파이프라인 객체 내의 frame에만 동작을 수행한다. bigdata-platform 프로젝트에서도 이렇게 테스트를 짰더라면 훨씬 편했을 것이다.

yeomko22 / TIL

ML system testing concepts #121

why test?

predicticting reliability

functionality

Testing ML Systems

Key testing principles for ML: pre-deployemnt

testing theory

How much testing?

test data schema

testing data engineering