sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0
5.1k stars 701 forks source link

[Discussion] Remove some global variables in attribute.go #2493

Open sneaxiy opened 4 years ago

sneaxiy commented 4 years ago

This issue further addresses: https://github.com/sql-machine-learning/playground/issues/42#issuecomment-645666536

Where are the global variables

How to remove the exported global map which stores the parameter docs of models

Some of these global variables share the same information. For example:

Note that PremadeModelParamsDocs, OptimizerParameterDocs and XGBoostObjectiveDocs are also used in cli prompt suggestion (see 1, 2, 3). So we cannot remove these 3 variables or hide them.

How to remove exported parameter type definitions

https://github.com/sql-machine-learning/playground/issues/42#issue-640787739 suggests to enhance compile time data checking using the following ways.

var distributedTrainingAttributes = attribute.Dictionary{}.
    Int("train.num_ps", 0, "", nil).
    Int("train.num_workers", 1, "", nil).
    Int("train.worker_cpu", 400, "", nil)

In this way, attribute.Description, attribute.Int, attribute.Float, etc can be hidden. The signature of Dictionary.Int would be:

func (d Dictionary) Int(name string, value int, doc string, checker func(int) error)

The only concern of this method is that we cannot support nil default value. Some of the models may have attributes with nil default values. For example, the default value of num_class in XGBoost model is nil(see here), because only multi-class (>2) classification models need num_class while the other models do not need num_class. And once num_class is provided in SQL WITH statement, it must be an integer number, so the data type of num_class attribute should be int. The meaning of nil default value in SQLFlow is:

We can enhance this method to support nil default value.

var dict = Dictionary{}. Int("num_class", optional.Int{}, "doc1", checker1). // default value is nil Int("attr_with_default_value", optional.NewInt(0), "doc2", checker2) // default value is 0

wangkuiyi commented 4 years ago

@sneaxiy This issue mixed up two topics -- (1) removing global variables and (2) redesign the API. Let us keep (2) at https://github.com/sql-machine-learning/playground/issues/42 and make this issue focusing on (1).

For (1), I don't see a conclusion -- can we move some of the global variables or not? and why?

I guess the lack of a conclusion is (partly) due to an incomplete overview of the source code. For example, ModelParameterJSON contains attribute names and explanations. attribute.Dictionary also contains such information. Why do we need redundant form of the same piece of information in our code? And, what is the relationship between extract_docstring.py, the Go global variables, and cmd/docgen.go?

sneaxiy commented 4 years ago

@wangkuiyi The conclusion is that we should:

The reasons are as follows:

https://github.com/sql-machine-learning/sqlflow/blob/56868d3cfe274519be8b045e76966f943b031dd0/pkg/attribute/model_parameters.go#L14-L20