Open sneaxiy opened 4 years ago
@sneaxiy This issue mixes up two topics: (1) removing global variables and (2) redesigning the API. Let us keep (2) at https://github.com/sql-machine-learning/playground/issues/42 and focus this issue on (1).
For (1), I don't see a conclusion: can we remove some of the global variables or not, and why?
I guess the lack of a conclusion is (partly) due to an incomplete overview of the source code. For example, ModelParameterJSON contains attribute names and explanations. attribute.Dictionary also contains such information. Why do we need redundant forms of the same piece of information in our code? And what is the relationship between extract_docstring.py, the Go global variables, and cmd/docgen.go?
@wangkuiyi The conclusion is that we should:
- Keep PremadeModelParamsDocs, OptimizerParameterDocs and XGBoostObjectiveDocs.
- Remove ModelParameterJSON, OptimizerParameterJSON and XGBoostObjectiveJSON.
- [Optional] After removing the *JSON objects, generate PremadeModelParamsDocs and OptimizerParameterDocs automatically instead of hard-coding them in *.go files.

The reasons are as follows:
Why should we keep PremadeModelParamsDocs, OptimizerParameterDocs and XGBoostObjectiveDocs?
These attribute.Dictionary objects are used not only in the attribute package but also for CLI prompt suggestions (see 1, 2, 3), so we cannot remove them.
Why should we remove ModelParameterJSON, OptimizerParameterJSON and XGBoostObjectiveJSON?
As you mentioned, these JSON strings are used only to be deserialized into the attribute.Dictionary objects (PremadeModelParamsDocs, OptimizerParameterDocs and XGBoostObjectiveDocs) inside the attribute package. In terms of content, xxxDocs >= xxxJSON: xxxJSON contains only the docs of TensorFlow and XGBoost models, whereas xxxDocs may contain models from TensorFlow, XGBoost and sqlflow_models. We do not need the redundant information.
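To make the redundancy concrete, here is a minimal sketch of how a JSON string ends up as a dictionary. The names description, dictionary, paramDocsJSON and loadDocs, the JSON layout, and the doc texts are all hypothetical stand-ins chosen for illustration; they are not the actual definitions in the attribute package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// description is a hypothetical stand-in for attribute.Description.
type description struct {
	Doc string
}

// dictionary is a hypothetical stand-in for attribute.Dictionary.
type dictionary map[string]*description

// paramDocsJSON mimics a *JSON global. The layout (model name -> parameter
// name -> doc string) and the doc texts are assumptions for illustration.
const paramDocsJSON = `{
  "DNNClassifier": {
    "hidden_units": "placeholder doc for hidden_units",
    "n_classes": "placeholder doc for n_classes"
  }
}`

// loadDocs deserializes the JSON string into one dictionary per model.
func loadDocs(s string) (map[string]dictionary, error) {
	raw := map[string]map[string]string{}
	if err := json.Unmarshal([]byte(s), &raw); err != nil {
		return nil, err
	}
	docs := make(map[string]dictionary, len(raw))
	for model, params := range raw {
		d := dictionary{}
		for name, doc := range params {
			d[name] = &description{Doc: doc}
		}
		docs[model] = d
	}
	return docs, nil
}

func main() {
	docs, err := loadDocs(paramDocsJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(docs["DNNClassifier"]["n_classes"].Doc)
}
```

After loading, every piece of information in the JSON string also lives in the dictionary, which is why keeping both as exported globals looks redundant.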
[Optional] Why should we generate PremadeModelParamsDocs and OptimizerParameterDocs automatically?
As discussed above, PremadeModelParamsDocs and OptimizerParameterDocs are deserialized from ModelParameterJSON and OptimizerParameterJSON. These JSON strings hold the docstrings of the TensorFlow and XGBoost model APIs, which can be extracted by extract_docstring.py. extract_docstring.py scans the __doc__ of the constructor (or callable model method) of each model in the Python source code and prints the results to a file (we have already printed them into model_parameters.go; please see the snapshot of model_parameters.go). That is to say, the hard-coded ModelParameterJSON and OptimizerParameterJSON (or the hard-coded PremadeModelParamsDocs and OptimizerParameterDocs, once the JSON variables are removed) are also redundant code, because extract_docstring.py can generate them automatically.
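As a rough sketch of how the generation could be wired up with go generate: the directive below reuses the command quoted in this thread, wrapped in sh -c because go generate does not run commands through a shell and would otherwise not interpret the > redirection. The variable premadeModelParamsDocs and its contents are hypothetical; they only illustrate what a generated file might look like if the script emitted a Go map literal directly.

```go
package main

import "fmt"

// The script path is assumed to be relative to this file's directory, and
// running the directive requires Python and the model packages to be installed.
//go:generate sh -c "python extract_docstring.py > model_parameters.go"

// premadeModelParamsDocs is a hypothetical example of generated content.
// The entries are placeholders, not real extracted docstrings.
var premadeModelParamsDocs = map[string]map[string]string{
	"DNNClassifier": {
		"hidden_units": "placeholder doc for hidden_units",
	},
}

func main() {
	for model, params := range premadeModelParamsDocs {
		fmt.Println(model, len(params))
	}
}
```

With a directive like this in place, running go generate in the package directory would refresh the generated file, so nobody has to copy the script output by hand.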
This issue further addresses: https://github.com/sql-machine-learning/playground/issues/42#issuecomment-645666536
Where are the global variables
Exported global maps that store the parameter docs of models, including:
- https://github.com/sql-machine-learning/sqlflow/blob/befcd74558f47a4416e3b7d6fd3130645cf5b813/pkg/attribute/attribute.go#L219-L226
- https://github.com/sql-machine-learning/sqlflow/blob/befcd74558f47a4416e3b7d6fd3130645cf5b813/pkg/attribute/model_parameters.go#L18-L20
- https://github.com/sql-machine-learning/sqlflow/blob/befcd74558f47a4416e3b7d6fd3130645cf5b813/pkg/attribute/model_parameters.go#L227-L229
- https://github.com/sql-machine-learning/sqlflow/blob/befcd74558f47a4416e3b7d6fd3130645cf5b813/pkg/attribute/xgboost_objective_params.go#L18-L20
Exported parameter type definitions, including https://github.com/sql-machine-learning/sqlflow/blob/befcd74558f47a4416e3b7d6fd3130645cf5b813/pkg/attribute/attribute.go#L32-L45
How to remove the exported global map which stores the parameter docs of models
Some of these global variables share the same information. For example:
- PremadeModelParamsDocs is deserialized from ModelParameterJSON, so we can remove ModelParameterJSON. Furthermore, ModelParameterJSON is a variable that can be auto-generated by the python extract_docstring.py > model_parameters.go command. We can run this generation through go generate to produce PremadeModelParamsDocs automatically too.
- OptimizerParameterDocs is deserialized from OptimizerParameterJSON, so we can remove OptimizerParameterJSON. Furthermore, OptimizerParameterJSON can also be auto-generated by python extract_docstring.py > model_parameters.go using go generate.
- XGBoostObjectiveDocs is deserialized from XGBoostObjectiveJSON, so we can remove XGBoostObjectiveJSON and keep XGBoostObjectiveDocs.

Note that PremadeModelParamsDocs, OptimizerParameterDocs and XGBoostObjectiveDocs are also used in CLI prompt suggestions (see 1, 2, 3), so we cannot remove or hide these 3 variables.
How to remove exported parameter type definitions
https://github.com/sql-machine-learning/playground/issues/42#issue-640787739 suggests enhancing compile-time data checking by building a Dictionary through typed methods such as Dictionary.Int.
In this way, attribute.Description, attribute.Int, attribute.Float, etc. can be hidden, and Dictionary.Int takes its default value as a typed int argument.
The only concern with this method is that it cannot support a nil default value. Some models have attributes whose default value is nil. For example, the default value of num_class in the XGBoost model is nil (see here), because only multi-class (>2) classification models need num_class while the other models do not. Once num_class is provided in the SQL WITH statement, it must be an integer, so the data type of the num_class attribute should be int. The meaning of a nil default value in SQLFlow is:
- If the attribute is provided in the SQL WITH statement, SQLFlow checks whether its type is right and calls Description.Checker() to check whether it is valid. For example, if num_class is provided in the SQL WITH statement, SQLFlow first checks whether the value of num_class is an integer, and then calls Description.Checker() to check whether it is a positive number.
- If the attribute is not provided in the SQL WITH statement, nothing is checked.

We can enhance this method to support a nil default value in one of the following ways:
- Add a Default method to set the default value. The code is something like: https://github.com/sql-machine-learning/playground/issues/42#issuecomment-645853937 . Since the Default method can be used after both Int(...) and Float(...), its input parameter type must be interface{}. In this way, we cannot check at compile time whether the default value is of the right type.
- Provide both Dictionary.Int and Dictionary.IntOrNil methods, where Dictionary.IntOrNil also accepts a nil default value. In this way, we would double the number of Dictionary APIs.
- Make the Dictionary.Int method accept both an int default value and a nil default value, for example through an optional.Int parameter:
```go
var dict = Dictionary{}.
	Int("num_class", optional.Int{}, "doc1", checker1). // default value is nil
	Int("attr_with_default_value", optional.NewInt(0), "doc2", checker2) // default value is 0
```
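The last option, combined with the nil-default semantics described above, could look roughly like the sketch below. It is self-contained, so optionalInt and newOptionalInt are local stand-ins for the optional.Int and optional.NewInt used in the snippet above; description, intChecker, Validate and positive are hypothetical names, and the Int signature is an assumption rather than the one proposed in the linked issue.

```go
package main

import (
	"errors"
	"fmt"
)

// optionalInt is a local stand-in for optional.Int: the zero value means
// "no default" (nil).
type optionalInt struct {
	value   int
	present bool
}

func newOptionalInt(v int) optionalInt { return optionalInt{value: v, present: true} }

// intChecker validates an int attribute value (hypothetical type).
type intChecker func(int) error

// description is a hypothetical unexported replacement for attribute.Description.
type description struct {
	defaultValue optionalInt
	doc          string
	checker      intChecker
}

// Dictionary maps attribute names to their descriptions.
type Dictionary map[string]*description

// Int registers an int attribute and returns the dictionary for chaining.
// The default value is typed, so a wrong default type fails at compile time.
func (d Dictionary) Int(name string, def optionalInt, doc string, checker intChecker) Dictionary {
	d[name] = &description{defaultValue: def, doc: doc, checker: checker}
	return d
}

// Validate checks only the attributes that were actually provided, which
// mirrors the nil-default semantics: absent attributes are not checked.
func (d Dictionary) Validate(provided map[string]interface{}) error {
	for name, v := range provided {
		desc, ok := d[name]
		if !ok {
			return fmt.Errorf("unknown attribute %q", name)
		}
		i, ok := v.(int)
		if !ok {
			return fmt.Errorf("attribute %q must be an int", name)
		}
		if desc.checker != nil {
			if err := desc.checker(i); err != nil {
				return fmt.Errorf("attribute %q: %w", name, err)
			}
		}
	}
	return nil
}

// positive is an example checker that rejects non-positive values.
func positive(i int) error {
	if i <= 0 {
		return errors.New("must be positive")
	}
	return nil
}

func main() {
	dict := Dictionary{}.
		Int("num_class", optionalInt{}, "doc1", positive).             // default value is nil
		Int("attr_with_default_value", newOptionalInt(0), "doc2", nil) // default value is 0

	fmt.Println(dict.Validate(map[string]interface{}{}))                 // attribute absent
	fmt.Println(dict.Validate(map[string]interface{}{"num_class": 3}))   // valid value
	fmt.Println(dict.Validate(map[string]interface{}{"num_class": -1}))  // rejected by checker
	fmt.Println(dict.Validate(map[string]interface{}{"num_class": "x"})) // wrong type
}
```

Compared with a Default(interface{}) method, this keeps the default value typed at the call site, at the cost of wrapping every default in an optional value.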