sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0
5.07k stars, 699 forks

Contribute a convenient function for applying statistical learning methods in SQLFlow #2460

Open · Echo9573 opened this issue 4 years ago

Echo9573 commented 4 years ago

Description

SQLFlow aims to bridge databases and AI engines. It leverages the expressive power of SQL to make AI easier to use while keeping it simple. Here you can see how easy it is to do machine learning tasks with SQLFlow. Data analysts and data scientists are among SQLFlow's main users. For data analysts, however, statistical analysis is an important part of their daily work in addition to building machine learning models. We need more contributors to help us improve SQLFlow and make it a more fully functional tool!

Goals

statsmodels, a Python library, is one of the statistical analysis tools most commonly used by analysts. It fits a variety of statistical models, performs statistical tests, and supports data exploration and visualization. We would therefore like contributors to develop a feature that lets users run their daily statistical analysis with statsmodels directly in SQLFlow, right after retrieving their data. The more statistical analysis methods SQLFlow supports, the better. The feature should have the following characteristics:

  1. User-friendly: The feature should be easy to call and provide a good user experience. When something goes wrong, it should tell the user why and what to do next.
  2. Full-featured: The feature should cover most of the statistical analysis methods in statsmodels: at a minimum, time series models and linear models (including ANOVA). It should also include a variety of statistical tests and tools, since these are the methods analysts use most often.
  3. Extensible: For the remaining functions in statsmodels, consider providing an extension mechanism so that users can quickly import and use them.

Contact us!

If you have a great idea, please contact us at the following email address. We appreciate your participation and contribution!

Email Address: sqlflow@list.alibaba-inc.com

rpjayasekara commented 4 years ago

I would like to work on this issue during ASoC 2020.

rpjayasekara commented 4 years ago

@Echo9573 The statsmodels library includes a variety of models and tools, such as regression models, linear models, and time series analysis. Do all of these functions need to be implemented, or should I prioritize selected functionality?

Echo9573 commented 4 years ago

@rpjayasekara Thank you for your interest in this project. The more statistical analysis functions SQLFlow supports, the better, so ideally this feature covers most of the statistical analysis methods in statsmodels: at a minimum, time series models and linear models (including ANOVA). It would also be good to include a variety of statistical tests and tools, since these are the methods analysts use most often. For the remaining functions in statsmodels, you could consider providing an extension mechanism so that users can quickly import and use them.

rpjayasekara commented 4 years ago

I think if we can find a robust way to integrate the statsmodels library with SQLFlow, it will be easy to import any statsmodels model into SQLFlow.

Echo9573 commented 4 years ago

@rpjayasekara Yes, you are right!