h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

GLM: Implement multinomial natively without using binomial code #9299

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Put it in Tomas' original ADMM framework.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: I have implemented native multinomial for GLM. However, I encountered the following issues:

  1. Many thresholds for line search and ADMM are optimized for the one-versus-many implementation. I played with them a bit to make sure the code runs, but did not have time to tune them for native multinomial. As a result, the algorithm stops after only a few iterations instead of running longer, which causes the performance of the native implementation to suffer.

  2. The Gram matrix has increased in size significantly. For example, suppose the number of active columns is 100. For the one-vs-many implementation, the Gram matrix size is 100 by 100. However, imagine we have 100 classes here (Steve: please advise on the number of classes you are looking at); the Gram matrix for the native implementation will then be 10000 by 10000. If there are enum columns, I have an idea on how to reduce the memory requirement, though it will be a little involved.
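The Gram matrix blow-up described above can be made concrete with a small back-of-the-envelope sketch. This is purely illustrative (not H2O code); `gram_bytes` is a hypothetical helper, and it assumes a dense double-precision matrix:

```python
# Hypothetical illustration (not H2O solver code): approximate Gram matrix
# storage for one-vs-many vs. native multinomial GLM, assuming a dense
# matrix of 8-byte doubles.
def gram_bytes(active_cols, n_classes, native, bytes_per_double=8):
    """One-vs-many solves one (n x n) Gram per class update;
    the native formulation needs a single (n*m x n*m) Gram."""
    n = active_cols * n_classes if native else active_cols
    return n * n * bytes_per_double

cols, classes = 100, 100
print(gram_bytes(cols, classes, native=False) / 1e6, "MB")  # 100 x 100
print(gram_bytes(cols, classes, native=True) / 1e9, "GB")   # 10000 x 10000
```

With 100 active columns and 100 classes, the one-vs-many Gram is under a tenth of a megabyte, while the native Gram is on the order of a gigabyte, which matches the 100x100 vs. 10000x10000 sizes quoted above.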

I am currently running performance profiles in terms of time and prediction accuracy with datasets of various class counts and column counts. I will put out a report as soon as I get the results.

Based on current observations, I do not think we will put this into our release.

Thanks, Wendy

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Summary of Run Results for Multinomial GLM

Setup

In this study, we ran multiple multinomial GLM tests on Hadoop with varying numbers of classes and predictor columns in each dataset. The goal is to compare the original implementation and the new native implementation in terms of prediction error, logloss, number of iterations run, and run time.

I have generated multinomial datasets with:

  1. Number of classes: 3, 5, 10, 20;

  2. For a fixed number of classes, I generated 10, 50, and 100 predictors. Half of the predictors are categorical with 5 levels. This increases problem complexity, as the number of coefficients per class grows to 35 (10 columns), 150 (50 columns), and 300 (100 columns).

  3. There are about 10,000,000 rows of data per dataset.
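A dataset of this shape can be produced with a short generator. The sketch below is hypothetical (the study's actual generator is not shown in this issue); `make_multinomial` and all its parameters are illustrative names, and labels are drawn from a softmax over random true coefficients:

```python
import numpy as np

# Hypothetical sketch (not the generator used in the study): a multinomial
# dataset with half numeric and half 5-level categorical predictors.
def make_multinomial(n_rows=1000, n_cols=10, n_classes=3, levels=5, seed=42):
    rng = np.random.default_rng(seed)
    n_cat = n_cols // 2
    num = rng.standard_normal((n_rows, n_cols - n_cat))   # numeric predictors
    cat = rng.integers(0, levels, size=(n_rows, n_cat))   # categorical codes
    # one-hot encode the categorical columns, then concatenate
    onehot = np.eye(levels)[cat].reshape(n_rows, n_cat * levels)
    X = np.hstack([num, onehot])
    beta = rng.standard_normal((X.shape[1], n_classes))   # true coefficients
    logits = X @ beta
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # sample each row's class from its softmax probabilities
    y = (p.cumsum(axis=1) > rng.random((n_rows, 1))).argmax(axis=1)
    return X, y

X, y = make_multinomial()
```

Scaling `n_rows` to 10,000,000 and `n_cols` to 10, 50, or 100 would reproduce the shapes used in the study, at the cost of proportionally more memory.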

The original implementation updates the coefficients of one multinomial class at a time. Hence, it trains the classifier one-vs-many.

Observations

The native implementation, on the other hand, updates the coefficients of all the multinomial classes in one shot. In theory, this implementation should offer superior performance, since the hyperplanes for all classes are adjusted together to minimize the objective function. However, in practice, we encountered the following problems:

  1. If there are n coefficients per class and m classes, the Gram matrix for this implementation is of size nm by nm, while the original implementation only needs to deal with a Gram matrix of size n by n. As a result, each iteration of the native implementation takes much longer and needs more memory.

  2. Even though the native method takes very few iterations to converge, it is taking too few iterations. As a result, the prediction error and logloss are higher than those of the original implementation. Several thresholds for line search and ADMM are tuned for the original implementation; I probably need to re-set these for the native implementation so that the algorithm converges to the right value.
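The difference between the two update strategies can be sketched with plain gradient steps on the multinomial log-likelihood. This is a minimal illustration, not H2O's IRLSM/ADMM solver: `joint_step` stands in for the native all-classes update and `one_vs_many_step` for the original per-class update, both hypothetical names:

```python
import numpy as np

# Hypothetical sketch contrasting the two update strategies (not H2O's
# solver, which uses IRLSM/ADMM rather than plain gradient descent).
def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def joint_step(X, Y, B, lr=0.01):
    """Native style: one step on the coefficients of ALL classes at once.
    X: (N, n) features, Y: (N, m) one-hot labels, B: (n, m) coefficients."""
    G = X.T @ (softmax(X @ B) - Y) / len(X)   # full (n, m) gradient
    return B - lr * G

def one_vs_many_step(X, Y, B, k, lr=0.01):
    """Original style: update only class k's column, holding the rest fixed."""
    g_k = X.T @ (softmax(X @ B)[:, k] - Y[:, k]) / len(X)
    B = B.copy()
    B[:, k] -= lr * g_k
    return B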

Despite the problems, I do notice the following advantages of the native implementation:

  1. It converges much faster than the original implementation in terms of number of iterations (see results).

  2. For a fixed number of multinomial classes, as the complexity of the problem increases (the number of predictor columns increases), the performance degradation is smaller than with the original implementation. Refer to the spread of prediction error and logloss in the graphs.

Possible next steps

After this study, I have gained more respect for the original implementation. It achieves higher performance while using less memory. The native implementation has the potential to achieve better results, but the following issues must be resolved first:

  1. Adjust the various thresholds to increase the number of iterations;

  2. Make it more memory efficient. I will investigate Michalk's suggestion of using a hybrid method that adjusts the coefficients of a subset of classes at a time (https://0xdata.atlassian.net/projects/PUBDEV/issues/PUBDEV-7168?filter=myopenissues&orderby=priority%20DESC).

However, since the performance of the current implementation is pretty good at this point, the above enhancements will be of lower priority.
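The hybrid idea mentioned above can be quantified with a quick sketch. This is illustrative only (the linked JIRA issue, not this sketch, holds the actual proposal); `hybrid_gram_entries` is a hypothetical helper showing how updating k classes at a time shrinks the Gram matrix from (n*m)^2 entries toward the one-vs-many n^2:

```python
# Hypothetical sketch of the hybrid proposal: update the coefficients of a
# subset of k classes jointly, so one Gram matrix has (n*k)^2 entries
# instead of (n*m)^2 for the fully native update.
def hybrid_gram_entries(n_coeffs, n_classes, subset):
    """Entries in one Gram matrix when 'subset' classes are updated jointly."""
    side = n_coeffs * subset
    return side * side

n, m = 100, 100
for k in (1, 5, m):   # k=1 ~ one-vs-many, k=m ~ fully native
    print(k, hybrid_gram_entries(n, m, k))
```

The subset size k trades memory per iteration against how much of the joint geometry each update sees, which is exactly the knob the hybrid method would tune.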

Results:

(Result figures: image-20200111-000042.jpeg, image-20200111-000103.jpeg, image-20200111-000114.jpeg, image-20200111-000124.jpeg, image-20200111-000135.jpeg; see the attachment links below.)

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: I have implemented multinomial updates working on all coefficients for all classes in parallel. However, I found a small gain in time but not in performance. Hence, I am not going to push my fix into our master code for now.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6323
Assignee: Wendy
Reporter: Wendy
State: Closed
Fix Version: 3.28.1.x
Attachments: Available (Count: 5)
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/3882
https://github.com/h2oai/h2o-3/pull/3414

Attachments From Jira

Attachment Name: image-20200111-000042.jpeg Attached By: Wendy File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6323/image-20200111-000042.jpeg

Attachment Name: image-20200111-000103.jpeg Attached By: Wendy File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6323/image-20200111-000103.jpeg

Attachment Name: image-20200111-000114.jpeg Attached By: Wendy File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6323/image-20200111-000114.jpeg

Attachment Name: image-20200111-000124.jpeg Attached By: Wendy File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6323/image-20200111-000124.jpeg

Attachment Name: image-20200111-000135.jpeg Attached By: Wendy File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6323/image-20200111-000135.jpeg