idstcv / ZenNAS


Can I use this to create a powerful MLP architecture? #32

Closed · kennyorn1 closed this 2 weeks ago

MingLin-home commented 1 year ago

We did not try, but it should be possible. We also strongly suggest our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search

MingLin-home commented 1 year ago

Hi Kenny,

I cannot find your reply on GitHub, so I'm not sure whether you deleted it. In case you are still interested: the network entropy only depends on the width and depth of the architecture, up to some constants. So you can use a non-Gaussian distribution to initialize your weights, but the results will be nearly the same, up to some constants. Please note that you cannot use adaptive weight initialization, e.g., where the weight is normalized by fan-in / fan-out. These normalization methods will "normalize" your entropy no matter how many channels you have.

Best, Ming

On Mon, Apr 24, 2023 at 10:06 PM Kenny Wu wrote:

> [image] https://user-images.githubusercontent.com/110704880/234179263-4985f32e-f3e6-4511-bf67-6c6f8781a3bb.png
>
> I went over how you derived the entropy of the MLP in DeepMAD. I found that the final formula relies on the standard-normal-distribution assumption in box 1, which also makes the entropy of the MLP depend only on the structure of the MLP itself (the width of each layer and the depth of the network), not on the weights or inputs of the network.
>
> So when I design an MLP with this formula, which rests on the standard-normal assumption, can I really ignore the network's weights and the input distribution?

kennyorn1 commented 1 year ago

> The network entropy only depends on the width and depth of the architecture, up to some constants. So you can use a non-Gaussian distribution to initialize your weights, but the results will be nearly the same, up to some constants. Please note that you cannot use adaptive weight initialization, e.g., where the weight is normalized by fan-in / fan-out. These normalization methods will "normalize" your entropy no matter how many channels you have.
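The point above about fan-in-normalized ("adaptive") initialization flattening the entropy can be checked numerically. The sketch below is not from the DeepMAD code; the widths and sample count are arbitrary assumptions. It propagates standard-normal inputs through a random linear MLP and compares the two initializations:

```python
import numpy as np

rng = np.random.default_rng(0)

def output_std(widths, fan_in_normalized, n_samples=10_000):
    """Propagate standard-normal inputs through a random linear MLP
    (no activations, for clarity) and return the std of the outputs."""
    x = rng.standard_normal((n_samples, widths[0]))
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        W = rng.standard_normal((w_in, w_out))
        if fan_in_normalized:
            W /= np.sqrt(w_in)  # fan-in scaling, as in Kaiming/Xavier-style init
        x = x @ W
    return x.std()

# Plain N(0,1) weights: the output scale grows with the layer widths.
print(output_std([64, 64, 64], fan_in_normalized=False))  # much larger than 1
# Fan-in normalization: the output scale stays near 1 regardless of width.
print(output_std([64, 64, 64], fan_in_normalized=True))   # close to 1
```

With plain standard-normal weights each layer multiplies the variance by roughly its fan-in, so the output scale depends on the widths; fan-in scaling cancels exactly that growth, which is why such initialization would "normalize away" the width dependence the entropy formula relies on.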

Thanks for replying!

I asked some of these questions directly under the DeepMAD repository and received satisfactory answers, so I closed them.

By the way, I have another question which confuses me when I try to put the theory of DeepMAD into practice.

How do I design an MLP for a specific dataset using the theory of DeepMAD?

For example, the number of samples in my datasets varies from small to large, which should affect the design of the network structure, but I don't see this discussed or studied in DeepMAD.

kennyorn1 commented 1 year ago

image

I think this should be 'L-1' if I don't misunderstand.

Detail in https://github.com/alibaba/lightweight-neural-architecture-search/issues/11#issuecomment-1522718285

MingLin-home commented 1 year ago

Thanks for the feedback! Please just use all L layers. I did not check whether we index from 0 to L-1 or from 1 to L.
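For anyone implementing this, the point is that the sum runs over all L per-layer terms, so the indexing convention does not change the result. A minimal sketch, using a log-width sum as a stand-in for the per-layer entropy term (hypothetical widths; see the DeepMAD paper for the exact formula):

```python
import math

widths = [32, 64, 128, 64]  # hypothetical per-layer widths, L = 4
L = len(widths)

# Zero-based convention: layers i = 0 .. L-1
h_zero_based = sum(math.log(widths[i]) for i in range(0, L))
# One-based convention: layers i = 1 .. L
h_one_based = sum(math.log(widths[i - 1]) for i in range(1, L + 1))

# Both conventions visit the same L widths, so the sums agree.
assert math.isclose(h_zero_based, h_one_based)
```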


kennyorn1 commented 1 year ago

Thanks for replying!

What about the attributes of the training dataset?

Specifically, what if the number of samples in one training dataset is 10000, while the number of samples in another is only 100?

Basically, how do I quantify the influence of training-dataset size on MLP structure design?


MingLin-home commented 1 year ago

There is no easy way to incorporate the number of training instances into DeepMAD. According to machine learning theory, the safe way is to ensure that the number of network parameters is much smaller than the number of training instances.
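This rule of thumb can be turned into a quick sanity check. The helper below is illustrative only: the widths are arbitrary, and the 10-samples-per-parameter ratio is a common folk heuristic, not something from the DeepMAD paper.

```python
def mlp_param_count(widths, bias=True):
    """Number of trainable parameters in a plain MLP with the given layer widths."""
    return sum(w_in * w_out + (w_out if bias else 0)
               for w_in, w_out in zip(widths[:-1], widths[1:]))

def fits_dataset(widths, n_samples, ratio=10):
    """Crude check: require at least `ratio` training samples per parameter."""
    return mlp_param_count(widths) * ratio <= n_samples

print(mlp_param_count([20, 64, 64, 1]))        # 5569 parameters
print(fits_dataset([20, 64, 64, 1], 100_000))  # True: ample data
print(fits_dataset([20, 64, 64, 1], 100))      # False: shrink the network
```

For the 100-sample dataset in the question above, a check like this would push you toward a much narrower or shallower MLP than for the 10000-sample one.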


kennyorn1 commented 1 year ago


I would like to ask: are Zen-Score and DeepMAD two completely different things? Is it hard to explain one in terms of the other?

kennyorn1 commented 1 year ago


I find that DeepMAD is based on information theory while Zen-Score is based on the theory of linear regions of deep neural networks, so I think DeepMAD and Zen-Score might be hard to reconcile for now. They seem to be two completely different things.

kennyorn1 commented 1 year ago


On the Papers with Code website, I see that DeepMAD beats Zen-Score on vision tasks. Does this mean that designing a NAS algorithm for MLPs with information-theory-based DeepMAD is better than with Zen-Score, which is based on the theory of linear regions of deep neural networks?

kennyorn1 commented 1 year ago


I also notice that Zen-Score seems to propose NAS ideas only for networks with a CNN structure, and has no ready-made recipe for NAS on MLP networks. I want to make sure that's true, in case I misunderstood the paper.

If it is true, would you recommend the information-theoretic approach or the linear-regions theory for developing a NAS algorithm for MLPs?

Sorry to bother you, but these questions matter a lot to me.

Looking forward to your reply!

kennyorn1 commented 1 year ago

By the way, could you recommend some articles or work on the relationship between sample complexity (such as the number of training samples) and model complexity? Thanks a lot!

I ask because many articles discuss the expressiveness of a model, but training a model cannot be separated from the data. The most basic common sense is that if there are not enough samples, a model cannot be trained successfully no matter how expressive it is.

kennyorn1 commented 1 year ago

I have another question: I find that many of these theories are applied to classification tasks. Can they also be applied to regression tasks, which are likewise very important in industry?

MingLin-home commented 1 year ago

Sorry for the late reply!

It has nothing to do with the downstream task. Our DeepMAD method (or Zen-NAS as its early version) finds a "mathematically optimal" structure within our framework. So if you buy into our key argument that maximal entropy and maximal effectiveness are critical to building a good model, you can "believe" that this model will perform well on various downstream tasks. Since you have no further prior knowledge about the downstream task, that is pretty much the best we can hope for.

Of course, if you do have more prior knowledge, you can always design better models. Still, there is no significant difference between classification and regression to me.

kennyorn1 commented 1 year ago

Thanks a lot!


dovedx commented 1 year ago

[Automatic vacation reply from QQ Mail] Your email has been received; I will handle it and reply as soon as possible. Thank you!

kennyorn1 commented 1 year ago

Hi, I have some problems about a new NAS work. May I have your email to connect with you? I would appreciate it very much if you agree.@MingLin-home

MingLin-home commented 1 year ago

You are welcome! My email is on my homepage, linming04@gmail.com.

dovedx commented 2 weeks ago

[Automatic vacation reply from QQ Mail] Your email has been received; I will handle it and reply as soon as possible. Thank you!