maeri-project / mRNA

mRNA
https://synergy.ece.gatech.edu/tools/mrna
MIT License

Possible bug #1

Open francisco-munoz opened 5 years ago

francisco-munoz commented 5 years ago

Hello,

First of all, congratulations on this work. It is very useful for the research community and has a lot of potential for exploration.

I think I have found a small but significant bug in the code. I may be wrong or may have misunderstood something, so please let me know as soon as possible.

The bug is in the file DSNetwork.cpp. In the implementation of the function void DSwitch::ConfigDS(int input_src, int output_mode), line 17 reads:

config_rds = input_rds;

Shouldn't it be this?

config_input = input_rds;

As I understand it, this function configures the parent of the current node in the tree (that is, the node's input), so depending on the input_src variable, config_input should be connected to either input_lds or input_rds.
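To make the proposed fix concrete, here is a minimal, hypothetical C++ sketch of a distribution switch choosing its parent. The Node type, the kFromLeft/kFromRight encoding of input_src, and the config_output_mode member are assumptions made for illustration; they are not mRNA's actual definitions. Only the names ConfigDS, input_src, output_mode, input_lds, input_rds and config_input come from the report.

```cpp
#include <cassert>

// Hypothetical stand-in for a node in the distribution tree (not mRNA's type).
struct Node {
  int id;
};

// Assumed encoding of input_src; mRNA may use different values.
enum InputSrc { kFromLeft = 0, kFromRight = 1 };

// Minimal sketch of a distribution switch selecting its parent (its input).
struct DSwitch {
  Node* input_lds = nullptr;     // candidate parent on the left
  Node* input_rds = nullptr;     // candidate parent on the right
  Node* config_input = nullptr;  // parent actually selected by ConfigDS
  int config_output_mode = 0;    // stored as-is; its semantics are not modeled

  void ConfigDS(int input_src, int output_mode) {
    assert(input_src == kFromLeft || input_src == kFromRight);
    if (input_src == kFromLeft) {
      config_input = input_lds;
    } else {
      // The proposed fix: assign config_input rather than config_rds.
      config_input = input_rds;
    }
    config_output_mode = output_mode;
  }
};
```

With the original assignment to config_rds, config_input would never be set on the right-parent path, which is the behavior the report questions.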

Is this correct?

Thank you very much,
Francisco
email: francisco.munoz2@um.es

zzy82518996 commented 5 years ago

Hi, thanks so much for your interest in mRNA. For the first question: yes, you are correct, it should be config_input = input_rds.

For the second question, it is a good question, but I think you have slightly misunderstood it. Our principle is to find the best trade-off point for efficiency. We do not prune all the strategies that generate partial sums; what we prune are the strategies that generate so many partial sums that a third reconfiguration of the network is needed. (Usually we want two network configurations to be enough to compute the whole layer.) In the example layer you provided, even though mapping strategy 1 produces partial sums, its run time is the fastest among all the strategies, because it achieves the highest utilization rate.
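The pruning rule described above can be sketched as follows, under stated assumptions: the Strategy fields, the runtime estimate, and the utilization formula (VN_Size × Num_VN divided by the number of multipliers) are hypothetical illustrations, not mRNA's data structures or cost model. Strategies that produce partial sums are kept as long as they fit in at most two network configurations; among the survivors, the fastest one wins.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical summary of one candidate mapping strategy.
struct Strategy {
  int vn_size;        // multipliers per virtual neuron
  int num_vn;         // virtual neurons mapped at the same time
  int num_configs;    // network configurations needed for the whole layer
  long long runtime;  // estimated cycles from some cost model (assumed)
};

// Assumed definition: fraction of multipliers kept busy by the mapping.
double Utilization(const Strategy& s, int num_multipliers) {
  assert(num_multipliers > 0);
  return static_cast<double>(s.vn_size) * s.num_vn / num_multipliers;
}

// Keep strategies that need at most max_configs configurations (partial
// sums are allowed), then return the fastest survivor, if any.
std::optional<Strategy> PickBest(const std::vector<Strategy>& candidates,
                                 int max_configs = 2) {
  std::optional<Strategy> best;
  for (const Strategy& s : candidates) {
    if (s.num_configs > max_configs) continue;  // would need a 3rd reconfiguration
    if (!best || s.runtime < best->runtime) best = s;
  }
  return best;
}
```

Ranking by runtime rather than by "no partial sums" is what lets a high-utilization strategy with partial sums win, as in the example layer.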

Actually, we are going to update mRNA soon. In the new version, the search space will be enlarged and more realistic factors will be considered, for example the overhead of reconfiguring the network, main-memory bandwidth, and so on.

We really appreciate your advice and questions. Please let us know about any further problems in the code base; mRNA is still under construction. Thanks.

Best regards,

Zhongyuan.

francisco-munoz commented 5 years ago

Hi @zzy82518996, thank you so much for your quick answer. I really appreciate it. Regarding your answer: yes, it is possible that I misunderstood a little bit. Now everything is clear.

There is something else I would like to ask you.

- How do you deal with the loops outside the tile in MAERI? I see that mRNA generates this output for that purpose:

Loops outside the tile:
R/T_X = 1
S/T_Y = 1
C/T_C = 15
C/T_K = 1
C/T_N = 1
C/T_OX = 56
C/T_OY = 56
R%T_X = 0
S%T_Y = 0
C%T_C = 0
K%T_K = 0
N%T_N = 0
X'%T_OX = 0
Y'%T_OY = 0

However, wouldn't you need another tile (T_R, T_S, ...) to deal with these loops? If so, why doesn't mRNA search for the best mapping of these remaining loops?

MAERI Configuration:
Layer variables: X = 224, Y = 224, C = 60, K = 1, N = 1, X' = 56, Y' = 56, R = 11, S = 11
Mapping kernel (tile): T_X = 11, T_Y = 11, T_C = 4, T_K = 1, T_N = 1, T_X' = 1, T_Y' = 1
Virtual Neuron: VN_Size = 484, Num_VN = 1

Wouldn't you need another output like this for the second configuration?
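The outer-loop numbers above follow from plain integer division and remainder of each layer dimension by its tile size. The sketch below reproduces them from the layer and tile values in the configuration. Two points are inferences, not statements from the thread: the lines printed as "C/T_K", "C/T_N", "C/T_OX" and "C/T_OY" carry the values of K/T_K, N/T_N, X'/T_X' and Y'/T_Y' (with C = 60 and T_K = 1, C/T_K would be 60), so the "C/" prefix looks like a labeling slip; and VN_Size = 484 is assumed to be T_X × T_Y × T_C = 11 × 11 × 4. Because C/T_C = 15 > 1, each output gathers partial sums from 15 channel tiles, which is why a second, partial-sum configuration comes up.

```cpp
#include <cassert>
#include <vector>

// Trip count and leftover for one dimension, mirroring the
// "dim/tile" and "dim%tile" lines in the printout.
struct OuterLoop {
  int trips;      // full tiles along this dimension
  int remainder;  // leftover elements (0 means the tile divides evenly)
};

OuterLoop Outer(int dim, int tile) {
  assert(tile > 0 && tile <= dim);
  return {dim / tile, dim % tile};
}

// Assumption: VN size is the product of the tile sizes along the
// reduction dimensions (filter rows, filter columns, input channels).
int VnSize(int t_r, int t_s, int t_c) { return t_r * t_s * t_c; }

// Number of tile iterations executed by all outer loops together.
long long TotalOuterIterations(const std::vector<OuterLoop>& loops) {
  long long total = 1;
  for (const OuterLoop& l : loops) total *= l.trips;
  return total;
}
```

For this layer the outer loops run 1 × 1 × 15 × 1 × 1 × 56 × 56 = 47040 tile iterations, all with zero remainder.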

Thank you so much, and I hope I have explained myself well.
Best regards,
Francisco Muñoz

zzy82518996 commented 5 years ago

Dear Francisco Muñoz:

Thanks for your questions.

For your first question, I think you are right: we do not perform mapping space exploration for the outer loops in this version. In the current version, we choose a loop order that we think gives the best on-chip memory locality, but we are developing mapping exploration for the outer loops. Essentially, the order of the outer loops determines the data transfer pattern between off-chip and on-chip memory. Furthermore, we are also considering deploying DNNs onto multiple distributed MAERI accelerators. We think this problem combines mapping exploration and data placement, and the search space will get much larger. We will let you know about our progress, and we welcome further constructive questions and suggestions.
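To see why outer-loop order changes off-chip traffic, here is a deliberately simplified reuse model, offered as an illustration rather than mRNA's cost model. It assumes the on-chip buffer holds only one tile per tensor, so a tensor's tile is refetched every time any loop at or above its innermost relevant loop advances. The single-letter dimension labels ('C' for input channels, 'X' and 'Y' for the output dimensions X' and Y') are hypothetical; K, N, R and S have trip count 1 in the example layer, so they are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One outer loop: the dimension it walks and its trip count.
struct Loop {
  char dim;         // 'C' = input channels, 'X' = X', 'Y' = Y' (labels assumed)
  long long trips;
};

// Number of times a tensor's tile is fetched under a given loop order
// (outermost first). depends_on lists the dimensions that index the tensor.
long long TileFetches(const std::vector<Loop>& order, const std::string& depends_on) {
  long long fetches = 1;  // a tensor indexed by no loop is fetched once
  long long running = 1;  // product of trip counts seen so far
  for (const Loop& l : order) {
    running *= l.trips;
    if (depends_on.find(l.dim) != std::string::npos) fetches = running;
  }
  return fetches;
}
```

With C outermost ({C:15, X:56, Y:56}), weights (indexed by C) are fetched 15 times but output partial sums (X, Y) 47040 times; with C innermost ({X:56, Y:56, C:15}), weights are fetched 47040 times while partial sums stay on chip across C and move only 3136 times. Real tools use richer models, but the trade-off has the same shape.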

For the second question, you are also right: we do need to generate two configuration files. Currently, we only generate the config file for the convolution pass, not for the partial-sum addition pass. Our config file is an intermediate representation; the actual configuration code is generated by the MAERI tool chain, which is implemented outside mRNA. We skip the partial-sum addition pass because it does not involve T_R, T_S, and so on; it is just a parallel addition process. Right now, the complete tool chain from TensorFlow to machine code is not fully automatic, and the interface between mRNA and the back-end code generator requires some manual work.
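The "just a parallel adding process" point can be illustrated with a small sketch of the second pass: an element-wise sum of the partial-output maps produced by each channel tile, done as a pairwise tree so that all additions at one level are independent. This is a generic reduction written for illustration, not MAERI's hardware behavior or the tool chain's generated code; the function name and the levels output parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Element-wise sum of partial-output maps using a pairwise tree.
// Additions within one tree level are independent (parallelizable).
// If levels is non-null, it receives the number of tree levels used.
std::vector<float> AddPartialSums(std::vector<std::vector<float>> partials,
                                  int* levels = nullptr) {
  assert(!partials.empty());
  int depth = 0;
  while (partials.size() > 1) {
    std::vector<std::vector<float>> next;
    for (std::size_t i = 0; i + 1 < partials.size(); i += 2) {
      assert(partials[i].size() == partials[i + 1].size());
      std::vector<float> sum = partials[i];
      for (std::size_t j = 0; j < sum.size(); ++j) sum[j] += partials[i + 1][j];
      next.push_back(std::move(sum));
    }
    // An odd map out is carried to the next level unchanged.
    if (partials.size() % 2 == 1) next.push_back(std::move(partials.back()));
    partials = std::move(next);
    ++depth;
  }
  if (levels != nullptr) *levels = depth;
  return partials.front();
}
```

For the example layer, C/T_C = 15 partial maps reduce in 4 levels (15, 8, 4, 2, 1). No tile sizes along R or S appear anywhere in this pass, which matches the explanation above.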

So, if you need the tool chain on the hardware side, you can email Hyoukjun (hyoukjun@gatech.edu) to ask for the code. We gave tutorials at HPCA 2019 and ISCA 2018; you can find them on the conference websites or on Professor Tushar Krishna's website.

Thanks again for your constructive questions. You are welcome to contact us for further discussion of mapping space exploration for DNNs.

Best regards,

Zhongyuan.

francisco-munoz commented 5 years ago

Hello again @zzy82518996,

Thank you so much for your answers, and also thank you for being so open to questions and suggestions.

The idea of using distributed MAERI accelerators looks quite interesting, and in my opinion the search space will not increase too much. Most layers in a DNN have a large number of output channels (i.e., K). You could leverage this to scatter the K dimension across the different accelerators and broadcast the inputs; I believe that would be one of the best mappings. Also, when distributing the load across the accelerators, you would only use a coarse-grained division, so you could distribute either the K dimension, as I explained, or the N dimension, depending on the shape of the layer. Don't you think so?
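The coarse-grained division suggested above can be sketched as a balanced split of one layer dimension into contiguous half-open ranges, one per accelerator. Splitting K gives each accelerator its own output channels while the inputs are broadcast; splitting N gives each accelerator its own images while the weights are broadcast. SplitDim and the "pick the larger of K and N" rule in ChooseSplitDim are hypothetical illustrations of that idea, not a method proposed or implemented in mRNA.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Split [0, extent) into num_accels contiguous [begin, end) ranges whose
// sizes differ by at most 1 (the first extent % num_accels get one extra).
std::vector<std::pair<int, int>> SplitDim(int extent, int num_accels) {
  assert(extent >= 0 && num_accels > 0);
  std::vector<std::pair<int, int>> ranges;
  int base = extent / num_accels;
  int extra = extent % num_accels;
  int begin = 0;
  for (int a = 0; a < num_accels; ++a) {
    int len = base + (a < extra ? 1 : 0);
    ranges.emplace_back(begin, begin + len);
    begin += len;
  }
  return ranges;
}

// Hypothetical heuristic: split K (inputs broadcast) unless the batch N is
// larger, in which case split N (weights broadcast).
char ChooseSplitDim(int k, int n) { return k >= n ? 'K' : 'N'; }
```

For instance, a layer with K = 256 output channels on 3 accelerators would be split into channel ranges of 86, 85 and 85.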

By the tool chain on the hardware side, do you mean the MAERI configuration code generator, or the MAERI code itself?

If it is MAERI itself, I have downloaded the MAERI code, but I am having a lot of trouble getting a Bluespec license. Do you know how I can get one? I have emailed them several times but have received no answer at all.

Yours, Francisco Muñoz Martinez