Closed MeowMJ closed 9 months ago
Which fiction fuse mul with add into new muladd?
Function combineMulAdd(). But it's not used in this mapper in fact.
Function combineMulAdd(). But it's not used in this mapper in fact.
Can you please apply that combineMulAdd() on a fir kernel and show the two DFGs (before and after) here?
Function combineMulAdd(). But it's not used in this mapper in fact.
Can you please apply that combineMulAdd() on a fir kernel and show the two DFGs (before and after) here?
Fir kernel doesn't contain a mul and add that can be combined, So I show inttCore kernel (apply combine("mul","sub") before and after) instead.
I notice that combineMulAdd() is not as useful as combine("mul","add") or combine("mul","sub") or others, because it only combines Critical dfgNode.
Note that (19)mul and (21)sub are combined in InttAfterFuse.png and become (19)mulsub.
InttBeforeFuse.png InttAfterFuse.png
fmul
followed by fadd
. So that pair can also be combined, right?MAC
(or FMAC) in the GUI, all (or some of) the mul
and add
(or fmul
and fadd
) need to be combined.fadd
has a circule with phi
, making fmul
and fadd
can't be combined. I will check the combine
function later to answer your question, Sir. Right now, I have a busy time struggling with my relatives QAQ.
- Why 19 and 21 are combined rather than 20 and 21? Which pair has the higher priority to be combined in your opinion?
- FIR kernel has a
fmul
followed byfadd
. So that pair can also be combined, right?- The main purpose of this Pull Request (PR) is that when users enable
MAC
(or FMAC) in the GUI, all (or some of) themul
andadd
(orfmul
andfadd
) need to be combined.
fmul
can be combined with fadd
now. Note that the changed code is not in this PR.Below is Fir before and after combineMulAdd()(the one I remove judgment of critical nodes). Note that (6)mul
and (7)add
are combined and become (6)fmuladd
.
firBeforeFuse.png firAfterFuse.png
(13)zext
to (25)store
is reduced. So 20 should have higher priority to be combined? But the total number of nodes is the same whether you combine 19 or 20.Nice.
phi
is not able to be fused with add
to decrease the RecMII to 1, because the add
is fused with mul
first. For the 13 to 25 case, the ones that can decrease the overall height should have higher priority (agree? This may help the mapping. But I think you don't need to implement this case for now). Please focus on increasing the priority of the fusion on critical path first.So the purpose of combineForIter() is to combine any kind of critical path, no matter it's add/compare/br/phi or add/phi (those two are the most common ones but there exit other critical paths)? In that way, I need to change the logic of combineForIter().
I also want to know if this task is relevant to the mail you sent to us (Help needed ...) before or not.
Yes, we want to fuse all the nodes within the critical cycle. However, we also want to make it parameterizable, for example, some cases, we may not want to over fuse them. So we can start from the above two cases I guess?
Yes, this is related to the mail. Fusion is a critical step towards better mapping.
OK, I will try to leave a parameter interface in combineForIter() and try to provide some tests for my modifications.
I made it in a recursive way, sir. But I want to confirm some questions.
Cool, great progress. Recursive way is very popular for pattern recognition. I am fine with either recursive or iterative. It would be perfect to also fuse the pattern in the non-critical paths. But if you think that is not trivial engineering effort, we can leave it to another PR.
DFG::combineIter() and DFG::combineForIter() is made to fuse the pattern in DFGs regardless the order or size of the pattern. I update the codes and you can merge the PR if there is no question. Here is an example of DFG of fir after fusion where the pattern is "add -cmp-br-phi" and "phi-add".
combineForIter() is to accept fusion pattern that user provides and it will call combineIter(). combineIter() is a recursive function that fuse patterns.
Here is an example of code of fusion pattern "add -cmp-br-phi" and "phi-add".
list<string>* combinePattern1 = new list<string>[4];
combinePattern1 -> push_back("add");
combinePattern1 -> push_back("icmp");
combinePattern1 -> push_back("br");
combinePattern1 -> push_back("phi");
list<DFGNode*>* cPatternNodes1 = new list<DFGNode*>[4];
combineForIter(combinePattern1, cPatternNodes1);
list<string>* combinePattern2 = new list<string>[2];
combinePattern2 -> push_back("phi");
combinePattern2 -> push_back("fadd");
list<DFGNode*>* cPatternNodes2 = new list<DFGNode*>[2];
combineForIter(combinePattern2, cPatternNodes2);
I update combineForIter() and combineIter() into one function and this is an example code of how to use this function. The result DFG is the same as before.
string* combinePattern1 = new string[4];
combinePattern1[0] = "add";
combinePattern1[1] = "icmp";
combinePattern1[2] = "br";
combinePattern1[3] = "phi";
list<DFGNode*>* cPatternNodes1 = new list<DFGNode*>[4];
combineForIter(combinePattern1, 4, cPatternNodes1);
string* combinePattern2 = new string[2];
combinePattern2[0] = ("phi");
combinePattern2[1] = ("fadd");
list<DFGNode*>* cPatternNodes2 = new list<DFGNode*>[2];
combineForIter(combinePattern2, 2, cPatternNodes2);
I update combineForIter() with one parameter and fix the name issue. The result DFG is the same as before.
list<string>* combinePattern1 = new list<string>[4];
combinePattern1 -> push_back("add");
combinePattern1 -> push_back("icmp");
combinePattern1 -> push_back("br");
combinePattern1 -> push_back("phi");
combineForIter(combinePattern1);
list<string>* combinePattern2 = new list<string>[2];
combinePattern2 -> push_back("phi");
combinePattern2 -> push_back("fadd");
combineForIter(combinePattern2);
After combineMulAdd(), the opcodeName changes. So isMAC() detects new opcodeName to verify whether it is MAC and isMAC() won't always return true.