Open yc2367 opened 2 years ago
You're right. I think in both cases the area that was calculated should be returned (it is put in a local variable area, but not returned). If you make a pull request with that change and test it gives reasonable results we can merge it in.
Thanks for flagging this.
Hi,
After carefully looking at the whole tran_sizing.py code. I found additional issues regarding the cost function for area-delay trade-offs:
This won't be a problem for the "global" mode. If you look at the _get_evalarea() function invoked within the _costfunction() and follow the _get_evalarea() definition: https://github.com/vaughnbetz/COFFE/blob/09fb0d420d16b14c9a95be266c894ad70f7fcb7b/coffe/tran_sizing.py#L237
When the _opttype parameter is hard-coded as "global", and the _is_ramcomponent and _is_cccomponent parameters are 0 as in the case for most logic tile components, the _get_evalarea() function will return _fpga_inst.areadict["tile"] which is okay. But I think a better practice is that in the sizing code for each component as shown in the above three examples, the _fpga_inst.sbmux parameter should be replaced by the name of each component being sized. For example, in this example: https://github.com/vaughnbetz/COFFE/blob/09fb0d420d16b14c9a95be266c894ad70f7fcb7b/coffe/tran_sizing.py#L2363 The code is trying to size the connection block mux, then it's better to put _fpga_inst.cbmux instead of _fpga_inst.sbmux. This allows the user to run COFFE in local mode without using the wrong area (which is always the area of _fpga_inst.sbmux in the current COFFE).
If you look at the _get_currentdelay() function: https://github.com/vaughnbetz/COFFE/blob/09fb0d420d16b14c9a95be266c894ad70f7fcb7b/coffe/tran_sizing.py#L406 The second parameter _is_ramcomponent specifies whether this component is a RAM component, this parameter should be 0 when sizing the above three examples, but COFFE assigns the cost_function twice with the second assignment using _is_ramcomponent = 1. According to the _get_currentdelay() function, if _is_ramcomponent = 1, the function will return the delay of RAM which gives a wrong cost. https://github.com/vaughnbetz/COFFE/blob/09fb0d420d16b14c9a95be266c894ad70f7fcb7b/coffe/tran_sizing.py#L483-L486
After modifying the code with my understanding based on the above two points, I reran the test_top_level.py file to test whether I did something wrong. The tests passed for most components since I am running in the global mode. But for some carry chains and lut drivers, the tests failed with 5% ~ 15% error in delay. But I think this is because that the current COFFE uses a wrong current cost for these components as described in my second point (assigning current_cost twice).
If you want to look at my modifications, you can refer to my branch here (only changed the tran_sizing.py file): https://github.com/yc2367/COFFE/tree/Yuzong_Test
Please let me know if you prefer me to create a pull request for comparison. Thanks!
Thanks for the detailed analysis! @sadegh68 @aman26kbm : any thoughts? These look like good things to fix, but getting your opinion on the code in question and what tests should be run would be great.
Good observations, @yc2367. Thanks for the details.
I agree that these are all bugs we should fix. Most likely they are copy-paste errors, and because we usually don't run the local mode.
I went through the changes in your branch. I agree with all changes. There are a couple things I want to mention:
I do see you've added a new test file as well, which runs the existing 4 tests with the local
switch. That's good. Please also add the reference results files for these tests. You can generate them using the generate_reference
method in tests.py.
I agree with the changes. These bugs exist due to the fact that the local mode was not tested during parts of COFFE's development. A few tests focusing on the local mode should reduce the likelihood of such future bugs in the future.
Hi @vaughnbetz, @aman26kbm and @sadegh68 ,
Thanks a lot for your reply.
Another question, I actually reran the original COFFE with tests_top_level.py. It also gave some delay errors for components like lut_driver, carry_chain etc. Could you help me clarify this point?
I will try to add a reference file for the local mode soon. I think the benefit of local mode is that if some users want to develop additional components like computing in-memory blocks, e.g. an adder. They may want to reduce the area overhead by running the local mode just for that computing in-memory block, and this will be much faster since the rest of the sizing code can even be commented out when running the local mode for just one block.
Aren't these failures expected? We are changing the code that affects the sizing of these components (lut driver and carry chain), even in the global mode. No?
Thanks for the reply! I mean the original COFFE without my changes, which is the current COFFFE on this repository, also failed some tests using the current test reference file.
Ah I see. Could be because of the HSPICE version? The original reference files may have been generated with an older version of HSPICE.
Might be, I am using the 2022 version Hspice. But just want to clarify if you will have get some 5%~15% errors using yiur Hspice?
I'm not sure of the reason. The changes are pretty small, thankfully. Adding @StephenMoreOSU in case he has any ideas.
Hi, May I ask if someone has helped me verify if they also had some small error when running the tests_top_level.py? Thanks a lot!
Sorry I won't have the time to help with the verification.
Hi,
I am currently using coffe to run automatic sizing for a full adder. I only care about the area and delay trade-off for this subcircuit without considering the global mode. Hence, I specify "-o local" when running coffee. However, it continuously popped up an error saying that the _area_list in transizing.py, line 1386 was empty. I looked into this issue and found that the reason was that the _get_eval_area() function in tran_sizing.py, line 237_ doesn't return anything when the opt_type is "local":
https://github.com/vaughnbetz/COFFE/blob/09fb0d420d16b14c9a95be266c894ad70f7fcb7b/coffe/tran_sizing.py#L237-L243
And the same problem exists for the _get_final_area() function in transizing.py, line 254:
https://github.com/vaughnbetz/COFFE/blob/09fb0d420d16b14c9a95be266c894ad70f7fcb7b/coffe/tran_sizing.py#L254-L257
Could you take a look at this?