mit-emze / cimloop

MIT License

About Digital CiM framework evaluation #5

Open 0x7D6E7FD opened 3 months ago

0x7D6E7FD commented 3 months ago

Hello Tanner,

This is not exactly an issue. I have a few questions, as I have only recently started to explore the framework.

  1. Can this framework evaluate all kinds of fully digital CiM frameworks?
  2. If yes, how do I write the YAML file to evaluate one?

Thanks!

tanner-andrulis commented 3 months ago

Thank you for the question. I am afraid I cannot answer without more specific information. What exactly would you like to model?

0x7D6E7FD commented 2 months ago

Hello, yes. I am trying to evaluate a fully digital design that uses an 8T latch as the memory cell and a Wallace tree as a component. If this is possible, can you please point me to where I should look to specify these components?

tanner-andrulis commented 2 months ago

The workspace/README file references tutorials that you should complete. After completing those tutorials, look through the different notebooks in workspace/models/arch/1_macro/*. The Colonnade architecture is an example of a digital CiM macro if you'd like to use it as a reference. The workspace/models/memory_cells directory includes memory cells if you'd like to implement the 8T latch.

0x7D6E7FD commented 2 months ago

Thanks. I was trying to debug to understand the calculations behind the scenes. In one issue I saw you suggest looking into the parsed-processed-input.yaml file. What does this file capture? I checked this file for different architectures and found a dac_x2x_ladder component in every one. I thought this element should only be in the wang_vlsi_2022 architecture, but I found it in the Colonnade architecture's parsed-processed-input.yaml file as well. Can you please tell me the easiest way to debug, or to understand which components are used for which architecture?

tanner-andrulis commented 2 months ago

The parsed-processed-input.yaml file gathers all input files into one place, including substituted & parsed variables. As part of this file, it gathers all components into one components list. Note that this does NOT mean that the components are necessarily used in the architecture; only that they are present in the input files. To see what components are actually used in the architecture, look at the timeloop-mapper.accelergy.log file in the same directory. You can also run "accelergy -v parsed-processed-input.yaml" to see what plug-ins are called to model each component.
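As a rough illustration of the distinction above (components present in the input files versus components actually used), the sketch below checks which candidate component names are mentioned in a log's text. The helper name `components_mentioned`, the sample text, and the candidate list are all hypothetical; the sample text is a made-up placeholder and does not reproduce the real format of timeloop-mapper.accelergy.log.

```python
import re
from typing import Dict, List


def components_mentioned(log_text: str, component_names: List[str]) -> Dict[str, bool]:
    """For each candidate name, report whether it appears as a whole word in log_text."""
    return {
        name: re.search(rf"\b{re.escape(name)}\b", log_text) is not None
        for name in component_names
    }


# Placeholder text for illustration only (NOT real Accelergy output).
sample_log = "line mentioning intadder\nline mentioning colonnade_cim_logic\n"

# Candidate names, e.g. copied by hand from the components list.
candidates = ["intadder", "dac_x2x_ladder", "colonnade_cim_logic"]

usage = components_mentioned(sample_log, candidates)
for name, used in usage.items():
    print(f"{name}: {'mentioned' if used else 'not mentioned'}")
```

In practice you would read the log file's contents into `log_text`; a name that is listed in the components list but never mentioned in the log is a hint that the architecture does not use it.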

To understand what attributes to add, I will refer you to the exercises in the exercises repository on using Accelergy (https://github.com/Accelergy-Project/timeloop-accelergy-exercises). After completing these tutorials, you can look at any of the following:

0x7D6E7FD commented 2 months ago

Hello,

While running any of the macros from the _guide.ipynb file, I do not see any tile or chip design mentioned, although there are YAML files describing different tiles and chips in the 2_tile and 3_chip folders. Can you please explain which one is the default and where it is specified?

I also didn't see any tile or chip mentioned while running a full DNN.

    def get_spec(
        macro: str,
        tile: str = None,
        chip: str = None,
        system: str = "ws_dummy_buffer_one_macro",
        iso: str = None,
        dnn: str = None,
        layer: str = None,
        max_utilization: bool = False,
        extra_print: str = "",
        jinja_parse_data: dict = None,
    ) -> tl.Specification:
        paths = [
            os.path.abspath(
                os.path.join(THIS_SCRIPT_DIR, "..", "models", "top.yaml.jinja2")
            )
        ]

tanner-andrulis commented 2 months ago

The top-level file top.yaml.jinja2 defines the top-level architecture. It accepts "None" as an option for tile and chip, which omits any tile and chip components from the hierarchy. You may reference a tile or chip in the get_spec arguments, which adds components to the hierarchy such as networks, input/output buffers, or other components, depending on which tile and/or chip you include. If you don't include any, the "ws_dummy_buffer_one_macro" system is used with no tile or chip, essentially connecting your macro directly to a zero-energy backing store so you may model the macro in isolation.
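To make the defaults concrete, here is a toy sketch of how optional tile and chip levels could be included in or omitted from a hierarchy. It is not the actual logic of top.yaml.jinja2; the function `build_hierarchy`, the level ordering, and the name `my_tile` are assumptions for illustration. Only the default system name is taken from the get_spec signature quoted earlier in this thread.

```python
from typing import List, Optional


def build_hierarchy(
    macro: str,
    tile: Optional[str] = None,
    chip: Optional[str] = None,
    system: str = "ws_dummy_buffer_one_macro",
) -> List[str]:
    """Return the levels from outermost to innermost, skipping any level left as None."""
    levels = [f"system:{system}"]
    if chip is not None:
        levels.append(f"chip:{chip}")
    if tile is not None:
        levels.append(f"tile:{tile}")
    levels.append(f"macro:{macro}")
    return levels


# Defaults: no tile, no chip, so the macro sits directly under the dummy system.
print(build_hierarchy("colonnade_jssc_2021"))

# "my_tile" is a hypothetical tile name used only for this example.
print(build_hierarchy("colonnade_jssc_2021", tile="my_tile"))
```

With the defaults, the result contains only the system and the macro, which mirrors the "macro in isolation" behavior described above.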

0x7D6E7FD commented 2 months ago

Thanks a lot for the quick responses.

I have some other questions too.

1. If I want to define a new compound component with a subcomponent class, how do I know whether that subcomponent class is available in Timeloop/Accelergy?

For example, the Colonnade macro has a compound component named colonnade_cim_logic. I want to edit one of its subcomponents and replace the adder: instead of intadder as the subcomponent class, I want to use an XNOR gate. I get an Accelergy plug-in failure if I write xnor in place of intadder.

My question is: if I want to make a compound component, how do I know which basic components are available to do that?

2. In the Accelergy/Timeloop tutorial (the section on how to write a compound component) I saw the following code, but how do I know what can be used as a subcomponent class?

    class: class_of_subcomponent_1 # class must be defined by a plug-in or compound component

How do I know which classes are defined by a plug-in or compound component? I have also seen how to add an Accelergy plug-in in the Timeloop/Accelergy tutorial. Can I add any component that way?

3. Another question: suppose I want to design compound components such as a Booth encoder or a Wallace tree adder. How do I define them? Do I need to define them the way the albireo_isca_2021 folder does?

0x7D6E7FD commented 2 months ago

I have one more question: how did you realize the post accumulator of Colonnade in the code in your framework?

tanner-andrulis commented 2 months ago

In response to: "My question is: if I want to make a compound component, how do I know which basic components are available to do that?"

I see that there is a following question in this comment, but it is difficult for me to parse. Would you be able to format the question such that the code blocks are shown more clearly? You may embed YAML in markdown comments using the following syntax: https://stackoverflow.com/questions/75548339/mixing-yaml-and-markdown

tanner-andrulis commented 2 months ago

In response to: "I have one more question: how did you realize the post accumulator of Colonnade in the code in your framework?"

The post accumulator was not modeled, since it was not included in the evaluation for the original paper. You may model it by incorporating a shift and add circuit.
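For readers unfamiliar with the idea, here is a minimal functional sketch of what a shift-and-add post accumulator computes. It assumes unsigned inputs fed one bit per step, most significant bit first; it is not how Colonnade or CiMLoop implements it, and it models behavior only, not energy or area. The function names are hypothetical.

```python
from typing import List


def shift_and_add(partial_sums: List[int], bits_per_step: int = 1) -> int:
    """Combine partial sums (most significant slice first) by shifting, then adding."""
    acc = 0
    for partial in partial_sums:
        acc = (acc << bits_per_step) + partial
    return acc


def bit_serial_dot(inputs: List[int], weights: List[int], input_bits: int) -> int:
    """Dot product computed one input bit at a time, as a bit-serial array would."""
    partials = []
    for bit in reversed(range(input_bits)):  # most significant bit first
        # Each step multiplies a single input bit by the full-precision weight.
        partials.append(sum(((x >> bit) & 1) * w for x, w in zip(inputs, weights)))
    return shift_and_add(partials)


inputs = [3, 5, 2]    # unsigned 3-bit inputs
weights = [4, -1, 7]  # signed weights
result = bit_serial_dot(inputs, weights, input_bits=3)
reference = sum(x * w for x, w in zip(inputs, weights))
print(result, reference)
```

The bit-serial result matches the ordinary dot product, which is the property a shift-and-add stage is meant to restore after the array processes one input bit per step.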

0x7D6E7FD commented 2 months ago

I have edited my questions. I hope they are understandable now.

0x7D6E7FD commented 2 months ago

Another question: suppose I want to design compound components such as a Booth encoder or a Wallace tree adder. How do I define them? Do I need to define them the way the albireo_isca_2021 folder does, or can I do it by adding an Accelergy plug-in as described in the 05_creating_accelergy_plug_ins folder of the Timeloop/Accelergy tutorial?

tanner-andrulis commented 2 months ago

Either of those options works if you'd like to define a new component!

0x7D6E7FD commented 2 months ago

Sorry for pinging you so much. I have one more question.

The tutorial at workspace/tutorials/1_cim_macro_intro.ipynb says: Containers are identified with the !Container or !ArrayContainer tags. The !ArrayContainer tag is a special type of container that represents a container that subdivides a CiM array.

In workspace/models/arch/1_macro/albireo_isca_2021/arch.yaml, the PLCG is defined as a !Container. Why not an !ArrayContainer?

I ask because, according to the paper, the top-level macro should consist of 9 PLCGs. So shouldn't it be an !ArrayContainer in arch.yaml?

tanner-andrulis commented 2 months ago

Albireo is a full chip composed of 9 independent PLCGs. We consider each of these PLCGs to be an independent CiM array, so the PLCG is a !Container because it does not subdivide the array.

0x7D6E7FD commented 1 month ago

Hello,

  1. I want to write weights into the 16-row x 32-column CiM array of my fully digital CiM. Writing the weights takes 16 clock cycles per column, and each weight is 14b. After that, computation starts when inputs arrive at the rows.

How do I control the clock for writing and computation here?

  2. The memory cell I am trying to model holds a 14b weight, so it consists of 14 12T latch cells. When a 9-bit input arrives, a small logic cell takes the 14b weight and the 9-bit input and produces partial products.

So the smallest cell basically performs something like a 14b x 9b multiplication. But in CiMLoop I saw that cim_unit only calculates 1b x 1b operations. How do I modify that?

Can you please answer these questions?

tanner-andrulis commented 1 month ago

  1. Most CiM architectures in CiMLoop have a column_bandwidth_limiter component that you can edit to do this. Specifically, each column should have the bandwidth to (A) write one 14b weight, and (B) compute with (#rows) 14b inputs.

  2. CiM units can comprise multiple memory cells. Set CIM_UNIT_WIDTH_CELLS to 12 for your architecture to get 12b weights in each CiM unit. You can change the input bits each CiM unit uses by changing the DAC precision parameter. Since you don't have a DAC, you could also do a global find for the DAC precision parameter, see where that affects the memory cells, and fix it there.
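The write-bandwidth arithmetic behind the first answer can be checked with a short calculation. This is a back-of-the-envelope sketch using the numbers from the question (16 rows, 14b weights); the function `column_write_cycles` and its parameters are hypothetical and are not CiMLoop settings.

```python
def column_write_cycles(rows: int, weight_bits: int, write_bits_per_cycle: int) -> int:
    """Cycles needed to fill one column, given its write bandwidth in bits per cycle."""
    total_bits = rows * weight_bits
    # Integer ceiling division, so a partially used last cycle still counts.
    return -(-total_bits // write_bits_per_cycle)


# One 14b weight written per column per cycle: 16 rows take 16 cycles.
cycles_full = column_write_cycles(rows=16, weight_bits=14, write_bits_per_cycle=14)

# Halving the write bandwidth doubles the fill time.
cycles_half = column_write_cycles(rows=16, weight_bits=14, write_bits_per_cycle=7)

print(cycles_full, cycles_half)
```

A write bandwidth of one 14b weight per cycle reproduces the 16 cycles per column stated in the question.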

0x7D6E7FD commented 1 month ago

Thanks for your answer. I have a few other questions too.

  1. Regarding your first answer about column_bandwidth_limiter: will one 14b weight be written to a column in one cycle?
  2. In the Colonnade_jssc_2021 architecture file, there is an Input_row_driver component. Does it include a DAC by default? I ask because Colonnade is a digital CiM and should not have a DAC.
  3. All the architectures in the repo are weight-stationary by default (as written in the paper). Can they be made input-stationary as well?
  4. What exactly do the CIM_UNIT_DEPTH_CELLS and N_ADC_PER_BANK variables mean? Thanks.

tanner-andrulis commented 1 month ago

  1. Yes
  2. No DAC is included by default.
  3. Yes, they can be made input-stationary by changing the appropriate constraints. Most storage units use default constraints that disallow iteration over weights; to change to input stationary, you may change these default constraint definitions to disallow iteration over inputs.
  4. N_ADC_PER_BANK is the number of parallel ADCs that read analog values from the CiM array. CIM_UNIT_DEPTH_CELLS is defined here https://github.com/mit-emze/cimloop/blob/main/workspace/tutorials/glossary_definitions.md
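The third answer above, on weight-stationary versus input-stationary operation, can be illustrated with a toy loop nest that counts how often each operand is loaded. This is a conceptual sketch only: it does not use Timeloop's constraint syntax, and `count_loads` and the tile counts are hypothetical.

```python
from typing import Dict


def count_loads(n_weight_tiles: int, n_input_tiles: int, stationary: str) -> Dict[str, int]:
    """Count operand loads when one operand is held in place in the outer loop."""
    loads = {"weights": 0, "inputs": 0}
    if stationary == "weights":
        for _ in range(n_weight_tiles):      # each weight tile is loaded once
            loads["weights"] += 1
            for _ in range(n_input_tiles):   # inputs stream past the held weights
                loads["inputs"] += 1
    elif stationary == "inputs":
        for _ in range(n_input_tiles):       # each input tile is loaded once
            loads["inputs"] += 1
            for _ in range(n_weight_tiles):  # weights stream past the held inputs
                loads["weights"] += 1
    else:
        raise ValueError("stationary must be 'weights' or 'inputs'")
    return loads


ws = count_loads(n_weight_tiles=4, n_input_tiles=10, stationary="weights")
in_s = count_loads(n_weight_tiles=4, n_input_tiles=10, stationary="inputs")
print(ws, in_s)
```

Holding weights in place minimizes weight loads at the cost of re-streaming inputs, and the reverse holds for the input-stationary case; a constraint that disallows iteration over one operand at a storage level has a comparable effect of keeping that operand in place.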