pulp-platform / tech_cells_generic

Technology dependent cells instantiated in the design for generic process (simulation, FPGA)

SRAM implementation and generalisation #14

Open zarubaf opened 3 years ago

zarubaf commented 3 years ago

Problem

The current tc_sram wrapper does not generalize to more than one clock, and its ports always come with both read and write capabilities. That is a bit too limiting for the memories that are usually available in modern technologies.

Furthermore, I think it would be nice to capture all the behavior of the memory in a commonly understood format (JSON, protobufs) so that we can extend and automate the generation of the SRAM instantiations.

SRAM Request Format

This is my straw-man proposal:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "http://pulp-platform.org/snitch/memory.schema.json",
    "title": "memory_schema",
    "description": "Generic Schema to describe properties of SRAMs.",
    "type": "object",
    "required": ["num_words", "data_width"],
    "properties": {
        "description": {
            "type": "string",
            "description": "Optional description."
        },
        "num_words": {
            "type": "integer",
            "description": "Number of words in data array."
        },
        "data_width": {
            "type": "integer",
            "description": "Data width in bits."
        },
        "initialization": {
            "type": "string",
            "description": "(Optional) initialization of memory.",
            "enum": ["none", "ones", "zeros", "random"],
            "default": "none"
        },
        "implementation": {
            "description": "(Physical) Implementation details.",
            "type": "object",
            "properties": {
                "type": {
                    "type": "string",
                    "description": "(Optional) Implementation type. Register file or SRAM.",
                    "enum": [
                        "rf",
                        "sram"
                    ]
                },
                "optimization": {
                    "type": "string",
                    "description": "(Optional) Optimization hint. Should this be tuned towards speed or density.",
                    "enum": [
                        "perf", "density"
                    ]
                }
            }
        },
        "independent_clocks": {
            "type": "boolean",
            "description": "Each port has an independent clock associated (multi-clock memories).",
            "default": false
        },
        "ports": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "description": "Description of a port of the memory array.",
                "additionalProperties": false,
                "properties": {
                    "read": {
                        "type": "boolean",
                        "description": "Enable read capabilities on port.",
                        "default": true
                    },
                    "read_latency": {
                        "type": "integer",
                        "description": "Number of cycles of latency from read request valid to read data valid.",
                        "default": 1
                    },
                    "write": {
                        "type": "boolean",
                        "description": "Enable write capabilities on port.",
                        "default": true
                    },
                    "byte_width": {
                        "type": "integer",
                        "description": "Byte width in bits. In case the byte width does not divide the data width, the most significant byte will be smaller (i.e., truncated).",
                        "default": 8
                    },
                    "byte_mask": {
                        "type": "boolean",
                        "description": "Has a byte mask, i.e., bytes can be written independently.",
                        "default": true
                    }
                }
            }
        }
    }
}
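To make the port defaults and the truncated most-significant byte concrete, here is a minimal Python sketch (standard library only). It is not part of the repository: the helper names `normalize` and `byte_lanes` are hypothetical, and it assumes the intended defaults are 8 for `byte_width` and true for `byte_mask`.

```python
# Hypothetical helpers illustrating the straw-man schema; not repository code.

# Per-port defaults, assuming byte_width defaults to 8 and byte_mask to true.
PORT_DEFAULTS = {
    "read": True,
    "read_latency": 1,
    "write": True,
    "byte_width": 8,
    "byte_mask": True,
}


def normalize(spec):
    """Return a copy of `spec` with the schema defaults filled in."""
    for key in ("num_words", "data_width"):
        value = spec.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"required integer field missing or invalid: {key}")
    out = dict(spec)
    out.setdefault("initialization", "none")
    out.setdefault("independent_clocks", False)
    # Explicit port settings override the defaults.
    out["ports"] = [{**PORT_DEFAULTS, **port} for port in spec.get("ports", [])]
    return out


def byte_lanes(data_width, byte_width):
    """Widths of the byte lanes, least significant first.

    If byte_width does not divide data_width, the last (most significant)
    lane is narrower, as the byte_width description states.
    """
    count = -(-data_width // byte_width)  # ceiling division
    widths = [byte_width] * count
    remainder = data_width % byte_width
    if remainder:
        widths[-1] = remainder
    return widths


spec = normalize({"num_words": 1024, "data_width": 36, "ports": [{"write": False}]})
lanes = byte_lanes(spec["data_width"], spec["ports"][0]["byte_width"])
```

For a 36-bit word with 8-bit bytes this yields five lanes, the top one being 4 bits wide.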

I think in the future it can hold much more, such as BIST capabilities and physical design aspects (e.g., number of rails, preferred aspect ratio). Ideally, we would end up with a generation infrastructure for each technology that sanitizes the inputs for that technology and generates the memory instantiations.

Re-structure

I would furthermore propose that we split the tc_sram into a tc_sram and a tc_sram_core. The latter would contain just the memory array (i.e., the thing that will be replaced by tech-specific stuff). That would also allow us to implement (or provide an implementation of) the core as flip-flops or latches, which is something we can also provide as open source.

thommythomaso commented 3 years ago

Can we describe write-only ports? This would be a requirement for some twoport mems.

So tc_sram is the module that will be generated from the description above, right? So it will (technology-specific) instantiate rf or SRAM macros. In this case, having a core abstraction does not make sense to me, as tc_sram will already instantiate the final macro blocks. I would rather opt for a technology-independent or -dependent (physical) implementation option for latches or ff.

zarubaf commented 3 years ago

> Can we describe write-only ports? This would be a requirement for some twoport mems.

Yes, that would be possible. The number of ports is variable, and each port can be read, write, or both.
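As an illustration, a two-port memory with one read-only and one write-only port could be described as below. This is a sketch under the straw-man schema; the sizes are made up and the helper `port_kind` is hypothetical.

```python
# Hypothetical description of a two-port memory (sizes are illustrative).
two_port = {
    "num_words": 256,
    "data_width": 64,
    "ports": [
        {"read": True, "write": False},  # read-only port
        {"read": False, "write": True},  # write-only port
    ],
}


def port_kind(port):
    """Classify a port as 'rw', 'r', or 'w'; both flags default to true."""
    read = port.get("read", True)
    write = port.get("write", True)
    if read and write:
        return "rw"
    if read:
        return "r"
    if write:
        return "w"
    # A port with neither capability is assumed to be an error here.
    raise ValueError("port must enable read, write, or both")


kinds = [port_kind(p) for p in two_port["ports"]]
```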

> So tc_sram is the module that will be generated from the description above, right? So it will (technology-specific) instantiate rf or SRAM macros. In this case, having a core abstraction does not make sense to me, as tc_sram will already instantiate the final macro blocks.

Yes, tc_sram would be the thing the user instantiates, and it would be generated with the tech-specific memories instantiated inside. The idea of having a separate core/array implementation is that it is not always obvious when it makes sense to replace something with a generated array rather than a synthesized one. So if we factor out the core into a synthesizable array, we can leave it up to the physical implementation (and script infrastructure) whether to keep the generic ff/latch-based core or to instantiate the generated register file or SRAM.

> I would rather opt for a technology-independent or -dependent (physical) implementation option for latches or ff.

So in a nutshell, exactly what I meant. To keep the interface the same (tc_sram), the only option in SV is to wrap this into a tc_sram_core_ff and instantiate it in tc_sram. I am against burdening the user with instantiating the "right" thing (i.e., either tc_sram_ff, tc_sram, or tc_sram_latch). That would make it impractical to switch easily between different implementations, and easy switching is, imho, a desirable goal.
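A rough sketch of how a generation script might make that choice while the user-facing module stays tc_sram. Everything here is hypothetical: the function `select_core`, the `use_macro` flag standing in for the physical-implementation decision, the placeholder macro names, and `tc_sram_core_latch` (named by analogy with tc_sram_core_ff).

```python
# Hypothetical selection logic; none of these names exist in the repository
# except tc_sram_core_ff, which is only proposed in this thread.

def select_core(spec, use_macro, std_cell_style="ff"):
    """Pick the module that tc_sram would instantiate internally.

    `use_macro` models the decision left to the physical implementation:
    whether a generated register file / SRAM macro should replace the
    generic std-cell core.
    """
    if std_cell_style not in ("ff", "latch"):
        raise ValueError("std_cell_style must be 'ff' or 'latch'")
    impl = spec.get("implementation", {}).get("type")
    if use_macro and impl in ("rf", "sram"):
        return f"tech_{impl}_macro"  # placeholder for a tech-specific macro
    return f"tc_sram_core_{std_cell_style}"


spec = {"num_words": 64, "data_width": 32, "implementation": {"type": "sram"}}
with_macro = select_core(spec, use_macro=True)
generic = select_core(spec, use_macro=False)
```

Either way, the user would only ever instantiate tc_sram; only the internals differ.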

thommythomaso commented 3 years ago

If the cores are just latch and ff memories that can be instantiated automatically in tc_sram, I agree with this approach. It's important to have just one tc_sram. My initial thought was to directly "generate" the ff or latch memory code in the if/else if/else blocks of tc_sram (for std-cell-based memories), but encapsulating this in cores makes it nicer.

Also the interface between cores and tc_sram can be technology-specific.

meggiman commented 3 years ago

So let me try to summarize your idea to verify that I understand your intention: The user who wants to instantiate a tc_sram adds this repo to the dependencies of his ASIC project. The user then writes a specification for each memory type he needs according to the schema you proposed above. The user then starts a script that takes the spec as input and generates a tc_sram module that internally instantiates the SRAM macros of the target tech directly. If the user wants to use an FF-based tc_sram, he doesn't auto-generate the tc_sram but uses an existing tc_sram.sv that wraps the tc_sram_core_ff. Is that your idea?

cwhaat commented 2 years ago

Hello. I am trying to implement this design and am currently stuck at instantiating an SRAM macro for the core memory of tc_sram. I wanted to know whether the tc_sram_core_ff file mentioned by @zarubaf is already incorporated in the core somewhere. I wasn't able to find it. If not, is there a plan to create such a file in the future?