riscv / riscv-opcodes

RISC-V Opcodes
https://riscv.org/
BSD 3-Clause "New" or "Revised" License
696 stars 303 forks source link

Make `instr_dict.yaml` more useful #123

Open jmerdich opened 2 years ago

jmerdich commented 2 years ago

While the individual code generators are useful, it'd be nice to have a machine-readable version-- sometimes the default generators just aren't a fit! instr_dict.yaml almost gets there, but it only includes the instructions and none of the other useful sideband data like csrs or field positions.

To that end, I'd like to propose the following:

I'm willing to put in effort here, but is this something the community is interested in?

aswaterman commented 2 years ago

cc @neelgala

SpriteOvO commented 2 years ago

+1, I agree that instr_dict needs some improvements.

The current parse.py generates a instr_dict in yaml format. A snippet looks like this:

amomaxu_d:
  encoding: 11100------------011-----0101111
  extension:
  - rv64_a
  mask: '0xf800707f'
  match: '0xe000302f'
  variable_fields:
  - rd
  - rs1
  - rs2
  - aq
  - rl
amomaxu_w:
  encoding: 11100------------010-----0101111
  extension:
  - rv_a
  mask: '0xf800707f'
  match: '0xe000202f'
  variable_fields:
  - rd
  - rs1
  - rs2
  - aq
  - rl
amomin_d:
  encoding: 10000------------011-----0101111
  extension:
  - rv64_a
  mask: '0xf800707f'
  match: '0x8000302f'
  variable_fields:
  - rd
  - rs1
  - rs2
  - aq
  - rl
amomin_w:
  encoding: 10000------------010-----0101111
  extension:
  - rv_a
  mask: '0xf800707f'
  match: '0x8000202f'
  variable_fields:
  - rd
  - rs1
  - rs2
  - aq
  - rl

We can see two problems here:

Although these can be solved backward-compatible (I'm not sure if it satisfies the yaml standard spec):

"amomaxu.d":
  encoding: 11100------------011-----0101111
  extension: ["rv64.a"]
  mask: '0xf800707f'
  match: '0xe000302f'
  variable_fields: [rd, rs1, rs2, aq, rl]
"amomaxu.w":
  encoding: 11100------------010-----0101111
  extension: ["rv.a"]
  mask: '0xf800707f'
  match: '0xe000202f'
  variable_fields: [rd, rs1, rs2, aq, rl]
"amomin.d":
  encoding: 10000------------011-----0101111
  extension: ["rv64.a"]
  mask: '0xf800707f'
  match: '0x8000302f'
  variable_fields: [rd, rs1, rs2, aq, rl]
"amomin.w":
  encoding: 10000------------010-----0101111
  extension: ["rv.a"]
  mask: '0xf800707f'
  match: '0x8000202f'
  variable_fields: [rd, rs1, rs2, aq, rl]

I believe instr_dict is generated for machine-readable purposes, so IMO json is probably better than yaml here:

{
  "amomaxu.d": {
    "encoding": "11100------------011-----0101111",
    "extension": ["rv64.a"],
    "mask": "0xf800707f",
    "match": "0xe000302f",
    "variable_fields": ["rd", "rs1", "rs2", "aq", "rl"]
  },
  "amomaxu.w": {
    "encoding": "11100------------010-----0101111",
    "extension": ["rv.a"],
    "mask": "0xf800707f",
    "match": "0xe000202f",
    "variable_fields": ["rd", "rs1", "rs2", "aq", "rl"]
  },
  "amomin.d": {
    "encoding": "10000------------011-----0101111",
    "extension": ["rv64.a"],
    "mask": "0xf800707f",
    "match": "0x8000302f",
    "variable_fields": ["rd", "rs1", "rs2", "aq", "rl"]
  },
  "amomin.w": {
    "encoding": "10000------------010-----0101111",
    "extension": ["rv.a"],
    "mask": "0xf800707f",
    "match": "0x8000202f",
    "variable_fields": ["rd", "rs1", "rs2", "aq", "rl"]
  }
}
neelgala commented 2 years ago

Hi.. sorry for the delay in getting to this.

@jmerdich thanks for those inputs - I am in agreement with all your suggestions and if you could submit a PR I can help review it. Since the concern is with just dumping the yaml contents I think it shouldn't break anything in this repo. Other projects consuming the instr_dict.yaml from this repo will obvioulsy have to make changes - so if someone has an issue with this they should speak now.

Regarding versioning - I would suggest a very basic light weight calendar versioning done by the CI would be best.

@SpriteOvO regarding compactness : I have used pyyaml and ruamel and unfortunately neither dumps the lists in a single-line while keeping everything else coherent.

I don't see a real issue with the replacing . with _ because the spec never uses _ in their instruction definition. So that probably wouldn't be a critical issue for now.

I don't mind switching to json considering there is probably greater support in multiple langauges to parse json than yaml.