zmkfirmware / zmk

ZMK Firmware Repository
https://zmk.dev/
MIT License

Towards Generic Behaviors #213

Open innovaker opened 3 years ago

innovaker commented 3 years ago

Background

Outputs via HID are defined by 2 'coordinates':

Technically there's other HID metadata that defines an input/output - i.e. Collection Physical, Collection Logical, Usage Switch, Usage Modifier - but my current understanding is that they're irrelevant for this context due to the constraints of our HID reports and how OS drivers interpret/treat them?  At least for keyboards, mice, joysticks - consumer too?  If I've got that wrong, please correct me because it's critical!

If utilized in their 32-bit forms, these 'coordinates' amount to 8 bytes, or two 32-bit integers. This makes HID bindings tricky to define in the byte-array keymaps currently used by ZMK. It's partly why we currently have &kp and &cp, which each symbolize the meaning of 6 of the bytes.
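A minimal sketch of where the 8 bytes come from, assuming the two 'coordinates' are an application usage and an item usage, each stored as a full 32-bit HID usage (usage page in the high 16 bits, usage ID in the low 16 bits). The type, macro and struct names are hypothetical and are not taken from ZMK.

```c
#include <stdint.h>

/* A full HID usage: usage page in the high 16 bits, usage ID in the low 16. */
typedef uint32_t hid_usage_t;

#define HID_USAGE(page, id) \
    ((((hid_usage_t)(page)) << 16) | ((hid_usage_t)(id) & 0xFFFFu))

/* Hypothetical: both 'coordinates' of one HID output, unpacked. */
struct hid_coordinates {
    hid_usage_t application; /* which application the control belongs to */
    hid_usage_t item;        /* which control within that application */
};

/* Keyboard 'A' inside the Keyboard application. */
static const struct hid_coordinates key_a = {
    .application = HID_USAGE(0x01, 0x06), /* Generic Desktop page, Keyboard */
    .item        = HID_USAGE(0x07, 0x04), /* Keyboard/Keypad page, 'a and A' */
};
```

Two 32-bit usages per binding is twice what a single 32-bit binding parameter can hold, which is the pressure behind the bit-packing idea.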

Generic HID Behaviors

I'd like to propose moving towards generic HID behaviors.  They work with:

Example Syntax

&p A ... or ... p(A) ... // press A
&p M_1 ... or ... p(M_1) ... // press Mouse Button 1
&ht LSFT C_NEXT ... or ... ht(LSFT, C_NEXT) ... // hold for LSFT, tap for Next Track

I believe this syntax is concise, flexible and should be future-proof too. It should also upgrade gracefully by making &kp and &cp aliases of their successor (they'd become legacy).

How?

It's a compromise between concise syntax, FLASH and SRAM. It relies on encoding (bit-packing) the 'coordinates' within the definition of each key (keys.h). Like this ...

This combines well with @okke-formsma's modifier proposal #86 as well as my suggestion for improving the shift syntax.

Implementation

How precisely the encoding is achieved is implementation detail and I'm still finessing it. But an ongoing exploration of the possibilities can be found here: https://docs.google.com/spreadsheets/d/1Il4vEMcD3YD7sVDYb6pnMQfNpK-Pmp5AYrStf97DtI0/

Constraints

Notes:

Alternatives

The $1,000,000 Question: Non-HID Generic Behaviors too?

I welcome all comments, suggestions and discussion.

innovaker commented 3 years ago

I haven't touched the document since posting this proposal as I wanted to gauge interest before putting more time into it. If it gathers support then I'll spend more time nailing down the finer detail w.r.t. the bit-encoding.

petejohanson commented 3 years ago

This is definitely needed for the modifiers work. I'm still not sure we need to go to the level of encoding the application usage into these, so we can just use "one behaviour" for sending all HID data.

That feels too abstracted, for our needs. Can we not infer the application from the usage page for the given keycode? Have some convention at least for that, that could be stepped out of for crazy use cases we don't need for normal usage?

innovaker commented 3 years ago

Thanks @petejohanson.

I don't believe so. For simple keys that's possible, but for buttons or axes the application is needed for context (such as mouse, or joystick for instance). That context would otherwise have to be provided by a behavior - much like we currently do for item usage pages - which leads to:

Also, I'd like to have more than the current push-release behavior (which is effectively what &kp is). I can think of use cases for push (down only), release (up only), toggle, tap, hold, release-push (inverted). Taking mouse button behaviors into account, you'd end up with many more behavior tags if application wasn't encoded.

If someone wants to add Joystick or Gamepad controls further down the line too - either buttons or axes - it's yet another multiplier. The same goes for Wireless Radio Controls perhaps? I'm just spitballing here merely to show how it can snowball. More on that can be found here: https://usb.org/sites/default/files/hut1_2.pdf#page=31 - those are only the generic desktop application usages. There's others on other pages (less relevant).

Finally, without incorporating the application, it splits HID across behaviors with parallel conventions. With application, a mouse button simply becomes another HID (keys) code - i.e. &p M_1 or &p M1. That's less cognitive load for users and easier to grasp and document as well. Users probably don't want to care if it's keyboard, consumer or mouse - that's implementation detail for them. It's also more concise as the alternative would have to be .... &mp M_1 or &mp M1.

In many ways, HID has already solved this problem for us, we just need to squeeze/encode it into a smaller package whilst minimizing the mapping we do.

Any thoughts?

innovaker commented 3 years ago

Playing Devil's Advocate, I guess the other question is ... why wouldn't we want to approach it this way? What are the alternatives? What are their benefits and costs?

petejohanson commented 3 years ago

I think the concern I have is trying to automatically link a generic behavior to any number of HID applications, and not write ourselves into a corner. Are they entirely separate applications? Different logical groups? How does this work for custom behaviors looking to do custom HID stuff?

It seems like a lot to try to "Get Right" the first time.

innovaker commented 3 years ago

> I think the concern I have is trying to automatically link a generic behavior to any number of HID applications, and not write ourselves into a corner. Are they entirely separate applications? Different logical groups?

That's an excellent point and it touches on my opening paragraph to the proposal. You're right, we do need to be confident that this is how HID works before we go down this route. Everything I've read and encountered so far suggests this is the case. But I could be wrong! I need someone else to confirm it really.

My background with HID covers:

I've yet to encounter duplicate applications or duplicate usages within a single report. My understanding is that's partly why the concept of multiple reports exists. So I'm fairly confident that our likely use cases are covered.

But the devil is in the details. That detail is in this document: https://usb.org/sites/default/files/hid1_11.pdf. And I suspect these bits are relevant:

It also depends on the finer details of each type of Collection. But I don't think they affect this discussion? Can anyone verify that?

This paragraph from the Windows documentation also raises my eyebrow: "An unnested collection is always a top-level collection, regardless of its HID type. In particular, a top-level collection does not have to be an Application collection, as defined by the USB HID Standard." I don't know what usage you'd use as a top-level collection if it isn't an application collection - shrug. Presumably a code which is mappable to a Windows PDO? Perhaps to support quirky report implementations?

So, I think so, based on how I've interpreted it, but my understanding has never been validated by a domain expert. The only way I can see of us being sure, is for someone else to read the specification and confirm my instincts. Or for an experienced HID expert to validate it. But given that HID is the primary target for ZMK, it has to be worth it right?

> How does this work for custom behaviors looking to do custom HID stuff?

As long as we plan ahead, I think we'll be alright. You can't magically fit 8 bytes into 1.5 bytes (allowing for modifiers) without cutting some corners, but the corners we'd cut are dead space, as well as the usages that are unfeasible for a keyboard or even a multi-application HCI device. That's the finessing I was talking about. Moreover, as it's all internal implementation detail, we are afforded a fallback position if we find we screwed up. It's probably easier to attempt generic behaviors and then fall back onto application-specific behaviors, rather than the reverse. We also have the option of a bespoke encoding.

innovaker commented 3 years ago

For completeness and observers, the other alternative that hasn't been discussed in this thread yet is the one I listed as:

Magic numbers or mass re-mapping of all codes

I believe this is the approach taken by Linux, QMK and others. In essence the key codes are one long set of codes - effectively an enumeration. Most of the meaning (behavior, application, item) is implied by a single number and bit flags. We've tended to call them magic numbers.

The disadvantages of magic numbers include:

But they do have their merits too, especially for the lower-end chips!

It was an early decision for ZMK not to go down that route in favour of behaviors because at the time it felt like a better choice. In the context of ZMK, I guess the equivalent is currently two-fold:

Effectively the meaning is wrapped up into the data associated with those two parts. Behaviors can also act as namespace divider to some extent (i.e. a parameter 0x01 has a different meaning for each behavior), although namespaces can also be shared across behaviors.

When we eventually start looking beyond the Device Tree (DT) for keymap/binding configuration, there's always the possibility this could change. Any approach that involves another system such as ZMK Studio, will have to encode and serialize the configuration. This includes behavior identities, regardless of whether it's strings or numbers. If it's numeric, we'll probably have more spare bits to play with (as the number of behaviors will always be relatively small) which is food for thought. That's a conversation for another day, but it will probably open doors on further "internal" encoding optimization in the future. We just need to be sure we don't close any doors too early with the decisions we make now.

innovaker commented 3 years ago

Had a thought when I woke up today.

We can both:

by localising the use of the encoded keycode to the keymap interface only, which is its primary purpose anyhow.

In practice that means we would:

This would allow us to facilitate override behaviors or behavior metadata for any usage pages that don't make it into the encoding if the need arises (which if we do our homework for the finessing, will probably never even be an issue).

Effectively the keycodes only become keymap notation shorthand. The rest of the internal state can be more explicit.

A cursory look at the current system suggests we'd also need to do some behavior refactors to avoid repeating ourselves, but that shouldn't be an issue.

petejohanson commented 3 years ago

Looking at this, I really think I would favor an incremental approach, and if we can keep the "external contract" for the keymaps themselves stable, we'll have a win here. In particular, I was reading section "3.1 HID Usage Table Conventions" of https://usb.org/sites/default/files/hut1_21.pdf, which states:

Usages are 32-bit identifiers, where the high order 16 bits represents the Usage page and the low order 16 bits represents the Usage ID. To allow more compact Report descriptors, Usage Page items can be declared to specify the high order bits of the Usage item and the Usage items can declare only the ID portion of the Usage, as follows:

So, we would be very much in line with HID itself to use the single 32-bit parameter to various behaviors, e.g. &kp FOO, to encode both the page and usage ID for a given location, avoiding the awkward cp versus kp craziness, and making hold-taps work properly for both, etc.

Reviewing the possible usage page values, I see no need for any pages that use the top 8 bits, e.g. we only need the 0x01 to 0x0F range, leaving those top bits as ripe space for storing the extra modifier information we need for shifted keycodes.

We can imagine then something akin to:

#define BANG Z_MOD_KEYCODE(MOD_LSFT, HID_USAGE_PAGE_KEYPAD, HID_USAGE_NUM_1)

To encode the mods, usage page, and usage ID all in one 32 bit value.
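A minimal sketch of what such a macro could look like. The bit layout (modifiers in bits 31..24, usage page in bits 23..16, usage ID in bits 15..0) is an assumption inferred from the description above. The numeric values and the `keycode_*` helpers are illustrative only and are not ZMK's actual definitions.

```c
#include <stdint.h>

/* Assumed layout (not confirmed by the thread):
 *   bits 31..24  modifier flags
 *   bits 23..16  usage page (only pages that fit in 8 bits)
 *   bits 15..0   usage ID
 */
#define Z_MOD_KEYCODE(mods, page, id)         \
    ((((uint32_t)(mods) & 0xFFu) << 24) |     \
     (((uint32_t)(page) & 0xFFu) << 16) |     \
     ((uint32_t)(id) & 0xFFFFu))

/* Illustrative values, taken from the HID keyboard conventions. */
#define MOD_LSFT              0x02u /* Left Shift bit of the modifier byte */
#define HID_USAGE_PAGE_KEYPAD 0x07u /* Keyboard/Keypad usage page */
#define HID_USAGE_NUM_1       0x1Eu /* 'Keyboard 1 and !' */

#define BANG Z_MOD_KEYCODE(MOD_LSFT, HID_USAGE_PAGE_KEYPAD, HID_USAGE_NUM_1)

/* Hypothetical helpers to unpack the three fields again. */
static inline uint8_t keycode_mods(uint32_t kc) { return (uint8_t)(kc >> 24); }
static inline uint8_t keycode_page(uint32_t kc) { return (uint8_t)((kc >> 16) & 0xFFu); }
static inline uint16_t keycode_id(uint32_t kc) { return (uint16_t)(kc & 0xFFFFu); }
```

Under this layout a keymap only ever names `BANG`, and the packing can change later without touching keymaps.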

I do understand this doesn't encode the second coordinate at this point. I believe we at this point don't have a need for multiple reports that send keypad or consumer values, and it's reasonable, since this is a keyboard firmware, to unpack the encoded format, and use the usage page to determine what report field to update w/ new state, etc.
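A minimal sketch of that unpack-and-route step, assuming the usage page sits in bits 23..16 of the encoded value and that only the Keyboard/Keypad (0x07) and Consumer (0x0C) pages are handled. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical identifiers for the reports the firmware maintains. */
enum report_kind {
    REPORT_UNSUPPORTED = 0,
    REPORT_KEYBOARD,
    REPORT_CONSUMER,
};

/* Pick the report to update from the usage page alone.
 * Assumes the page is stored in bits 23..16 of the encoded value. */
static enum report_kind report_for_encoded(uint32_t encoded)
{
    switch ((encoded >> 16) & 0xFFu) {
    case 0x07: /* Keyboard/Keypad page */
        return REPORT_KEYBOARD;
    case 0x0C: /* Consumer page */
        return REPORT_CONSUMER;
    default:   /* e.g. Button page: no implied report in this sketch */
        return REPORT_UNSUPPORTED;
    }
}
```

This works while each handled usage page maps to exactly one report. Pages with generic controls, such as the Button page, fall through to the default case here.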

Should we later decide for some more compact encoding internally that is even more sneaky w/ our "wasted" bits to encode the other coordinate, the keymap consumer will still just say "I use the BANG define", and doesn't need to be any the wiser that behind the scenes that is encoded any differently.

Thoughts? Concerns?

innovaker commented 3 years ago

Thanks for looking at this @petejohanson.

Your suggested approach is the same as my initial plan for this proposal. The application coordinate however, becomes important once we start using any controls from the generic pages - such as buttons or axes. BUTTON 1 from button page or Axis X from the generic page are not indicative of their reports - because they're generic by design. Sure, in a strictly constrained system you might get away with declaring that BUTTON 1 should always go to the Mouse report and using extra layers of conditional logic to do so, but that breaks down as soon as someone wants to use BUTTON 1 for a different purpose. I believe that's one of the purposes of application - to provide the necessary context for generic HID controls.
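A minimal sketch of the ambiguity, assuming full 32-bit usages for both coordinates. The same item usage (Button 1 on the Button page) is routed to a different report depending on the application usage supplied with it. All names are hypothetical.

```c
#include <stdint.h>

#define HID_USAGE(page, id) \
    ((((uint32_t)(page)) << 16) | ((uint32_t)(id) & 0xFFFFu))

/* Application usages from the Generic Desktop page (0x01). */
#define APP_MOUSE     HID_USAGE(0x01, 0x02)
#define APP_JOYSTICK  HID_USAGE(0x01, 0x04)

/* Item usage from the Button page (0x09): generic by design. */
#define ITEM_BUTTON_1 HID_USAGE(0x09, 0x01)

enum report_id { REPORT_NONE = 0, REPORT_MOUSE, REPORT_JOYSTICK };

/* The item alone does not identify a report; the application does. */
static enum report_id report_for(uint32_t application, uint32_t item)
{
    if (item != ITEM_BUTTON_1) {
        return REPORT_NONE; /* this sketch only handles Button 1 */
    }
    switch (application) {
    case APP_MOUSE:
        return REPORT_MOUSE;
    case APP_JOYSTICK:
        return REPORT_JOYSTICK;
    default:
        return REPORT_NONE;
    }
}
```

Without the `application` argument, the function would have to hard-code one destination for Button 1, which is the constrained system described above.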

I appreciate that you prefer incremental changes. So sure, for the interim:

Let's continue discussing the encoding however - specifically application - because it's one of the main purposes of this proposal. It's necessary for these reasons:

Are there any other blockers besides verifying the purpose of application within HID?

Aside:

petejohanson commented 3 years ago

The main other current blocker would be the lack of room in the current 32-bit param from the DT to allow encoding all of that nicely. And keeping this in one 32-bit param is really important to leveraging the existing behavior work from @okke-formsma and others to support hold-taps w/ modifier keycodes, e.g. ! in a hold-tap, or "Auto Shift" as noted in Discord.

So, I'm really happy to continue the discussion, especially for "post DT" targets, but in the interim, I think we're on the same page on an encoding strategy that we can do ASAP, building on your already awesome HID work.

I think we should work on getting your generated HID stuff in, then work on this as a next step for that, to unblock the modifier work @okke-formsma already has spearheaded.

Any concerns?

innovaker commented 3 years ago

That's what this proposal is attempting to solve though? I designed it with @okke-formsma's work in mind in conjunction with the upcoming needs (Mouse etc.).

The modifier work has never been blocked by this issue. It complements it.

petejohanson commented 3 years ago

I'm not saying this closes this issue, just talking about how I am proposing we implement things today to get the modifier and usage page stuff addressed ASAP, in a way that doesn't make this issue harder to work on as a follow up.

innovaker commented 3 years ago

Sure, the checklist in my post above is the step-by-step for the immediate concerns.

okke-formsma commented 3 years ago

I like the approach you guys figured out, as it checks all the boxes we currently need and gives enough flexibility for the future. Let's update the keycode defines and get the modifiers up-to-date so we have some short term profit from all this work :)

innovaker commented 3 years ago

Each of the steps I described above is effectively its own issue / PR, the whole lot being a small epic or game plan.

@okke-formsma's probably best placed to do the modifier bits. I'm best placed to do the key code bits as follow up to #21. So it'll need a degree of coordination.

I suggest we create a tracking/checklist issue based on my original checklist so that we can assign/track each part, and continue using this current issue for ongoing discussions about the other aspects of the proposal.

and-elf commented 3 months ago

I'm entirely new to the codebase, and have only skimmed through the current implementation, so I bet I'm way off.. The behaviors are already referenced/defined in the linker script, right? Then, why not just set up a large-ish static section for it? To update behaviors, the usb/ble hid could enumerate a device for it, and we could have a small tool (basically an llvm-based compiler) to generate the packed struct data as a blob.

Conceivably, the tool could even be javascript-based and run client-side, and using html5, the blob could be written to the enumerated device. Perhaps work together with https://github.com/nickcoutsos/keymap-editor?

I think pretty minor changes would need to be done in the code base, but the tool may be a bit complicated, especially in javascript..