Lack of ability of identifying different client in input-method-unstable-v2.xml

swaywm / wlroots

A modular Wayland compositor library

https://gitlab.freedesktop.org/wlroots/wlroots/

MIT License

Lack of ability of identifying different client in input-method-unstable-v2.xml #2352

Open wengxt opened 4 years ago

wengxt commented 4 years ago

One important feature of existing input methods, whether under X11 or via D-Bus through an IM module, is the ability to distinguish which client is typing text.

Right now, an input method under X11 can provide behavior like this:

Terminal - type latin
Firefox - type Chinese

When you switch between windows, the input method engine used by each application is remembered and can differ between them. This is widely supported on Windows, Mac, and X11, and is very important to CJK users.

It looks like the input method server only has one global zwp_input_method_v2 object and therefore can't distinguish between clients.

wlroots has migrated to gitlab.freedesktop.org. This issue has been moved to:

https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/2352

wengxt commented 4 years ago

For example, XIM has the following concepts: input method ID and input context ID. See https://www.x.org/releases/X11R7.6/doc/libX11/specs/XIM/xim.html

So clients can easily be distinguished at the application level, and input contexts can be distinguished within an application.

Especially for desktop users, it is unexpected to go through "switch to Chinese to type in the browser", "switch back to the terminal", "now the input method is still in Chinese". By default, each application should have its own input method state. More importantly, the concept of an input context allows the input method server to track the lifetime of an application, in order to allocate and manage resources for different client applications.

emersion commented 4 years ago

cc @dcz-purism

dcz-purism commented 4 years ago

I see that input-method-id is supposed to be information to the client, so I guess it's not needed here.

The input-context-id is specified by the IM Server to identify the client (IC).

XIM seems to open separate connections to each client, and each client has multiple input fields.

Are preferences remembered per-client, per-window, or per-text field? When I switch languages on my KDE install, it's per-window only.

I can see 3 options here:

1. compositor assigns each client an ID and sends it to the input method whenever focus is switched (per-client)
2. the client assigns each text field another ID, and sends using text-input to the compositor. Compositor assigns each client an ID, and sends both to the input method (per-client/field, IM can choose)
3. the client assigns each text field another ID, and sends using text-input to the compositor. Compositor assigns each client an ID, and sends a hash of both to the input method (per-field)
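
As a rough illustration of the three options, here is a minimal Python sketch of what the input method would learn on a focus change in each case. Everything in it is hypothetical: `Compositor`, `FocusInfo`, the connection names, and the choice of SHA-256 are assumptions made for the example, not part of text-input or input-method-unstable-v2.

```python
import hashlib
from itertools import count
from typing import NamedTuple, Optional


class FocusInfo(NamedTuple):
    """What the input method would be told on a focus change (hypothetical)."""
    client_id: Optional[int] = None
    field_id: Optional[int] = None
    opaque_id: Optional[str] = None


class Compositor:
    """Toy compositor that hands out one ID per client connection."""

    def __init__(self):
        self._ids = count(1)
        self._clients = {}

    def client_id(self, connection):
        # Assign an ID the first time a connection is seen, then reuse it.
        if connection not in self._clients:
            self._clients[connection] = next(self._ids)
        return self._clients[connection]

    def option1(self, connection):
        # Per-client: only the compositor-assigned client ID is forwarded.
        return FocusInfo(client_id=self.client_id(connection))

    def option2(self, connection, field_id):
        # Per-client/field: both IDs are forwarded, the IM picks the granularity.
        return FocusInfo(client_id=self.client_id(connection), field_id=field_id)

    def option3(self, connection, field_id):
        # Per-field: the IM only sees an opaque hash of both IDs.
        raw = f"{self.client_id(connection)}:{field_id}".encode()
        return FocusInfo(opaque_id=hashlib.sha256(raw).hexdigest())


compositor = Compositor()
print(compositor.option1("terminal"))
print(compositor.option2("firefox", field_id=7))
print(compositor.option3("firefox", field_id=7))
```

With the hashed variant the input method can still key state on the opaque ID, but it can no longer tell which fields belong to the same client.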

Are preferences remembered after the application is closed and reopened? Stable text inputs might be doable with 2. or 3., but how useful would that actually be?

wengxt commented 4 years ago

https://github.com/fcitx/fcitx5-qt/blob/master/qt5/platforminputcontext/qfcitxplatforminputcontext.cpp#L630

For example, in fcitx5-qt, we create an input context per application window. There's an option in the fcitx server that lets the user share the input method state per Window/Application/Everything.

I don't consider per-text-field state to be very useful. If text fields have different properties, the input method is notified through hint/purpose, or the state is reset when moving from one field to another.

In fcitx, we don't remember anything once the application is gone, but we do send over the application name. This is mainly so that multi-process applications can be grouped together, for example chromium, or when you open multiple kwrite instances. macOS supports setting the input method at the per-application level (e.g., when you start a terminal, it always uses the Latin keyboard). I assume this should be tied to the application info that xdg-shell provides.

Fcitx remembers a lot of input state while an application is running. The most important piece is the input method engine being used. Compared to mobile, the desktop is multi-tasking, so this is very important for users. Fcitx itself does not care how an input context is associated with a window or app, but in practice an input context tends to be bound to a window (in XIM, Gtk, or Qt).

A lot of other information is kept on a per-input-context basis. For example, the surrounding text is cached on the fcitx side because some applications don't actively send it when they lose or regain focus. Fcitx also remembers the last committed text for applications that don't provide surrounding text, in order to support certain features, e.g. committing a full-width period "。" or an ordinary period "." depending on the context.

Also, some input methods provide submodes within the engine. For example, anthy/kkc/mozc have hiragana/katakana modes, and this information is also remembered per input context.

Also, how input contexts are created should be decided by the application. I have always considered the input context to be the most fine-grained level of input method state that the application itself wants. If an application thinks all of its windows should share input state, it should create one context and use it across the application. If an application thinks a certain input box is important enough to need its own state, it should create more than one.
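
To make the punctuation example concrete, here is a small Python sketch of per-context state that remembers the last committed text. The rule it uses (an ordinary period right after a digit, a full-width period otherwise) is only an assumed heuristic for illustration, and `ContextState`, `commit`, and `period_for` are made-up names, not fcitx APIs.

```python
class ContextState:
    """Per-input-context state kept on the input method side (hypothetical)."""

    def __init__(self):
        self.last_commit = ""


def commit(state, text):
    # Remember what was committed so later decisions can look at it,
    # even when the application never sends surrounding text.
    state.last_commit = text
    return text


def period_for(state):
    # Assumed heuristic: after a digit the user is probably typing a number
    # such as "3.14", so use the ordinary period; otherwise use "。".
    if state.last_commit[-1:].isdigit():
        return "."
    return "。"


state = ContextState()
commit(state, "你好")
print(period_for(state))
commit(state, "3")
print(period_for(state))
```

Because the decision depends on earlier commits, this only works if the state stays attached to the same context across focus changes.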

To me, it's per-input context (or per-text-input in wayland case?) -> (per-application) -> everything.
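
The sharing hierarchy described here could be sketched as a lookup key that gets coarser with the chosen policy. This is a minimal Python illustration; `SharePolicy`, `state_key`, `StateStore`, and the engine names are assumptions for the example and do not mirror fcitx's actual configuration or code.

```python
from enum import Enum


class SharePolicy(Enum):
    CONTEXT = "per input context"
    APPLICATION = "per application"
    ALL = "everything"


def state_key(policy, context_id, app_name):
    # Pick the key under which input method state is stored and looked up.
    if policy is SharePolicy.ALL:
        return ("global",)
    if policy is SharePolicy.APPLICATION and app_name:
        return ("app", app_name)
    # Fall back to the finest level when no application name is known.
    return ("context", context_id)


class StateStore:
    def __init__(self, policy, default_engine="keyboard-us"):
        self.policy = policy
        self.default_engine = default_engine
        self._engines = {}

    def set_engine(self, context_id, app_name, engine):
        self._engines[state_key(self.policy, context_id, app_name)] = engine

    def engine_for(self, context_id, app_name):
        key = state_key(self.policy, context_id, app_name)
        return self._engines.get(key, self.default_engine)


store = StateStore(SharePolicy.APPLICATION)
store.set_engine(1, "firefox", "pinyin")
print(store.engine_for(2, "firefox"))   # another Firefox context
print(store.engine_for(3, "terminal"))  # a different application
```

Note that the per-application level is only available when the input method actually receives an application name.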

dcz-purism commented 4 years ago

Could you give me a little background on what exactly the purpose of switching is from user's perspective?

When I imagine the instant messenger case, I would expect the preference for per-text-input ID: I use Japanese to one of my friends, and German to another. Same with the browser: I write English in about:config, and Chinese in the search box. Window/application sound much less useful to me. What am I missing?

Caching information like surrounding text is an anti-feature. Not sending surrounding text has a particular meaning in the text-input protocol. I wanted to make it as stateless as possible, (in addition to keeping the first attempt simple), and that's why I didn't include any IDs. I hope the resulting protocol can somehow still discourage caching.

wengxt commented 4 years ago

It is one way to maintain certain consistency and make application behavior predictable.

Some users may prefer that. (I mean I don’t use it that way, but others did ask me to add it.)

For example, a multi-window application may open or close windows, which means its text inputs come and go (while the application is never fully closed). Having a per-application setting is like a pin that makes sure the whole application (whether single- or multi-process) always has one expected input method instead of randomly jumping from one to another.

I agree that surrounding text is not a good example of statefulness, but here’s the thing. When you design a protocol, you can’t predict how its users will use it. They may not care much about IME users, or they may just have no idea how to fix a problem (Firefox and Chromium, for example, because they are complex and don't use toolkit-native widgets).

It may take years before they fix an issue even after you have filed a report. Input method support is not ideal, and it is all about working around culprits here and there. A more concrete example is Google Docs within Firefox. The non-native widget is not even in Firefox itself but inside the web page. It would be totally wrong to always rely on the surrounding text being forwarded.

Putting the surrounding text story aside, many input method engines do have resources that are created in association with one context.

wengxt commented 4 years ago

Another reason: state-of-the-art input method engine libraries commonly have the concept of a "context":
https://github.com/ueno/libkkc/blob/master/libkkc/context.vala
https://github.com/rime/librime/blob/master/src/rime/context.h
https://github.com/libpinyin/libpinyin/blob/master/src/pinyin.h#L35
https://github.com/fcitx/libime/blob/master/src/libime/pinyin/pinyincontext.h

And I'd like to match them with the client and I don't want to reallocate and reinitialize such data again and again for the same client.

Per-application, to me, is also a way to organize applications and to assign certain features or workarounds. For example, gvim is always blacklisted in fcitx for preedit because of a long-standing bug. Some people may prefer to blacklist preedit for chromium due to weird behavior on certain web pages (some web pages' auto-completion doesn't work well with preedit; I can't name one right now, but I have seen this across platforms). Such bugs are not even fixable on the browser side.

Also, Chromium has very strange behavior in how it uses input contexts. It's not per-"text-field"; it's a weird combination resulting from the underlying multi-process design. Some "tabs" may share one input context while other random tabs use another one. Having an application name available does help Chrome users get predictable input method behavior. And I doubt they can easily fix it, because of Chrome's underlying multi-process architecture.

After all, per-application is more of a conceptual thing, and I only expect the protocol to provide an application name string, which is good enough for all of fcitx's existing use cases.

dcz-purism commented 4 years ago

Sorry, I'm still a little confused on what the use case looks like.

a multi-window application may open or close windows, which means its text inputs come and go (while the application is never fully closed). Having a per-application setting is like a pin that makes sure the whole application (whether single- or multi-process) always has one expected input method instead of randomly jumping from one to another.

The way I understand what you wrote, the use case is that:

when a user dismisses a text field and later brings it back
then they expect this text field to use the same input method config as before

Is that right?

wengxt commented 4 years ago

Mainly two use cases for me.

1. When an application creates a new window, the input method config is kept the same as in the main window.
2. Depending on the application's implementation, some applications create multiple input contexts and some create one for everything. This is where people find behavior inconsistent across applications. It is worse when an application uses mixed toolkits or a custom toolkit, e.g. Firefox's main window and its file dialog. The file dialog is native Gtk, but the main window is implemented differently. Some users will find that a new window uses a "random" state that depends on implementation details. Such an option would make input method behavior consistent at least within a single application.

dcz-purism commented 4 years ago

When an application creates a new window, the input method config is kept the same as in the main window.

Why is that better than starting new windows with desktop default? When I change the input language, it's usually within one text input on some random window. It never happened to me that I wanted to group applications by language: I might want a single web page to use the same language, or I might want a single input field (search box), or a single window (chat window with 1 person) to use a specific language. When is it useful to keep applications divided by input method? When is it useful that a new window uses the same input method as another window from the same application?

Point 2. doesn't matter for Wayland's text-input, input context is something created and destroyed following focus.

wengxt commented 4 years ago

@dcz-purism As I said, that's what my users asked me to support, and I know some people still turn it on after installation to this day. I personally don't find it extremely useful, but I think some people who use multi-process window editors find it useful. (The editor windows belong to different processes but to the same app.)

Point 2. My "input context" refers to the object that represents some context within the application (not necessarily a text field; it may be a window, or multiple text fields), not a Wayland "text-input" object. And that is the whole point of this issue. Think about it: when firefox/chromium is ported to support text-input today, will it do any special coding? No, it will just use what Gtk provides, and Gtk won't have enough knowledge about the widget because it is not a real Gtk widget. Chromium will still randomly assign the "text-field" to random tabs, which gives users a weird experience. And the reason this is hard to fix is the nature of multi-process rendering and the synchronization behind the tabs (they want it to be parallel, with no synchronization on the same GtkImContext object).

Anyway, per-client might be good enough for me (the option 2 you mentioned), though I'd say I have many different ideas about using the application information (the actual app name, not a hashed ID). For example: assigning a default state to a newly launched application, which is supported on Mac, such as always defaulting to the Latin keyboard when starting a new terminal.

wengxt commented 4 years ago

To sum it up, I'd like to see two things provided by the protocol.

1. An identifier or Wayland object that represents the "context" (per text-field if you want to call it that way). It is not created or destroyed on focus change. Conceptually, it should be associated with some UI element in the application (window or text-field; it is up to the application to decide how to group it).
- The creation and deletion of this ID/object need to be forwarded to the input method server, because the engine may want to associate some data with it. So it can't be something like just get_id.
2. The app name (possibly reusing the xdg one), for example the reverse-domain version.
- Can be used to assign the default state (a feature available on macOS, for example).
- Can be used to group the state (as you said, you don't think this is useful, but I know some fcitx users do want it).
- Can be used to implement application-specific features.
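
A minimal Python sketch of how an input method server might consume these two pieces of information, assuming the protocol delivered them. The event names (`context_created`, `context_destroyed`, `focus_in`), the classes, and the app IDs are all hypothetical; nothing like this exists in input-method-unstable-v2 today.

```python
class EngineContext:
    """Stand-in for engine-side data allocated once per input context."""

    def __init__(self, engine):
        self.engine = engine
        self.released = False

    def release(self):
        # A real engine would free its per-context resources here.
        self.released = True


class InputMethodServer:
    def __init__(self, app_defaults, fallback="keyboard-us"):
        self.app_defaults = app_defaults  # app name -> default engine
        self.fallback = fallback
        self.contexts = {}

    def context_created(self, context_id, app_id):
        # Allocate per-context data once, using the app name for the default.
        engine = self.app_defaults.get(app_id, self.fallback)
        self.contexts[context_id] = EngineContext(engine)

    def context_destroyed(self, context_id):
        ctx = self.contexts.pop(context_id, None)
        if ctx is not None:
            ctx.release()

    def switch_engine(self, context_id, engine):
        self.contexts[context_id].engine = engine

    def focus_in(self, context_id):
        # Focus changes only look up existing state; nothing is reallocated.
        return self.contexts[context_id].engine


server = InputMethodServer({"org.telegram.desktop": "wubi"})
server.context_created(1, "org.telegram.desktop")
server.context_created(2, "org.kde.konsole")
server.switch_engine(2, "pinyin")
print(server.focus_in(1), server.focus_in(2))
server.context_destroyed(2)
```

The point of forwarding creation and deletion, rather than exposing a bare ID, is that the server knows exactly when per-context data can be allocated and released.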

dcz-purism commented 4 years ago

In Wayland, I think we're dealing with a little of a "greenfield" situation, where protocols may exist, but they are not really widely adopted. I see an opportunity in this to detach from the way the technology developed historically, and see the high level goals and priorities that the users actually want. That means distinguishing what came to be because of technological limitations and throwing it away, and keeping the parts that are really useful.

Regarding 2, as I said, I don't understand how it's useful, so I can't really help implementing it. I would just botch it. In addition, I think implementing application-specific workarounds helps lift the pressure from the people creating broken applications. It allows moving fixes away from where they should be (application) to where they should not be (anywhere else), so I would attempt to make it impossible just on that count.

Regarding 1, before I try to analyze it, I have to ask: what's the use case for that?

progandy commented 4 years ago

I do not use any input methods, but I can imagine it being useful to set different input languages for applications, i.e. have office set up for non-Latin input and a code editor/IDE for primarily English input. For office it may also be useful to send a language hint depending on the language set for a document or even the paragraph that is being edited. (LibreOffice has the concept of a document language and paragraph languages that are also used for spell checking.)

wengxt commented 4 years ago

In Wayland, I think we're dealing with a little of a "greenfield" situation, where protocols may exist, but they are not really widely adopted. I see an opportunity in this to detach from the way the technology developed historically, and see the high level goals and priorities that the users actually want. That means distinguishing what came to be because of technological limitations and throwing it away, and keeping the parts that are really useful.

Regarding 2, as I said, I don't understand how it's useful, so I can't really help implementing it. I would just botch it. In addition, I think implementing application-specific workarounds helps lift the pressure from the people creating broken applications. It allows moving fixes away from where they should be (application) to where they should not be (anywhere else), so I would attempt to make it impossible just on that count.

Regarding 1, before I try to analyze it, I have to ask: what's the use case for that?

Didn't I already explain it time and time again in this issue? It helps maintain the input method state on the input method server side. The state includes, but is not limited to:

Which input method is being used: pinyin, mozc, or Latin (the most important one!).
The in-memory data structures that the engine needs to allocate to support input. That's another reason why creation and deletion need to be notified.
Other context-related data associated with it to support features, e.g. automatic full-width punctuation conversion.

And please, if you're not a daily user of input methods, how can you say you judge those problems better than actual users?

I have been devoted to input methods on Linux for over 10 years. I have seen tons of applications fail to support input methods properly, and they are just using what the toolkit provides. Yet another protocol doesn't solve anything for them. They won't have their own protocol implementation. Things like poor support for preedit/surrounding text would still be there.

Regarding all of this back and forth, I'd say, if you don't understand why, let's just ask the real users. I'll ask some actual input method users to comment on how they use input methods.

yuyichao commented 4 years ago

Regarding 2, as I said, I don't understand how it's useful, so I can't really help implementing it. I would just botch it. In addition, I think implementing application-specific workarounds helps lift the pressure from the people creating broken applications. It allows moving fixes away from where they should be (application) to where they should not be (anywhere else), so I would attempt to make it impossible just on that count.

Personally, I think ideally it won’t be as necessary in most cases. The application should take care of its default language setting, react to all user input correctly and take care of deep integration of input method on its own. However, from past experience, this is not going to happen. Bugs and features that only affect or benefit users who require advanced input methods (ones that benefit from context information) face much stronger pushback and/or much less attention than regular ones. This is why:

It’s necessary to implement such integration in the input method instead (sometimes with user configuration) and,
It’s necessary for the input method to access context info for deep integration.

App name is really the simplest one of those.

Regarding 1, before I try to analyze it, I have to ask: what's the use case for that?

The input method is conceptually part of the application. Just like you don’t want the document you are editing to be lost when the application window loses focus or is minimized, input method users may also not like it if their input method’s state gets lost whenever they switch focus. This is obviously only applicable to stateful input methods, but that is often the case for a lot of CJK ones.

So implementation aside, there should be a way to keep track of the input context (sorry this context isn’t really the context information I mentioned above but a unique representation of the input target). It needs to provide enough features so that the information kept on the input method side can be created/updated/destroyed in sync with the app, and I think the api @wengxt mentioned above is a pretty minimal list for this purpose.

wengxt commented 4 years ago

Regardless of input method/keyboard layout, let's also check what is supported on Linux nowadays:

KDE keyboard layout switch policy: Desktop/Application/Window/Global

option in IBus. More importantly, this option was not there when ibus 1.5 came out (ibus 1.4 supported it), but CJK users asked for it to be added back.

option in Fcitx

lilydjwg commented 4 years ago

I don't know input method implementation details, but FYI here's my use cases:

I'm using a per-window input method state currently on X11. I don't use per-application because there is stuff like the browser. I fake per-tab states to some extent with different windows.

When is it useful that a new window uses the same input method as another window from the same application?

When I drag out a tab from my browser so I can use two web pages side-by-side.

Regarding 2, as I said, I don't understand how it's useful

If I could set the default state per application, I wouldn't be interrupted so much, typing wubi codes and then deleting them, when my Telegram restarts because it has used too much memory. I would not need to distinguish already-open applications from newly opened ones.

Regarding 1, before I try to analyze it, I have to ask: what's the use case for that?

It's been said several times.

I want to type Chinese by default whenever I switch to Telegram and a browser window, and type English by default whenever I switch to terminals and another browser window. I only switch to the other state temporarily when I need to so that I won't mess them up.

It would be great if it were English by default in the URL bar and Chinese by default in some configured websites.

dcz-purism commented 4 years ago

Before you get angry with me, remember that I'm not the maintainer of wayland-protocols, so if you don't need my help, then you're free to come up with your own protocol and submit it. By an accident of fate I am the most recent person to care about it, so I was pinged. I want to help here by understanding the problem completely and coming up with the best proposal that's most likely to be accepted and which won't give a lot of trouble in the future.

I'm not perfect, and I've already managed to upstream something very broken, so I'm extra careful now. It doesn't help that I've only been using input methods since very recently, so I might need extra slow explanations. As I said, if this annoys you, find other people to write a protocol proposal.

If you want to work together, then I'm glad if we can put this behind.

That also means that I value having multiple stakeholders in the discussion: users, power users, input method maintainers, Wayland maintainers, application maintainers, toolkit maintainers, distribution package maintainers. All of them will want something (power, ease of use, their favorite workflows), and their needs should be balanced. But not all of them will be represented here, so I try to jump into their roles, understand what they could need, and resist when some group wants to dominate. So please don't take it personally: I will resist your proposals. Admittedly, I usually favor users a little. Unfortunately, I don't completely understand the variety of use cases, so I ask for patience.

dcz-purism commented 4 years ago

That being said, I'm back to the topic:

An identifier or Wayland object that represents the "context" (per text-field if you want to call it that way). It is not created or destroyed on focus change. Conceptually, it should be associated with some UI element in the application (window or text-field; it is up to the application to decide how to group it).

Regarding 1, before I try to analyze it, I have to ask: what's the use case for that?

Didn't I already explain it time and time again in this issue?

Sorry, I might not have made it clear. What I'm looking for is the simplest explanation in the form of: user does X, user expects Y to happen. There are multiple use cases that could be boiled down to the above request. Let me make a guess though:

When user changes some input method setting on a text field,
and then restarts the application,
and then brings up the "same" text field,
then the text field in the new instance should have the same config.

Is that right? I think it's somewhat doable, but with nasty side effects depending on implementation: the application would have to guarantee stable text field IDs (but a wayland protocol can't guarantee whether these are per-application or per-window, or …), and it's not clear what happens with multiple instances (processes/connections): I don't want changing my input method in an editor instance to change input method in another editor instance. Or is it only needed per-client?

wengxt commented 4 years ago

That being said, I'm back to the topic:

An identifier or Wayland object that represents the "context" (per text-field if you want to call it that way). It is not created or destroyed on focus change. Conceptually, it should be associated with some UI element in the application (window or text-field; it is up to the application to decide how to group it).

Regarding 1, before I try to analyze it, I have to ask: what's the use case for that?

Didn't I already explain it time and time again in this issue?

Sorry, I might not have made it clear. What I'm looking for is the simplest explanation in the form of: user does X, user expects Y to happen. There are multiple use cases that could be boiled down to the above request. Let me make a guess though:
1. When user changes some input method setting on a text field,

2. and then restarts the application,

3. and then brings up the "same" text field,

4. then the text field in the new instance should have the same config.
Is that right? I think it's somewhat doable, but with nasty side effects depending on implementation: the application would have to guarantee stable text field IDs (but a wayland protocol can't guarantee whether these are per-application or per-window, or …), and it's not clear what happens with multiple instances (processes/connections): I don't want changing my input method in an editor instance to change input method in another editor instance. Or is it only needed per-client?

Before you get angry with me, remember that I'm not the maintainer of wayland-protocols, so if you don't need my help, then you're free to come up with your own protocol and submit it. By an accident of fate I am the most recent person to care about it, so I was pinged. I want to help here by understanding the problem completely and coming up with the best proposal that's most likely to be accepted and which won't give a lot of trouble in the future.

I'm not perfect, and I've already managed to upstream something very broken, so I'm extra careful now. It doesn't help that I've only been using input methods since very recently, so I might need extra slow explanations. As I said, if this annoys you, find other people to write a protocol proposal.

If you want to work together, then I'm glad if we can put this behind.

That also means that I value having multiple stakeholders in the discussion: users, power users, input method maintainers, Wayland maintainers, application maintainers, toolkit maintainers, distribution package maintainers. All of them will want something (power, ease of use, their favorite workflows), and their needs should be balanced. But not all of them will be represented here, so I try to jump into their roles, understand what they could need, and resist when some group wants to dominate. So please don't take it personally: I will resist your proposals. Admittedly, I usually favor users a little. Unfortunately, I don't completely understand the variety of use cases, so I ask for patience.

I'm sorry for my attitude. Because this protocol was never upstreamed in wayland-protocols (if it was, why is the only copy of the protocol XML file in wlroots?), I assumed no one actually cared about pushing the protocol upstream, and that's why I assumed you're the de facto maintainer for this protocol.

So to me it seems like Purism might need something for the phone, so they found someone to come up with something, and that's why there is a lack of support for the desktop use case in this protocol. I might be totally wrong about that.

Also, I left my comments on the Wayland mailing list a long time ago (on both v1 and v2), but that didn't seem to have any effect.

That's part of the reason I became emotional on the topic.

Personally, I don't want to introduce yet another split in the protocol because it doesn't really help the users. I'd be happy if we could both work on it.

(I will get back to the technical problem in another reply.)

dcz-purism commented 4 years ago

I'm using a per-window input method state currently on X11. I don't use per-application because there is stuff like the browser. I fake per-tab states to some extent with different windows.

When I drag out a tab from my browser so I can use two web pages side-by-side.

If I could set the default state per application, I wouldn't be interrupted so much, typing wubi codes and then deleting them, when my Telegram restarts because it has used too much memory. I would not need to distinguish already-open applications from newly opened ones.

I think this would also be solved by simply having an ID for Telegram, not necessarily an XDG string.

It would be great if it were English by default in the URL bar and Chinese by default in some configured websites.

Thanks for the answers. The actual use cases are super useful, because they illustrate the real problems that we can now group and solve.

It seems that you are using or would like to use a mix of per-tab, per-application, and per-website config. That's why I want to be careful about going too fast and embedding xdg names or something else unchangeable. If there's a better solution for grouping, it would be a shame to ignore it already.

dcz-purism commented 4 years ago

I'm sorry for my attitude. Because this protocol was never upstreamed in wayland-protocols (if it was, why is the only copy of the protocol XML file in wlroots?), I assumed no one actually cared about pushing the protocol upstream, and that's why I assumed you're the de facto maintainer for this protocol.

The protocol that's upstreamed is text-input-v3. It's the protocol applications use, and I consider it to be "more important", because it's not optional. (I am writing a short blog post to explain those protocols, so sorry that I can't link to it now.) input-method-v2 is basically a mirror copy of text-input-v3, and text-input-v3 has bad issues, so I'm not upstreaming input-method-v2 at the moment.

lack of support for the desktop use case

Lack of manpower. Reviews of open text-input-v3 questions will push it forward faster. I don't think what we're talking about here is a desktop input case though.

As a related note, when I realized that this is going slowly, I consulted some Wayland people and I got an agreement that it's better to create a protocol that's incomplete but upstream rather than complete but gets so bogged down in discussion that no one wants to be the one to merge it, and few people use it so no one has the experience.

dcz-self commented 4 years ago

BTW, I'm working on this in my free time, so don't blame Purism ;)

wengxt commented 4 years ago

That being said, I'm back to the topic:

An identifier or Wayland object that represents the "context" (per text field, if you want to call it that). It is not created or destroyed on focus change. Conceptually, it should be associated with some UI element in the application (window or text field; it is up to the application to decide how to group them).

Regarding 1, before I try to analyze it, I have to ask: what's the use case for that?

Didn't I already explain it time and again in this issue?

Sorry, I might have not made it clear. What I'm looking for is the simplest explanation in the form of: user does X, user expects Y to happen. There are multiple use cases that could be boiled down to the above request. Let me make a guess though:
1. When user changes some input method setting on a text field,
2. and then restarts the application,
3. and then brings up the "same" text field,
4. then the text field in the new instance should have the same config.

Is that right? I think it's somewhat doable, but with nasty side effects depending on implementation: the application would have to guarantee stable text field IDs (but a wayland protocol can't guarantee whether these are per-application or per-window, or …), and it's not clear what happens with multiple instances (processes/connections): I don't want changing my input method in an editor instance to change input method in another editor instance. Or is it only needed per-client?
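To make the guess above concrete, here is a rough Python sketch of what "same config after restart" could look like on the input method side. Everything in it is hypothetical: the `FieldConfigStore` class, the `org.example.Chat` application ID, the field names, and the engine names are made up for illustration, and it assumes the application can supply a stable field ID at all.

```python
import json


class FieldConfigStore:
    """Hypothetical store: remembers one engine per (application, field) pair."""

    def __init__(self):
        self._engines = {}

    def set_engine(self, app_id, field_id, engine):
        # The key is stable across restarts only if the application keeps
        # its field IDs stable, which is exactly the open question above.
        self._engines[f"{app_id}/{field_id}"] = engine

    def get_engine(self, app_id, field_id, default="keyboard-us"):
        return self._engines.get(f"{app_id}/{field_id}", default)

    def dump(self):
        """Serialize the settings, e.g. before the input method exits."""
        return json.dumps(self._engines, sort_keys=True)

    @classmethod
    def load(cls, text):
        store = cls()
        store._engines = json.loads(text)
        return store


# Steps 1-4: change a setting, "restart", bring up the same field again.
before = FieldConfigStore()
before.set_engine("org.example.Chat", "message-box", "pinyin")
saved = before.dump()
after = FieldConfigStore.load(saved)
restored = after.get_engine("org.example.Chat", "message-box")
```

Because the key does not include a process or connection, two running instances of the same editor would share the setting, which is the side effect mentioned above.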

Below is some debug information obtained from Fcitx (via a debug-purpose dbus API).

$ dbus-send --session --type=method_call --print-reply --dest=org.fcitx.Fcitx5 /controller org.fcitx.Fcitx.Controller1.DebugInfo 
method return time=1596479512.665484 sender=:1.809 -> destination=:1.849 serial=115241 reply_serial=2
   string "Group [x11::0] has 18 InputContext(s)
  IC [a1d3b004179349d786bb16b1560f4c15] program:plasmashell frontend:dbus
  IC [4616725efafd40ffbbb8efd70a81c7ac] program:firefox frontend:dbus
  IC [6725790491394dbd85571e445017e5db] program:firefox frontend:dbus

As you can see, a program may create multiple input contexts (ICs), and the name of the program is available. The hashed ID here is generated on the input method server side and is mapped to another ID used for communication with the different apps.

For example:

$ qdbus org.fcitx.Fcitx5 
/
/org
/org/freedesktop
/org/freedesktop/portal
/org/freedesktop/portal/inputcontext
/org/freedesktop/portal/inputcontext/3
/org/freedesktop/portal/inputcontext/11
/org/freedesktop/portal/inputcontext/2
/org/freedesktop/portal/inputcontext/8
/org/freedesktop/portal/inputcontext/18
/org/freedesktop/portal/inputcontext/10

Each IC from the dbus frontend maps 1-to-1 to a dbus object here.
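As an illustration of how the DebugInfo text can be read, the following Python sketch groups the IC lines by program name. The sample text is copied from the output above; the regular expression and the `contexts_by_program` helper are my own assumptions about the line format, not part of Fcitx.

```python
import re
from collections import defaultdict

# Sample lines copied from the DebugInfo output shown earlier.
DEBUG_INFO = """Group [x11::0] has 18 InputContext(s)
  IC [a1d3b004179349d786bb16b1560f4c15] program:plasmashell frontend:dbus
  IC [4616725efafd40ffbbb8efd70a81c7ac] program:firefox frontend:dbus
  IC [6725790491394dbd85571e445017e5db] program:firefox frontend:dbus
"""

# Assumed line shape: "IC [<hex id>] program:<name> frontend:<name>".
IC_LINE = re.compile(
    r"^\s*IC \[(?P<id>[0-9a-f]+)\] program:(?P<program>\S*) frontend:(?P<frontend>\S+)"
)


def contexts_by_program(text):
    """Return {program name: [input context IDs]} for every IC line."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        match = IC_LINE.match(line)
        if match:
            grouped[match["program"]].append(match["id"])
    return dict(grouped)


ics = contexts_by_program(DEBUG_INFO)
```

Here `ics["firefox"]` holds two IDs, which is the "one program, several input contexts" situation described above.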

I hope this helps clarify two different requirements on the topic:

Identify the different contexts within one client. A context may refer to:
- A window, a text input box widget, a browser tab, etc. How contexts are assigned to the actual user interface should be handled by the application itself, because it has better knowledge of what should be grouped together. For example, the URL bar and the web page may not semantically belong to the same group, but text boxes on the same web page might.

Because this is an optional requirement, it is hard to find an application that implements it perfectly. In real life, most implementations are window-based, which means different windows have different input method states. One of the most common use cases is: a. The terminal is using English. b. Switch to some chat and type Chinese. c. Switch back to the terminal, which is still in English.

When I refer to "Chinese" or "English", I mean one of the input method engines that provide the functionality for typing that language. It can actually be "Pinyin" or "Wubi" for Chinese.

This is very important because it is already widely supported across different platforms (mainly desktop: Mac, Windows, X11), so people simply expect it to work.
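The terminal/chat example can be sketched in a few lines of Python. This is only a toy model of per-window state as seen from the input method: the `PerWindowState` class, the window IDs, and the engine names are hypothetical.

```python
class PerWindowState:
    """Toy model: each window remembers its own input method engine."""

    def __init__(self, default_engine):
        self.default_engine = default_engine
        self._engine_by_window = {}
        self.focused = None

    def focus(self, window_id):
        """Focus a window and return the engine that should become active."""
        self.focused = window_id
        return self._engine_by_window.setdefault(window_id, self.default_engine)

    def switch_engine(self, engine):
        """The user switches engine; only the focused window is affected."""
        if self.focused is None:
            raise RuntimeError("no window is focused")
        self._engine_by_window[self.focused] = engine

    def close(self, window_id):
        """Forget the state of a window that no longer exists."""
        self._engine_by_window.pop(window_id, None)
        if self.focused == window_id:
            self.focused = None


state = PerWindowState(default_engine="keyboard-us")
state.focus("terminal")          # a. terminal is using English
state.focus("chat")
state.switch_engine("pinyin")    # b. switch to chat, type Chinese
engine_in_terminal = state.focus("terminal")  # c. back in the terminal
```

Note that `close` only works if the input method is told when a window (or context) goes away, which is the lifetime question discussed in this thread.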

A semantic scope for the application. (Less important) There are multiple ways to achieve this; these are what is being used on X11:
- Use _NET_WM_PID on the X window passed via XIM.
- Use the toolkit's native API to get the application name via the IM module (DBus-based IM module).

Using the pid has its own problem: pids can be reused. The application name is parsed via procfs, which can be wrong, especially for Python GUI apps. Using an IM module is better, but it requires a correct environment setup, and not all IM modules are implemented in the same way. It would also be hard to make it work on a platform such as Wayland, because positioning the input popup window would require some custom protocol on Wayland.
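To show why the procfs approach can go wrong, here is a small Python sketch. It assumes the naive strategy of taking the base name of the first argument in `/proc/<pid>/cmdline` (whose contents are NUL-separated); whether a given input method does exactly this is an assumption, and the paths are made up.

```python
import os


def program_name_from_cmdline(raw):
    """Guess a program name from the raw bytes of /proc/<pid>/cmdline."""
    args = [arg for arg in raw.split(b"\0") if arg]
    if not args:
        return ""
    # Naive rule: the program is whatever executable comes first.
    return os.path.basename(args[0]).decode(errors="replace")


# A native binary gives a useful name.
native = program_name_from_cmdline(b"/usr/bin/firefox\0--new-window\0")

# A Python GUI app is reported as the interpreter, not as the app itself.
scripted = program_name_from_cmdline(b"/usr/bin/python3\0/opt/example/gui_app.py\0")
```

With this rule every Python GUI app looks like the same program, so per-application settings keyed on that name would be shared between unrelated apps.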

On Wayland, this seems to be a better option to me: https://github.com/wayland-project/wayland-protocols/blob/6be6e00c0294e075e7b689167e08b52bf55ffccb/unstable/xdg-shell/xdg-shell-unstable-v6.xml#L582

The reason for asking for this information is related to multiple different use cases.

a. Be able to apply some default setup when you launch a new application. For example, when you launch a terminal, always default to English. When you launch some other app, use the input method engine that was last in use when it was closed.

b. Some users prefer to use the same setup for a whole application. They simply don't like the functionality provided in (1) because they find the grouping done there too random.

c. Work around apps that assign contexts badly.

d. Other features: for example, when typing in the TeX editor Kile, optionally show TeX autocompletion hints from the input method. You may argue that this work belongs to the TeX editor, but Fcitx does have a LaTeX-based typing booster, and it is possibly more customizable.

To me, (a) is more important, and (b), (c), and (d) are not top requirements.
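Use case (a) could look roughly like this in Python. It is a sketch under the assumption that the input method receives some application name when a context is created; the `LaunchPolicy` class, the application IDs, and the engine names are hypothetical.

```python
class LaunchPolicy:
    """Pick the initial engine for a newly launched application."""

    def __init__(self, fallback, pinned=None):
        self.fallback = fallback          # used for never-seen applications
        self.pinned = dict(pinned or {})  # apps that always start with one engine
        self.last_used = {}               # engine in use when an app was closed

    def engine_on_launch(self, app_id):
        if app_id in self.pinned:
            return self.pinned[app_id]
        return self.last_used.get(app_id, self.fallback)

    def record_close(self, app_id, engine):
        self.last_used[app_id] = engine


policy = LaunchPolicy(
    fallback="keyboard-us",
    pinned={"org.example.Terminal": "keyboard-us"},
)
policy.record_close("org.example.Terminal", "pinyin")
policy.record_close("org.example.Chat", "wubi")

terminal_engine = policy.engine_on_launch("org.example.Terminal")
chat_engine = policy.engine_on_launch("org.example.Chat")
```

The terminal always starts in English regardless of how it was left, while the chat app resumes with the engine it was closed with.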

wengxt commented 4 years ago

Also, (1) applies to the following use cases:

- Different applications use different input method engines (terminal vs. chat).
- The same application uses different input method engines (chats with different people).
- The same window uses different input methods for different text boxes: filling in a form with name, address, phone number, and credit card number. The first name, last name, and address text boxes should switch input method together, while the phone number and credit card number boxes should use a different IM. Although the phone number / credit card number boxes should not belong to the same input context, it might be hard for an application to implement this correctly, especially in the browser case. That's where hint and purpose kick in, to temporarily make the input method emit only numbers and letters.
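A small Python sketch of the form example, assuming the whole form ends up sharing one input context and the purpose of the focused box temporarily overrides the engine. The `FormContext` class, the purpose names, and the `direct` pass-through engine are illustrative stand-ins, not taken from any protocol file.

```python
# Purposes for which the input method should only emit plain characters.
NUMERIC_PURPOSES = {"digits", "number", "phone", "pin"}
PASS_THROUGH = "direct"  # hypothetical engine that does no conversion


class FormContext:
    """One input context shared by every text box of a form."""

    def __init__(self, engine):
        self.engine = engine  # the remembered state of the context

    def switch_engine(self, engine):
        self.engine = engine

    def engine_for(self, purpose="normal"):
        """Engine to use for the focused box; never changes the stored state."""
        if purpose in NUMERIC_PURPOSES:
            return PASS_THROUGH
        return self.engine


form = FormContext(engine="keyboard-us")
form.switch_engine("pinyin")            # user switches while typing the name
address_engine = form.engine_for()      # the address box follows along
phone_engine = form.engine_for("phone") # temporary override for the phone box
back_to_name = form.engine_for()        # the override leaves no trace
```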

dcz-self commented 4 years ago

Thinking about this post and how the protocols situation is a bit complicated, I wrote a blog post that clarifies it. Please give it a read: https://dcz_self.gitlab.io/posts/input_method/

I tried to think about the issue from different angles, and I found one approach that I can sort of carry to its conclusion. The initial observation is: it's hard to agree how big scopes should be. Some use cases seem to prefer per-text field settings to be saved (and shared), some want applications, some want windows, some want tabs. It's obvious that no matter which option the protocol picks, it's going to be wrong.

So why pick one at all? The choice of scope is a matter of policy, and there's no reason to embed policy in the protocol. Let policy be defined in an appropriate place: in the case of Wayland, this would be the compositor, which has the fullest picture of all the interactions in the system. The protocol then only needs to be robust enough not to drop data relevant for policy. So instead of:

[Image: "Combined"]

(P is the Privileged input method) we can do something like:

[Image: "bitmap"]

In the second picture, "?" is any policy source. It can be hardcoded, or it can be loaded from a config, or it can be fetched from fcitx or some other input method. The point is that it's not glued together with the input method protocols, saving us designers some headache, and giving compositors/input method authors some freedom, at the cost of having to add this policy stuff to each compositor. (Maybe policy choices can be standardized in the future, but I'm not trying to solve that.)

Maybe this is too abstract, so let's come up with a simple input-method update that would work with this.

The basic idea is that there are groups and text fields. Groups are defined by the application (a policy suggestion), and the final decision belongs to the compositor (the policy). So the application sends the compositor two pieces of ID: the (groupID, fieldID) tuple. The group could be a tab, for example, and field could be a field within the tab. The compositor then processes them, alters them if needed (for example it doesn't want grouping by tabs but by applications), and sends the new tuple (finalGroupID, finalFieldID) to the input method. The input method then can look up if it saw this group before, and apply the settings to the field (actually I'm not sure if we even need the finalFieldID here, if having shared settings is what we're after).

How does the compositor come up with the group ID? It calculates the new group ID using some mixing "+" function. If it wants to ignore application's suggestion in favor of grouping over applications:

finalGroupID = applicationID
finalFieldID = (processID +) groupID + fieldID

If it wants to group all fields across all processes/windows/* within a desktop together:

finalGroupID = desktopID
finalFieldID = (processID +) groupID + fieldID

This should give enough rope to implement whatever anyone wants without having to touch the protocol actually carrying text events.
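To make the mixing idea less abstract, here is a rough Python sketch of the two policies above as they might run inside a compositor. The choice of a hash as the "+" function, the `mix`, `per_application`, and `per_desktop` names, and the sample IDs are all assumptions for illustration; nothing here is part of any protocol.

```python
import hashlib


def mix(*parts):
    """Hypothetical "+" function: fold several IDs into one opaque ID."""
    digest = hashlib.sha256()
    for part in parts:
        data = str(part).encode()
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()[:16]


def per_application(application_id, process_id, group_id, field_id):
    """Ignore the suggested group and group by application instead."""
    return application_id, mix(process_id, group_id, field_id)


def per_desktop(desktop_id, process_id, group_id, field_id):
    """Group every field on a desktop together."""
    return desktop_id, mix(process_id, group_id, field_id)


# Two fields in different tabs (groups) of the same browser process.
url_bar = per_application("firefox", 4242, "tab-1", "url-bar")
search = per_application("firefox", 4242, "tab-2", "search-box")
```

Both fields end up in the same final group (`"firefox"`) while keeping distinct final field IDs, so the input method can share settings across them and still tell them apart.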

That's it. The only policy decision remaining (partially) in the application is how long the IDs stay valid: as long as the process lives, as long as the config file lives, etc.

This is a simple solution, and should work for the purpose of identifying fields. Sending notifications to the input method I think doesn't belong here, because answering the question "when does a field become invalid?" is a policy decision again. Is it when it goes out of focus? When the window closes? When the client disconnects? When the process exits? When the program is uninstalled? This is a philosophical question, and unrelated to sending text around, so I'll leave it to input methods to deal with.

Initially I thought this would be giving all the control to the compositor, but the compositor can use whatever private communication channel to notify the input method about which field ID was the result of what decision. I would discourage that, but this may satisfy those who want the input method to apply special treatment.

Anyway, I think this is the most reasonable way to make progress on this while keeping the scope of text-input/input-method small.

wengxt commented 4 years ago

Before designing any new protocol or random idea, I think we should learn from history first:

XIM: https://www.x.org/releases/X11R7.6/doc/libX11/specs/XIM/xim.html#event_flow_control_2

XIM_SET_EVENT_MASK (IM Server -> IM library)
  2 CARD16 input-method-ID
  2 CARD16 input-context-ID

Two levels of abstraction:
- input-method-ID -> Client
- input-context-ID -> Field
- DestroyIC -> notifies the server of an invalid Field
- CloseIM or some other X11-based notification -> notifies that the application exits

IBus/Fcitx's DBus API: Each dbus object represents a session. The application name part is more like an fcitx "policy", so we don't need to put it within the scope. DestroyIC or a DBus-based mechanism notifies that the application has exited. https://github.com/fcitx/fcitx5-qt/blob/master/qt5/dbusaddons/interfaces/org.fcitx.Fcitx.InputContext1.xml

macOS: https://developer.apple.com/documentation/inputmethodkit/imkinputcontroller

The IMKServer class, which is allocated in the main function of an input method, creates an input controller object for each input session created by a client application. For every input session there is a corresponding IMKInputController object.

This means a client application can have multiple "input sessions", each identified directly by a different object in the API.

The top issue with this v2 protocol (actually, I think v1 is fine if the invalidation doesn't happen on focus) is that it cannot track the lifetime of an input context (or field, or session).

How to apply config to a newly created field is the "policy" to me; it should be controlled solely by the input method, with no need to involve the compositor. This is unrelated to tracking the lifetime of the client.

Tracking the lifetime is important because the input method needs to allocate resources to handle the input. For example, in order to handle compose, the input method needs to load the compose table and prepare an input buffer in memory to hold the keys that have been pressed. Not to mention the more complex case of CJK input methods: there are more resources to maintain for each client. Notifying the input method of the client's lifetime is not only about user experience, but also about memory management on the input method server side.

That is why all of the existing non-Wayland protocols tolerate an application crash. In Wayland, the same mechanism should be reused: the lifetime of a Wayland object that represents an input context/session. There is no need for yet another ID allocation mechanism.

And we don't need "group" in the protocol. The only meaningful grouping is done on the application side (the application understands its own logic, while the compositor and the input method have no idea about it). If the application thinks it needs to group anything, it should simply create multiple text inputs and use them separately.

To put it simply: I'd like to see a 1-to-1 mapping of text-input to an input-context object on the input method server side, with nothing else like "policy" or "group" to complicate the situation.
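Roughly, what I have in mind on the input method server side looks like the Python sketch below. The `InputContextRegistry` class, its method names, and the shape of the per-context resources are hypothetical; the point is only that creation, destruction, and client disconnect are explicit events, so resources can be freed even when an application crashes.

```python
class InputContextRegistry:
    """Per-context resources keyed by (client, object id)."""

    def __init__(self):
        self._contexts = {}

    def create(self, client, object_id):
        """Called when a client creates a text-input object."""
        key = (client, object_id)
        # Stand-in for real resources: compose state, engine state, ...
        self._contexts[key] = {"compose_buffer": [], "engine": None}
        return key

    def destroy(self, client, object_id):
        """Called when the client destroys the object."""
        self._contexts.pop((client, object_id), None)

    def client_disconnected(self, client):
        """Called on exit or crash: drop everything the client owned."""
        for key in [k for k in self._contexts if k[0] == client]:
            del self._contexts[key]

    def __len__(self):
        return len(self._contexts)


registry = InputContextRegistry()
registry.create("firefox", 3)
registry.create("firefox", 11)
registry.create("terminal", 2)
registry.destroy("firefox", 3)           # one field goes away normally
registry.client_disconnected("firefox")  # the rest go away with the client
remaining = len(registry)
```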

dcz-self commented 4 years ago

The way I see it, this idea I presented almost supports the 1:1 mapping you are after.

When a Wayland object is created, it's really just another numerical ID, so it's equivalent to sending an explicit "field" ID. The application is in control of the "field" ID it sends to the compositor, so this can be made to correspond to the session (if this means "multiple text fields", I'm still going to be upset).

The one difference, like you noticed, is that there are no separate events to communicate creation and destruction of the IDs/sessions/objects. As I mentioned before, I intentionally skipped them. Deciding the lifetime of an object in a protocol would impose the opinion of the protocol maker on to all other parts of the stack.

There are examples of different lifetimes. Traditionally, on the desktop, an application lives as long as the process does. That gives the session the scope of the life of a process. On mobile, however, the application lives longer than the process: the process gets killed and resumed transparently to the user. I don't want a protocol for moving text around to be responsible for making those kinds of decisions. When the paradigm changes, we would have to rework the protocol.

I agree that this makes it harder (not impossible; caching is used by web apps for example) for input methods to build and destroy resources in time, but I would rather put it in another, independent spec, and limit the scope of input-method to just basic throwing text around.
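As a sketch of the caching alternative, an input method that never hears about destruction could keep a bounded least-recently-used cache of per-field resources, as in the Python below. The `ContextCache` class, the capacity, and the field IDs are hypothetical, and this is only one possible eviction rule.

```python
from collections import OrderedDict


class ContextCache:
    """Bounded LRU cache of per-field resources, with no destroy events."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()
        self.evicted = []  # kept only to make the behaviour observable

    def get(self, field_id):
        """Return the resources for a field, building them on first use."""
        if field_id in self._entries:
            self._entries.move_to_end(field_id)  # mark as recently used
        else:
            self._entries[field_id] = {"compose_buffer": []}
            if len(self._entries) > self.capacity:
                old_id, _ = self._entries.popitem(last=False)
                self.evicted.append(old_id)
        return self._entries[field_id]


cache = ContextCache(capacity=2)
cache.get("a")
cache.get("b")
cache.get("a")   # "a" is now the most recently used
cache.get("c")   # over capacity: the least recently used field is dropped
```

The cost is that a field which is still alive can lose its state under memory pressure, and a dead one can linger until it is evicted.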

The "group" thing probably needs additional work, but it's better to focus on the viability of the basic idea first.