microsoft / PowerToys

Windows system utilities to maximize productivity
MIT License

Customizable Context Menu for PrintScreen (Merge "Text Extractor" + "Snipping Tool" + "Translate Text" + "Ask AI About..." + Etc) #25197

Open mdrejhon opened 1 year ago

mdrejhon commented 1 year ago

Customizable Context Menu for PrintScreen

Audience: Mainstream AND Advanced

  1. mainstream use cases e.g. everyday preinstalled menu items such as "Text Extractor" and "Translate" and "Magnify" etc

  2. advanced/niche use cases e.g. advanced users optionally adding extra menu items to PrintScreen context menu via context menu editor

Superset of both "Text Extractor" / "Snipping Tool"

Possible mockups of PrintScreen context menu (or preferred screen capture hotkey) that appears immediately after selecting rectangle:

image or image

Proposed menu would be optionally customizable by advanced users. Hotkeys are displayed as part of the menu, as a kind of a cheat sheet, for users who want to skip the context menu next time.

Proposed menu customizability by advanced users can be initially via registry/configuration, and later via easy menu editor utility (in Settings).

Customizability covers special user-specific needs such as "Copy as Text (No Line Feeds)" vs "Copy as RTF" vs "Run LaTeX-OCR" vs "Translate to French", allowing advanced users to streamline their workflows, with the use of your desired third party image-processor etc.

NOTE: This could be either (A) a new PowerToy or (B) a modification to Text Extractor / Snipping Tool. I don't know if this is a "New PowerToy" or a "Modified PowerToy" idea, or a "New Parent PowerToy to control Similar Child PowerToys" system. But I think this is such a dramatic feature request that it deserves to be a "New PowerToy", which might be a fork of the Text Extractor codebase but still chains to the existing versions of Text Extractor and Snipping Tool. Alternatively, if it is a modification of Text Extractor, it could be an optional context menu that is activated in the Text Extractor Settings.


Long Description

Copying any onscreen text

  1. Select rectangle
  2. Optional context menu pops up (with default "Copy as text..." already selected)
  3. Hit Enter (or click on "Copy as text...")

Translating any onscreen text

  1. Select rectangle
  2. Optional context menu pops up (including a "Translate to..." option)
  3. You choose "Translate to..." option
  4. An easy translator UI appears (or autolaunch of user-specified URL such as Google Translate) and the translated text appears in another pane (with one-button copy-to-clipboard)

Snipping Tool Integration

  1. Select rectangle
  2. Optional context menu pops up
  3. Select "Copy as image."
  4. Upon selection, automatically launches Snipping Tool with image already copied

These would be the obvious common ones. Possibly more niche context menu items (that can be shown/hidden from menu) can be added later:

Casual screenreading

  1. Select rectangle
  2. Optional context menu pops up
  3. Select "Speak Text"
  4. This is useful for both assistive and non-assistive -- such as dyslexia and eye-off-screen situations, like playing a web HOWTO while trying to repair something (some people work better this way)

Artificial Intelligence Integration, once they accept images

  1. Select rectangle
  2. Optional context menu pops up
  3. Select "Ask Bing Chat About..." or "Ask ChatGPT About..."
  4. I would be asked to type/speak a query. Possible useful queries could be:
    • "...Where did this image originally come from?..."
    • "...Is this fact true?..."
    • "...How do I make this text bigger in the menus of this specific app?..."
    • "...What's the best way for me to test this shader example...?"
    • "...I don't understand this strange command line error, do you know why is this happening?..."
    • "...This popup error is new. Is there a security issue?..."
    • "...Explain this command line compiler error..."
    • "...Please change the color the background to purple, and put a teddy bear on top of this application window, and a Happy Birthday message on it, I want to surprise my kid with a fancied up version of this screenshot of this new game download..."

Keep in mind that AI capable of understanding screenshots is already working in the laboratory:

image

One Main Hotkey for all rectangle selection features

I include Snipping Tool because I hate memorizing too many hotkeys for similar-function behaviors. Easier to have one main hotkey (e.g. the conveniently aptly named PrintScreen) for Snipping Tool, Translate, and Text Extractor. Could be plugin-API capable, in theory.

Avoids Disruption To Existing Users: Context menu can still have "hotkeys" as reference sheet

A context menu is an intuitive reference manual for helping remember additional hotkeys for more frequently used functions!

Here are possible mock-up examples of the popup context menu, with the default item already selected (activates on hitting Enter):

image

image

Scenarios when this would be used?

There are tons of use cases, you can imagine -- but one use case I am missing is Translate. I list multiple scenarios below:

Instant Translation of Onscreen Text

AI translators are great nowadays. Digital nomadism has boomed. More WFH, more people working in different countries. I am a Canadian who also has a "Work from home" office at a winter home in Mexico. With that comes increased demand for easier integrated translation that works with any app that doesn't have translation built in.

In addition, I am a deaf person who uses text chat more often than audio. I chat with many people in multiple languages.
In addition, I now remotely work for multiple clients who are in Taiwan, Korea, and various parts of Europe.

Consequently, I often have to use Text Extractor + Google Translate, in a somewhat cumbersome way.

iPadOS Already Has a de facto Translate Screenshot Feature

The nice iPadOS Live Text button appears when you screenshot, and upon selecting text, has a built-in Translate context menu. On my iPad, I can screenshot any screen, and it pops up the Live Text feature that has a Translate context menu!! I already use this Apple iOS feature all the time in my chatting apps, e.g. chatting to friends who write me in Spanish, etc.

(P.S. As a deaf person, it's very hard for me to learn new spoken languages, and using the Translate Screenshot feature in any translate-unsupported chat app, is highly self-educational in a sort of an accidental "immersion learning" feature)

Text Extractor is amazing!

It would be more amazing if I could instantly choose what I wanted to do, including Translate. I almost wonder, why isn't this a Windows PowerToy already? Wink, wink...

Currently, the onerous Windows workflow is I have to open Google Translate, paste into Google Translate, manipulate the website, then copy back into chat sometimes. It would be nice to be able to skip exiting the chat app, for both translating other people's texts and my own texts -- in all chat apps that didn't support translation. Whether it be a multi-country business text chat, or a personal WhatsApp Chat with a Mexican friend.

Consistent with Precedent of the old Right Click context menu

The famous right-click context menu. Cut, Copy, Paste, etc.

Except this PowerToy is a context menu specifically for doing something smart with a screen crop rectangle.

I suspect that this hotkey could in theory become so useful to power users that some people may actually assign the PrintScreen key to it. PrintScreen might in theory become the universal "do something with this screenshotted rectangle" context menu of the future -- it's already part of Snipping Tool, so it's a natural hotkey.

This idea wasn't practical before, but it is today

This was not practical in the past until AI came by to do AI-based OCR (which Text Extractor uses), AI-based translate (now good enough for chat), etc.

So what wasn't a good idea in the past, is now a (possibly) extremely fantastic idea today. AI-OCR, AI-translate, AI-speech, Ask-AI, etc.

Some mainstream assistive / accessibility potential too

Rationale for why I added "Speak Text" and "Magnify Text" as possible everyday occasional-use features that many people would appreciate having

People who struggle to read tiny text (e.g. grandma/grandpa, or me when I grow a bit older) are 100x+ more common than people who are fully blind. I mean the kind of text that appears on the screen in some random app only once every few hours.

Assistive features ("Speak Text", "Magnify") can be added to the context menu for the partially vision impaired, such as older computer geeks who can't read tiny text but can easily aim a rectangle around it. Yes, our Gen-Xer programmer eyes are, alas, aging -- and some situations can pop up where something nonzoomable appears on screen that we wish we could read better. You know, that one time that tiny text appears in one of our apps, like a tiny settings screen of some fancy card game app we just installed that didn't respect the DPI zoom setting, and then...

Or maybe we're trying to troubleshoot a problem with some object in our hands, and want to just listen to a screen reader of a few paragraphs from a repair HOWTO. Even if we don't have dyslexia (though that helps; some people listen 5x faster than read text).

Not all accessibility features need to be blatantly accessibility features -- much like iPhone vibrate mode is a mainstreamed notification accessibility feature for the deaf -- it's an everyday thing now that doesn't need an accessibility logo.

Yes -- it can still be called an accessibility feature, but it's a "full time accessibility feature that stays out of the way and doesn't interfere with users who don't need the accessibility feature" -- much like the vibrate mode of a modern smartphone, fantastically useful for the deaf but not thought of as an accessibility feature anymore.

A full time screen reader is very annoying to mainstream users (creates annoying visible behaviors), so it's never used by many of us even though we occasionally sometimes wanted it. On the other hand, a part-time screen reader is just like a "DIY Audiobook" spontaneous convenience -- like using a phone vibrator instead of a phone ringer.

Theoretically, one asks oneself: why pigeonhole all accessibility features as accessibility-only? Consider useful features like a "Speak Text" that we only sometimes need, when we're just looking away from the monitor during some simple eyes-off-the-screen stuff like textbook study or object repair. Many listen to music while studying textbooks, and love audiobooks, so drawing a screen rectangle and selecting "Speak Text" is an unobtrusive convenience feature for everyday users!

Or semi-accessibility needs. The times when, 98% of the time, we are just about able to read the screen without glasses, but the glasses are downstairs... Etc. We don't always want a full time screen magnifier whose hotkey we often forget, or to get annoyed by an accidental activation of an accessibility hotkey that doesn't have an obvious cancel feature (whereas a "Speak Text" window would have a very clear Cancel button). The UX workflow in my suggestion is much less obtrusive for mainstream users, while still providing accessibility. Y'know, like a smartphone vibrate feature isn't an "accessibility-only feature" anymore these days. We just want one easy-to-remember hotkey that has all the important helper features. Especially for our agin' computer geek brains, y'know.

Every need of each person is different; but a universal context menu for "do something about this rectangle" -- is probably pretty darn useful. Now you're getting the idea!

Potential / Suggested Method of Configurability: URI system

(including local URIs to installed apps, or URIs to websites).

This is just a suggestion, that could actually turn this into a very easy-to-create PowerToy that is highly configurable by advanced users who want to aberrate away from a standard/included context menu.

Although more integration is ideal, keeping this tool simple may require some thought on the correct kind of API to use in this situation. This could be a custom configurable context menu, with editable text + editable URI + editable menu hotkey (e.g. the underscored letter in a pulldown menu);

Defaults (Copy as text, Copy as image, Translate) can be preinstalled when users download and install this PowerToy. But the menu would also be driven by a modifiable configuration (on disk, in registry, etc) that more advanced tweakers can edit to add/remove context menu items and/or change the default selected context menu item (the one that executes on hitting Enter). A rudimentary "Configure Context Menu" entry could sit at the bottom of the context menu (simple menu reorder / show menu item / hide menu item), since users may submit popular context menu items to be added to future versions of the PowerToy that other users might want to hide from the context menu.
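To make the configuration idea above a bit more concrete, here is a minimal sketch in Python of what an on-disk menu definition could look like. Everything here is an assumption for illustration only: the JSON layout, the field names (label, uri, hotkey, visible, default), the "builtin:" scheme, the example.invalid URL, and the hotkey strings are all hypothetical, not an existing PowerToys format.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MenuItem:
    label: str             # text shown in the context menu
    uri: str               # command line or URL template to launch (hypothetical format)
    hotkey: str = ""       # shortcut shown next to the label as a reminder
    visible: bool = True   # hidden items stay in the config but are not shown
    default: bool = False  # the item that runs when the user hits Enter


def load_menu(config_text: str) -> List[MenuItem]:
    """Parse a JSON list of menu entries into MenuItem objects."""
    return [MenuItem(**entry) for entry in json.loads(config_text)]


def visible_items(items: List[MenuItem]) -> List[MenuItem]:
    """Keep configured order, drop items the user has hidden."""
    return [item for item in items if item.visible]


def default_item(items: List[MenuItem]) -> Optional[MenuItem]:
    """Return the visible item marked default, else the first visible item."""
    shown = visible_items(items)
    for item in shown:
        if item.default:
            return item
    return shown[0] if shown else None


# Illustrative configuration only; none of these entries exist today.
EXAMPLE_CONFIG = """
[
  {"label": "Copy as text", "uri": "builtin:copy-text", "hotkey": "Win+Shift+T", "default": true},
  {"label": "Copy as image", "uri": "builtin:copy-image", "hotkey": "Win+Shift+S"},
  {"label": "Translate to...", "uri": "https://translate.example.invalid/?text={$text}"},
  {"label": "Run LaTeX-OCR", "uri": "latexocr.exe {$imagepath}", "visible": false}
]
"""

menu = load_menu(EXAMPLE_CONFIG)
```

Reordering is just reordering the list, and hiding is flipping one flag, so a later menu editor in Settings would only need to rewrite this one file (or the equivalent registry values).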

Anything that does not support URI, some geek can create the appropriate "glue" app as needed, and provide the URI to the app.

The configurability should in theory be flexible; e.g. a menu item that just copies the image to the clipboard immediately, or one that launches directly into Snipping Tool (with the image already showing).

For the URI (either website or command line), it could include template inserts for arguments for both text and images. The path to the screen crop, or the text of the OCR'd text. Such as {$text} and {$imagepath}. Preferably both made available to context menu editors, to let us brainstorm how to create our dream context menu. Some websites might need the image as a POST URL, but I'm not sure the best way to provide such configurability, so let's start simple with just an image path, and let app authors create the necessary glue apps to POST the image to URL (e.g. an AI that understands image+text, for the particular use case of asking an AI about the content of the screenshot crop).
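As a rough sketch of how the {$text} and {$imagepath} inserts could be filled in, assuming Python and two hypothetical helper functions (expand_url and expand_command are names I made up). Website templates get percent-encoded values; command lines are split into arguments first so that spaces in the text or path do not break the command. The splitting uses POSIX-style rules from the standard library, so templates containing Windows backslash paths would need different handling.

```python
import re
import shlex
from typing import Dict, List
from urllib.parse import quote

# Matches the two template inserts proposed in this issue.
_PLACEHOLDER = re.compile(r"\{\$(text|imagepath)\}")


def _fill(template: str, values: Dict[str, str], encode) -> str:
    """Replace every placeholder in one pass, so inserted text is never re-scanned."""
    return _PLACEHOLDER.sub(lambda m: encode(values[m.group(1)]), template)


def expand_url(template: str, text: str, image_path: str) -> str:
    """Fill a website template, percent-encoding the inserted values."""
    values = {"text": text, "imagepath": image_path}
    return _fill(template, values, lambda v: quote(v, safe=""))


def expand_command(template: str, text: str, image_path: str) -> List[str]:
    """Fill a command-line template and return an argument list.

    The template is split into arguments before substitution, so each
    inserted value stays inside a single argument.
    """
    values = {"text": text, "imagepath": image_path}
    return [_fill(arg, values, lambda v: v) for arg in shlex.split(template)]
```

For example, expand_command('ocr-tool --in {$imagepath}', "", "/tmp/crop 1.png") yields three arguments, with the path kept whole even though it contains a space ("ocr-tool" is a made-up program name).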

For those URIs using {$imagepath} ... When launching a local app or website that needs the image, the crop would autosave instantly to disk upon selection of screen crop image -- for the associated app/website of the selected context menu item.
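A small sketch of that autosave rule, again in Python and purely illustrative: the crop is only written to disk when the chosen menu item's template actually references {$imagepath}. The function name prepare_image_path is hypothetical, and the crop is assumed to already be available as encoded PNG bytes.

```python
import os
import tempfile
from typing import Optional


def needs_image_file(template: str) -> bool:
    """True if the menu item's template asks for the crop as a file path."""
    return "{$imagepath}" in template


def prepare_image_path(template: str, png_bytes: bytes) -> Optional[str]:
    """Save the crop to a temporary file only when the template needs it.

    Returns the file path, or None when the menu item works from text alone.
    The caller is responsible for deleting the file afterwards.
    """
    if not needs_image_file(template):
        return None
    fd, path = tempfile.mkstemp(prefix="crop-", suffix=".png")
    with os.fdopen(fd, "wb") as handle:
        handle.write(png_bytes)
    return path
```

Text-only items such as a translate URL never touch the disk this way, which also limits how many stray screenshots end up in the temp folder.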

mdrejhon commented 1 year ago

Mea culpa

I apologize for the repeat-editing of the title. I momentarily forgot that doing so creates multiple entries cluttering the history above.

Nonetheless

OS Feature of the future?

Because of the gigantic explosion of AI and the exploding number of possible use cases... ...I suspect this will (longterm) become a shoo-in OS feature, much like how Snipping Tool was a PowerToy in 2007, and now is part of Windows. Where else but to incubate this as a PowerToy first?

Since it's hard to remember all those hotkeys for screen rectangles -- the popup context menu will be an easy cheat sheet of all the various hotkeys (for Text Extractor and for Snipping Tool) -- so users can bypass the context menu for the most commonly used screen-rectangle actions.
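To illustrate the cheat-sheet idea, here is a tiny Python sketch that lays out menu labels with their hotkeys in an aligned column and marks the default item. The render_menu function and the sample entries are hypothetical; the hotkey strings are only examples of what could be displayed.

```python
from typing import List, Tuple


def render_menu(items: List[Tuple[str, str]], selected: int = 0) -> List[str]:
    """Return one text line per menu item: marker, label, then hotkey column."""
    width = max(len(label) for label, _ in items)
    lines = []
    for index, (label, hotkey) in enumerate(items):
        marker = ">" if index == selected else " "
        # Pad labels so all hotkeys line up; strip padding when there is no hotkey.
        lines.append(f"{marker} {label.ljust(width)}  {hotkey}".rstrip())
    return lines


# Example entries only; items without a direct hotkey show none.
SAMPLE_ITEMS = [
    ("Copy as text", "Win+Shift+T"),
    ("Copy as image", "Win+Shift+S"),
    ("Translate to...", ""),
]

print("\n".join(render_menu(SAMPLE_ITEMS)))
```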

mdrejhon commented 1 year ago

UPDATE:

I've edited this GitHub issue to insert possible mock-ups:

image

Another possible mockup, slightly bigger for readability, extended with more "useful options of the future"

image

The first item is selected by default, which executes upon hitting Enter.

A default menu would be included when installing this proposed PowerToy. Menu would be editable by advanced users via registry or configuration file, to hide/show options, reorder options, or add new items. Then later, a developer may create a menu editor utility which can become part of the Settings feature of this proposed new PowerToy.

There would be a main keypress. Although I would assign PrintScreen to the rectangle selector system, the main key could perhaps also be reassignable like any of the other PowerToys.

I will occasionally visit this item -- I have been crossposting in (somewhat) related-user-need items, to help raise awareness of a feature that users may become excited about in the future;

Long Term Planning (~1-to-5 years): Future semi-automatic rectangle highlight (by AI)

I only mention this because this PrintScreen Context Menu feature is very naturally extendable to automatic annoyance-free AI-based snipping, should the user choose to enable this feature.

Point approximately at what you want to do (pointing at middle of a chat bubble, or a window title bar, or a paragraph in any app, a dialog box, etc). It gets automatically rectangle-highlighted. You'd still be able to override it with manual rectangle (current Snipping Tool workflow).

AI-based rectangle autoselect would be fainter highlight, and only upon hover, only if enabled by the user, and always overrideable by manual rectangle-select, in a fully seamless manner.

This feature means that upon pressing PrintScreen, the user just mouses around and lets the AI automatically faintly glow a rectangle around what it thinks the user is pointing at (e.g. a paragraph of text, a window, a chat screen, a dialog box, etc).

As AIs become smart enough to accurately automatically do the rectangle for you correctly most of the time, the rectangle-draw may become usually unnecessary. User will still be able to draw rectangle manually simply by dragging a rectangle instead of a stationary click to commit AI-guessed rectangle -- as a possible UX route between AI-autoselect-rectangle versus user-manually-select rectangle.

Hover-based detection only, to avoid distracting autoselect behavior: Faint guessed-rectangle highlights will occur only during mouse hover, to prevent distracting highlights. The highlight will animate to the new highlight once the user stops moving the mouse and hovers stationary over the proposed target.

If the user keeps moving the mouse (beyond the currently configured OS-based "hover" detection), no distracting AI-based automatic rectangle-suggestion will activate. This would be a good balance between automatic rectangle-suggestion and manual rectangle select from a UX point of view, avoiding potential continuous thrashing of random annoying guessed-highlights.
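The hover rule above can be sketched as a small dwell detector. This is a Python illustration under my own assumptions: the class name, the 0.5 second dwell time, and the 4 pixel tolerance are placeholders, and a real implementation would presumably read the hover settings from the OS instead of hardcoding them.

```python
import math
from typing import Optional, Tuple


class HoverDetector:
    """Reports a hover only after the pointer has stayed put for a dwell time."""

    def __init__(self, dwell_seconds: float = 0.5, tolerance_px: float = 4.0):
        self.dwell_seconds = dwell_seconds  # how long the mouse must rest
        self.tolerance_px = tolerance_px    # jitter allowed while "stationary"
        self._anchor: Optional[Tuple[float, float]] = None
        self._since = 0.0

    def update(self, x: float, y: float, now: float) -> bool:
        """Feed one mouse sample; return True while the hover is established."""
        moved = (
            self._anchor is None
            or math.dist(self._anchor, (x, y)) > self.tolerance_px
        )
        if moved:
            # Any real movement restarts the timer, so no highlight appears
            # while the user is still travelling across the screen.
            self._anchor = (x, y)
            self._since = now
            return False
        return (now - self._since) >= self.dwell_seconds
```

Only when update returns True would the rectangle-suggestion step run at all, so a moving mouse never triggers any guessing.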

For example, things like hovering a mouse stationary on a paragraph of text in Word or Chat will cause the rectangle autoselect on only that paragraph. A single stationary mouse click would commit the rectangle selection and popup the context menu. In other use cases where user actually wants to manually draw rectangle, just drag a rectangle normally as usual (like today's Text Extractor or Snipping Tool).
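The click-versus-drag rule could be decided from the press and release positions alone. A minimal Python sketch, assuming rectangles are (left, top, right, bottom) tuples and a hypothetical 5 pixel drag threshold:

```python
from typing import Optional, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def resolve_selection(
    press: Point,
    release: Point,
    suggested: Optional[Rect],
    drag_threshold_px: int = 5,
) -> Optional[Rect]:
    """Pick the final rectangle from one mouse press/release pair.

    A stationary click commits the auto-suggested rectangle (which may be
    None if nothing was suggested). A drag always wins and produces the
    manually drawn rectangle, like today's manual workflow.
    """
    dx = abs(release[0] - press[0])
    dy = abs(release[1] - press[1])
    if max(dx, dy) <= drag_threshold_px:
        return suggested
    left, right = sorted((press[0], release[0]))
    top, bottom = sorted((press[1], release[1]))
    return (left, top, right, bottom)
```

Because the drag branch never looks at the suggestion, the manual workflow behaves the same whether or not the autosuggest feature is enabled.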

This minimizes annoyances, while keeping automatic/manual completely seamless.

Current AI systems of 2023 should now have the capability to be trained to be surprisingly reliable for mundane hover things like pointing at a chat bubble, pointing at an image, or pointing at a window's title bar, or a compiler line, etc. You've seen AIs that correctly detect outlines of objects in photographs (including the one built into iOS 16+ where you can easily drag a dog out of a photo in your photo roll); the same training concept can be applied to automatically detect context.

Example Scenario: Streamlined automatic-rectangle-select workflow example using AI-based auto-highlighter suggestion:

  1. Just press PrintScreen; (still behaves like today's Snipping Tool, screen dims and mouse becomes a "+")
  2. Hover mouse over your foreign-language chat bubble (until AI-based auto-highlighter fade-highlights it, possibly a 0.5 second to 1.0 second hover to trigger)
  3. Click once (context menu pops up)
  4. Select "Translate Text"

Voila. Very few steps, no mouse dragging. Works in any chat app, the app doesn't even need to be designed to know how to translate text.

If you don't like the auto-highlight that appears, just resume moving the mouse around to a new target and/or manually drag a rectangle as normal. Very unobtrusive, since the familiar manual workflow is permanently enabled at all times; it's just an optional AI-based hover autohighlighter.

For simplicity, ignore this idea for now -- this is a napkin exercise. I simply mention this, simply to show how future-extendable this feature can be; going beyond the manual-crop-rectangle era.

Security Analysis & Improvement

(I add this section to save time by Microsoft employees mulling the implications)

A Policy Editor setting can be the conduit to prevent customizability of the menu, to prevent information leakage (e.g. menu additions that might transmit the crop rectangle to an Internet site for processing).

e.g. A theoretical "Share to Imgur" feature and its implications.

For example, many tech writers are constantly screenshotting rectangles and sometimes would love to immediately share directly to their image host (or CMS such as Wordpress site using a theoretical future third party plugin), for publishing things like software reviews and creating online content. Keeping such users on Windows instead of MacOS, would be enhanced by utilities such as these. On the other hand, it is a potential accidental leak of proprietary information for a semiconductor fab and such a workplace would have policy editor restricting PrintScreen context menu customizability.

And you don't want the menu to have any possible GDPR-infringing options by default yet (e.g. some countries may not allow "Ask Bing Chat AI" to be added), so that's another argument for menu customizability. A menu customizability mechanism with appropriate interlocks (e.g. Policy Editor) satisfies needs in this era.

During the proposed feature incubation period -- the simple logical route for Microsoft would be to release only the basic menu initially, but allow menu customizability (unless prevented by a Policy Editor setting).
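As a sketch of how such an interlock might filter the menu, here is a Python illustration. The two policy switches (allow_custom, allow_network), the item dictionaries, and the example.invalid URLs are all hypothetical, not actual Group Policy settings. Note the limitation stated in the comments: this only catches menu items that point directly at a web address; a local "glue" app could still upload data, which is why blocking custom items entirely is a separate switch.

```python
from typing import Dict, List
from urllib.parse import urlparse

NETWORK_SCHEMES = {"http", "https", "ftp"}


def is_network_uri(uri: str) -> bool:
    """True when the menu item launches a web address directly."""
    return urlparse(uri).scheme.lower() in NETWORK_SCHEMES


def apply_policy(
    items: List[Dict], allow_custom: bool, allow_network: bool
) -> List[Dict]:
    """Drop menu items that the (hypothetical) policy settings forbid.

    This cannot see what a local program does with the crop, so a strict
    workplace would set allow_custom=False as well.
    """
    allowed = []
    for item in items:
        if item["custom"] and not allow_custom:
            continue
        if is_network_uri(item["uri"]) and not allow_network:
            continue
        allowed.append(item)
    return allowed


# Illustrative items only.
EXAMPLE_ITEMS = [
    {"label": "Copy as text", "uri": "builtin:copy-text", "custom": False},
    {"label": "Translate to...", "uri": "https://translate.example.invalid/?q={$text}", "custom": False},
    {"label": "Share to image host", "uri": "https://images.example.invalid/upload", "custom": True},
    {"label": "Run LaTeX-OCR", "uri": "latexocr.exe {$imagepath}", "custom": True},
]
```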

Also, many corporate systems do not allow you to install third party software (including PowerToys) without sysadmin help, so there is an additional security moat there, too.

mdrejhon commented 1 year ago

I have purposefully commented on related use cases.

I wanted to demonstrate how gigantically broad an audience this proposed "PrintScreen context menu" PowerToy could satisfy. There are 50+ GitHub items that could be satisfied with this PowerToy suggestion -- but I surgically chose only half a dozen.

Both mainstream audience and niche use cases are easily satisfied;

mdrejhon commented 1 year ago

(Cc: @Waltonvcl & @simvig01 who reacted positively to this idea)

NOTE: This could be either (A) a new PowerToy or (B) a modification to Text Extractor / Snipping Tool. I don't know if this is a "New PowerToy" or a "Modified PowerToy" idea, or a "New Parent PowerToy to control Similar Child PowerToys" system. But I think this is such a dramatic feature request that it deserves to be a "New PowerToy", which might be a fork of the Text Extractor codebase but still chains to the existing versions of Text Extractor and Snipping Tool. Alternatively, if it is a modification of Text Extractor, it could be an optional context menu that is activated in the Text Extractor Settings.

Thoughts are welcome on approach:

1. Modification of an existing PowerToy (e.g. Text Extractor)

Pros: Fewer changes, reduces code duplication. A single code commit to Text Extractor to start a domino effect.

OR

2. New Parent PowerToy to control Similar Child PowerToys

Pros: Keep Snipping Tool and Text Extractor mostly unchanged. Minimum code change to other tools, some hook to be callable by a 'parent' PrintScreen PowerToy


My commentary:

The rectangle select could be a unified code (shared between Snipping Tool and Text Extractor), but the other utilities could still co-exist as standalone with their own hotkeys, activated independently.

Also, other paradigms (other than a popup context menu) are welcome, as long as PrintScreen or a universal hotkey can provide an equivalent menu/buffet/dialog/list of options in an unobtrusive manner. The context menu format can be improved to other paradigms for instantly processing a screen crop.


Many Avenues of UX Simplification For Rectangle Select Possible

Many opportunities for simplification. UX tweaks can be made to make it even more universal;

Stationary click (during hover) commits the rectangle-autosuggest.
Existing dragging behavior will override the rectangle-autosuggest, and behave normally (exactly like existing Snipping Tool or Text Extractor). Rectangle autosuggest could be a discreet faint rectangle that's not distracting, only when mouse moving.

Why Do This? Reduce Code and Feature Duplication

That way, you've got global screen select, single-window select, and custom rectangle select -- for ALL menu items (including not-yet-invented context menu plugins)

All menu items in the PrintScreen Context Menu would be able to support global screen, single window, and custom rectangle select. Implemented only once at the root level, rather than the per-utility level.

The legacy Snipping Tool can continue to support this at the per-utility level, but the universal rectangle selector would now gain all capabilities for all context menu items.
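A minimal Python sketch of that "implemented only once at the root level" idea: the selection mode is resolved to a rectangle in one place, and every menu action just receives the resulting region. The enum, the function names, and the (left, top, right, bottom) rectangle format are my own assumptions.

```python
from enum import Enum
from typing import Callable, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


class SelectionMode(Enum):
    FULL_SCREEN = "full_screen"
    WINDOW = "window"
    RECTANGLE = "rectangle"


def resolve_region(
    mode: SelectionMode,
    screen: Rect,
    window: Optional[Rect] = None,
    drawn: Optional[Rect] = None,
) -> Rect:
    """Turn the chosen selection mode into one rectangle, in a single place."""
    if mode is SelectionMode.FULL_SCREEN:
        return screen
    if mode is SelectionMode.WINDOW:
        if window is None:
            raise ValueError("window mode needs a window rectangle")
        return window
    if drawn is None:
        raise ValueError("rectangle mode needs a drawn rectangle")
    return drawn


def run_action(
    action: Callable[[Rect], object],
    mode: SelectionMode,
    screen: Rect,
    window: Optional[Rect] = None,
    drawn: Optional[Rect] = None,
):
    """Run any menu action on the resolved region.

    Actions never need to know which selection mode produced the region,
    so new menu items get all three modes for free.
    """
    return action(resolve_region(mode, screen, window, drawn))
```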

Brainstorming Other UX Simplifications

Thought will need to go into streamlining workflows as much as possible (keeping things easy for novice users, while staying powerful for advanced users).

We need to keep it intuitive for novice and advanced users. Many ways to incubate workflows or UX simplifications for a PrintScreen Context Menu PowerToy over the long term -- to see what users would prefer.

Metal-Frog commented 1 year ago

I would also appreciate having an easy to call translator. So: +1

mdrejhon commented 1 year ago

Many use cases covered

Other people's feature requests and enhancement requests at #25262, #7670, #23452, #24939, #25092, #25040, #21455, #24619, #25411, #25436, #25515, #15405, #25720, #25685, #25669, #24716, #18014 are automatically accommodated by this one PowerToy.

Check the comments I made to these items above.

Once you read these, you can see how universal an Optionally-Customizable PrintScreen Context Menu PowerToy could be for novice users, enthusiasts, and advanced users alike. Mainstream menu items would come preinstalled, while advanced users can add plugins as custom menu items.

@crutkas -- Since this is either a New PowerToy or an Enhancement (e.g. to Snipping Tool / Text Extractor) -- I think this should receive both tags "Enhancement" and "New PowerToy" because it could go either way as explained.

mdrejhon commented 1 year ago

Since some are confused about the dual-use nature (mainstream vs niche) of this PowerToy, I wanted to clarify:

CLARIFICATION: This is intended to be a mainstream PowerToy with optional advanced-user features.

FYI -- to prevent confusion; there are mainstream uses of this PowerToy -- and niche uses of this PowerToy

  1. mainstream use cases e.g. everyday preinstalled menu items such as "Text Extractor" and "Translate" and "Magnify" etc

  2. advanced/niche use cases e.g. advanced users adding extra menu items to PrintScreen context menu via context menu editor

Basically, the PowerToy installer for the proposed Context Menu PowerToy will not install anything niche (2), it would be the advanced users' responsibility to add their favourite third party utility (command line configuration) to the context menu, such as say, via a menu editor (or editing a configuration file / registry).

Thus, no PowerToy developer (nor Microsoft) would need to implement the suggestion here since it can simply be manually added as a custom user-defined menu item to a different PowerToy workflow.

I just wanted to be clear that only mainstream menu items will be installed by default -- turning this into a mainstream PowerToy that becomes a mainstream OS feature of the future ... It is simply that advanced users can add their own favourite niche menu items.

In other words, (1) there are a few commonly used mainstream items (handling the everyday market)... and (2) an infinite number of possible unique niche items (handling the giant number of one-off niche cases by advanced users).

crutkas commented 1 year ago

@mdrejhon I see the issue, please stop spamming other issues. this causes confusion

mdrejhon commented 1 year ago

[email crosspost now self-removed, now it has received acknowledgement]

crutkas commented 1 year ago

Hi @mdrejhon

I do my best to be transparent here and I know the posts were done with good intentions and not malicious. As a maintainer, part of my job is to try and keep stuff tightly on-topic. Cross linking items is great, when appropriate. One way you could have accomplished this is to xref into the other threads from your main thread. This will reduce a lot of the unintentional notifications.

I’ll go back and shift them to off-topic in the near future.

Thanks for your contributions and happy to go into more detail if you'd like

mdrejhon commented 1 year ago

EDIT --

@crutkas I have deleted some of the unnecessary redundancy from the other issues, and edited/reposted others in a more discreet manner (to keep intact existing collaborations/reactions). I have not rewritten this item yet (editing and rewriting), but will do so eventually.

From now on, I will remember that entering the hash-number in this thread automatically pings the other item with a backlink. Sometimes that's all I need to do. I will be more judicious from now on: direct replies (possibly more customized/unique) only for the really relevant items, and just a ping (backlink) for the more superficially relevant ones.