π© An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT-4 π€π¬ It also allows image generation πΌοΈ, image understanding π, speech-to-text conversion π€, and text-to-speech synthesis π
π¦ Download OpenAI Chat API Workflow (version 3.2.0
)
You can execute all the above features using:
The web UI is constructed by the workflow and runs locally on your Mac π» The API call is made directly between the workflow and OpenAI, ensuring your chat messages are not shared online with anyone other than OpenAI π Furthermore, OpenAI does not use the data from the API Platform for training π«
You can export the chat data to a simple JSON format external file π, and it is possible to continue the chat by importing it later π
brew install pandoc mpv sox jq duti
Setup Hotkeys
You can set up hotkeys in the settings screen of the workflow. To set up hotkeys, double-click on the light purple workflow elements.
Dependencies
To start using this workflow, you must set the environment variable apikey
, which you can get by creating a new OpenAI account. See also the Configuration section below.
You will also need to install the pandoc
and sox
programs. Pandoc will allow this workflow to convert the Markdown response from OpenAI to HTML and display the result in your default web browser with syntax highlighting enabled (especially useful when using this workflow to generate program code). Sox will allow you to record voice audio to convert to text using Whisper speech-to-text API.
To set up dependencies (pandoc
, mpv
, sox
, jq
, and duti
), first install homebrew. and run the following command.
brew install pandoc mpv sox jq duti
Change Log
Recent Change Log
o1-preview
, o1-mini
) supportedhere are three methods to run the workflow: 1) Using commands within the Alfred UI, 2) Passing selected text to the workflow, 3) Utilizing the Web UI. Additionally, thereβs a convenient method for making brief inquiries to GPT
Commands within the Alfred UI
You can enter a query text directly into Alfred textbox:
openai
) β space/tab β input query text β select a command (see below)OpenAI Query
)Passing Selected Text
You can select any text on your Mac and send it to the workflow:
OpenAI Query
Send selected text to OpenAI
Using Web Interface
You can open a web interface
openai-webui
)Open web interface
Using the Default Browser
If your default browser is set to one of the following and the duti command is installed on your system, the web interface will automatically open in your chosen browser. If not, Safari will be used as the default.
Restart OpenAI Workflow server by executing openai-restart-server
in case the web UI does not work as expected after changing the default browser.
Web UI Modes
Switch modes (light
/dark
/auto
) with Web UI Mode
selector in the settings.
Simple Direct Query/Chat
To quickly chat with GPT:
gpt
β space/tab β input query text (e.g. "gpt what is a large language model?")OpenAI Direct Query
With Direct Query
, the input text is sent directly to the OpenAI Chat API as a prompt. You can also create a query by prepending or appending text to the input text.
Direct Query
The input text is directly sent as a prompt to the OpenAI Chat API.
Prepend Text + Query
After the initial text is entered, the user is prompted for additional text. The additional text is added before the initial text, and the resulting text is used as the query.
Append Text + Query
After the initial text is entered, the user is prompted for additional text. The additional text is added after the initial text and the resulting text is used as the query.
Generate Image
The DALL-E API (dall-e-3
or dall-e-2
) is used to generate images according to the prompts entered. See Image Generation below.
Some of the examples shown on OpenAI's Examples page are incorporated into this Workflow as commands. Functions not prepared as commands can be realized by giving appropriate prompts to the above Basic Commands.
Write Program Code
GPT generates program code and example output according to the text entered. You can specify the purpose of the program, its function, the language and technology to be used, etc.
Example Input
Create a command line program that takes an English sentence and returns syntactically parsed output. Provide program code in Python and example usage.
Example Output
Ask in Your Language
You can ask questions in the language set to the variable first_language
.
Note: If the value of first_language
is not English
(e.g. Japanese
), the query may result in a more or less inaccurate response.
Translate L1 to L2
GPT translates text in the language specified in the variable first_language
to the language specified in the second_language
.
Translate L2 to L1
GPT translates text in the language specified in the variable second_language
to the language specified in the variable first_language
.
Grammar Correction
GPT corrects sentences that may contain grammatical errors. See OpenAI's description.
Brainstorm
GPT assists you in brainstorming innovative ideas based on any given text.
Create Study Notes
GPT provides study notes of a given topic. See OpenAI's description for this example.
Analogy Maker
GPT creates analogies. See OpenAI's description for this example.
Essay Outline
GPT generates an outline for a research topic. See OpenAI's description for this example.
TL;DR Summarization
GPT summarizes a given text. See OpenAI's description for this example.
Summarize for a 2nd Grader
GPT translates complex text into more straightforward concepts. See OpenAI's description for this example.
Keywords
GPT extracts keywords from a block of text. See OpenAI's description for this example.
The image generation can be executed through one of the above commands. It is also possible to use the web UI. By using the web UI, you can interactively change the prompt to get closer to the desired image.
When the image generation mode is set to dall-e-3
, the user's prompt is automatically expanded to a more detailed and specific prompt. You can also edit the expanded prompt and regenerate the image.
The image understanding can be executed through the openai-vision
command. It starts a capture mode and lets you specify a part of the screen to be analyzed. Alternatively, you can specify an image file (jpg, jpeg, png, gif) using "OpenAI Vision" file action. This mode needs gpt-4o
or gpt-4o-mini
model to be set in the workflow settings.
Most text-to-speech and speech-to-text features are available on the web UI. However, there are certain specific features that are provided as commands, such as audio file to text conversion and transcription with timestamps.
Text-to-Speech Synthesis
Text entered or response text from GPT can be read out in a natural voice using OpenAI's text-to-speech API.
Play TTS
button on the web UIOpenAI Text-to-Speech
Speech-to-Text Conversion
The Whisper API can convert speech into text in a variety of languages. Please refer to the Whisper API FAQ for available languages and other limitations.
Voice Input
button on the web UIopenai-speech
)Audio File to Text
You can select an audio file in mp3
, mp4
, flac
, webm
, wav
, or m4a
format (under 25MB) and send it to the workflow:
OpenAI Speech-to-Text
Record Voice Audio and Transcribe
You can record voice audio and send it to the Workflow for transcription using the Whisper API. The recording must be no longer than 30 minutes and will automatically stop after this time. Recording time is limited to 30 minutes and stops automatically after this limit.
openai-speech
) β Terminal window opens and recording startsChoose processes to apply to the recorded audio
You can choose the format of the transribed text from text
, srt
or vtt
in the workflow's settings. Below are examples in the text
and srt
formats:
Import/Export
You can export your chat data to a straightforward JSON format file, and resume your conversation later by importing it back in.
To export data, simply click on Show Entire Chat
in the chat window to navigate to the chat history page, then select Export Data
. To import data, just hit Import Data
on either the home page or the chat history page.
Monitor API Usage
To review your token usage for the current billing cycle on the OpenAI Usage Page, type the keyword openai-usage
. For more details on billing, visit OpenAI's Billing Overview.
You can set various parameters in the settings panel of this Workflow. Some of the parameters set here are used as default values but you can make temporary changes to the values on the web UI. You can also access the settings panel by clicking Open Config
from the web UI.
Required Settings
OpenAI API Key: Set your secret API key for OpenAI. Sign up for OpenAI and get your API key at https://platform.openai.com/account/api-keys/.
Base URL: The base URL of the OpenAI Chat API. (default: https://api.openai.com/v1
)
Web UI Parameters
localhost
or 127.0.0.1
can be used as the loopback address of the UI server. If the web UI does not work as expected, try the other. (default: 127.0.0.1
)enabled
)light
/dark
/auto
). (default: auto
)Chat Parameters
Model: OpenAI's chat model used for the workflow (default: gpt-4o-mini
). Here are some of the models currently available:
gpt-4o-mini
chatgpt-4o-latest
gpt-4o-2024-08-06
gpt-4o
You may or may not use the following beta models. System prompt and parameter settings are not available for these models. Also, streaming is not supported for these model and the response time is longer than the other models.
o1-preview
o1-mini
Max Tokens: Maximum number of tokens to be generated upon completion (default: 2048
). If this parameter is set to 0
, null
is sent to the API as the default value (the maximum number of tokens is not specified). See OpenAI's documentation.
Temperature: See OpenAI's documentation. (default: 0.3
)
Top P: See OpenAI's documentation. (default: 1.0
)
Frequency Penalty: See OpenAI's documentation. (default: 0.0
)
Presence Penalty: See OpenAI's documentation. (default: 0.0
)
Memory Span: Set the number of past utterances sent to the API as a context. Setting 4
to this parameter means 2 conversation turns (user β assistant β user β assistant) will be sent as the context for a new query. The larger the value, more tokens will be consumed. (default: 10
)
Max Characters: Maximum number of characters that can be included in a query (default: 50000
).
Timeout: The number of seconds (default: 10
) to wait before opening the socket and connecting to the API. If the connection fails, reconnection (up to 20 times) will be attempted after 1 second.
Add Emoji: If enabled, the response text from GPT will contain emoji characters appropriate for the content. This is realized by adding the following sentence at the end of the system content. (default: enabled
)
Add emojis that are appropriate to the content of the response.
System Content: Text to sent with every query sent to API as a general information about the specification of the chat. The default value is as follows:
You are a friendly but professional consultant who answers various questions, make decent suggestions, and give helpful advice in response to a prompt from the user. Your response must be consise, suggestive, and accurate.
Image Understading Parameters
512
to 2000
) of the large side of the image data sent to the image understanding API. Larger images will be resized accordingly. (Default: 512
)Image Generation Parameters
dall-e-3
and dall-e-2
are available. (default dall-e-3
)for dall-e-3
): Set the size of images to generate from 1024x1024
, 1024x1792
, 1792x1024
. (default: 1024x1024
)for dall-e-3
): Choose the quality of image from standard
and hd
. (default: standard
)for dall-e-3
): Choose the style of image from vivid
and natural
. (default: vivid
)dall-e-2
) : Set the number of images to generate in image generation mode from 1
to 10
. (default: 1
)dall-e-2
): Set the size of images to generate from 256x256
, 512x512
, 1024x1024
. (default: 256x256
)Text-to-Speech Parameters
tts-1
or tts-1-hd
. (default: tts-1
)alloy
, echo
, fable
, onyx
, nova
, and shimmer
. (default: alloy
)1.0
)disabled
)Speech-to-Text Parameters
Transcription Format: Set the format of the text transcribed from the microphone input or audio files from text
, srt
, or vtt
. (default: text
)
Processes after Recording Set the default choice of what processes follow after audio recording finishes (default: transcribe [+ delete recording]
).
Audio to English: When enabled, Whisper API will transcribe the input audio and output text translated into English. (default: disabled
)
Other Settings
English
)Japanese
)disabled
)not set
)Environment Variables
Environment variables can be accessed by clicking the [x]
button located at the top right of the workflow settings screen. Normally, there is no need to change the values of the environment variables.
http_keep_alive
: This workflow starts an HTTP server when the web UI is first displayed. After that, if the web UI is not used for the time (in seconds) set by this environment variable, the server will stop. (default: 7200
= 2 hours)http_port
: Specifies the port number for the web UI. (default: 80
)http_server_wait
: Specifies the wait time from when the HTTP server is started until the page is displayed in the browser. (default: 2.5
)websocket_port
: Specifies the port number for websocket communication used to display responses in streaming on the web UI. (default: 8080
)Yoichiro Hasebe (yohasebe@gmail.com)
The MIT License
The author assumes no responsibility for any potential damages arising from the use of this software.