grabani opened this issue 2 months ago
@grabani The issue you have is that you are discarding the previous text.
In my version, text was the current line spoken, e.g. "hello my name is Dan".
By enabling partial recognition, you will get the line as it currently is, e.g. "hello", then "hello my", then "hello my name is Dan".
This gives the illusion that it is adding an extra word each time, but in reality it is clearing it on each call. You can see this if you partially revert back to my code:
.onAppear {
self.recorder.onRecognisedText = { [self] text in
if text.isEmpty {
return
}
// Only store and display the final version of the recognized text
// By replacing the previous content with the new one
consoleText = consoleText + "\n" + text
UserDefaults.standard.set(consoleText, forKey: userDefaultsKey)
}
I would suggest keeping track of whether the text is final or not and appending it to the previous final text. In the result handler, that would look something like this:
if let result = result {
self.onRecognisedText?(result.bestTranscription.formattedString, result.isFinal)
print("Recognition: \(result.bestTranscription.formattedString)")
} else if let error = error {
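As a rough sketch of how the view side could then use that flag (TranscriptView and finalisedText are just illustrative names, not part of your project), you could keep the already-finalised transcript in its own buffer and rebuild the displayed text from it plus the current partial:
import SwiftUI
// Sketch only: assumes Recorder's onRecognisedText now delivers (text, isFinal).
struct TranscriptView: View {
    @ObservedObject private var recorder = Recorder()
    @State private var consoleText = ""    // what the TextEditor displays
    @State private var finalisedText = ""  // completed utterances only
    var body: some View {
        TextEditor(text: $consoleText)
            .onAppear {
                recorder.onRecognisedText = { text, isFinal in
                    guard !text.isEmpty else { return }
                    if isFinal {
                        // Utterance finished: commit it to the permanent transcript.
                        finalisedText += (finalisedText.isEmpty ? "" : "\n") + text
                        consoleText = finalisedText
                    } else {
                        // Partial result: show everything finalised so far plus the
                        // line in progress, without touching the finalised buffer.
                        consoleText = finalisedText.isEmpty ? text : finalisedText + "\n" + text
                    }
                }
            }
    }
}
The key point is that partial results never replace finalisedText; they are only appended to it for display, so a pause followed by a new utterance cannot wipe out the earlier text.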
Hi Daniel,
Thank you for taking the time to respond. I was unable to resolve my issue. If you could kindly continue to support me with this issue, I would be grateful.
Based on your feedback, I updated the Recorder.swift file with the following:
if let result = result {
// Pass both the recognized text and the isFinal flag to the closure
self.onRecognisedText?(result.bestTranscription.formattedString, result.isFinal)
print("Recognition: \(result.bestTranscription.formattedString), Final: \(result.isFinal)")
} else if let error = error {
// Handle any errors here
print("Error during recognition: \(error.localizedDescription)")
}
However, as before, after an app launch and a click on "Start Listening", transcribed text appears on the screen in real time (due to recognitionRequest.shouldReportPartialResults = true being enabled). If I then pause for around 2 seconds and then continue to talk, the TextEditor view is cleared and the newly spoken words start to be transcribed on the screen.
Please find below my latest ContentView.swift and Recorder.swift code:
ContentView.swift
import SwiftUI
struct ContentView: View {
@ObservedObject private var recorder = Recorder()
@ObservedObject private var speechManager = SpeechManager()
@State private var consoleText: String = "" // Reintroduce the consoleText state variable
@State private var circleColor: Color = .black
@Environment(\.colorScheme) var colorScheme
private let userDefaultsKey = "RecognizedText"
init() {
// Clear the RecognizedText key to ensure it's empty when the app starts
UserDefaults.standard.removeObject(forKey: userDefaultsKey)
if let savedText = UserDefaults.standard.string(forKey: userDefaultsKey) {
print("UserDefaults content for key '\(userDefaultsKey)': \(savedText)")
} else {
print("No content found in UserDefaults at INIT for key '\(userDefaultsKey)'.")
}
// Initialize consoleText with the empty state or stored value from UserDefaults
_consoleText = State(initialValue: UserDefaults.standard.string(forKey: userDefaultsKey) ?? "")
}
func logUserDefaultsContents(){
if let savedText = UserDefaults.standard.string(forKey: userDefaultsKey) {
print("UserDefaults content for key '\(userDefaultsKey)': \(savedText)")
} else {
print("No content found in UserDefaults for key '\(userDefaultsKey)'.")
}
}
var body: some View {
GeometryReader { geometry in
VStack {
ScrollView {
TextEditor(text: $consoleText)
.font(.system(.body, design: .monospaced))
.padding()
.background(Color(UIColor.systemBackground))
.cornerRadius(8)
.overlay(
RoundedRectangle(cornerRadius: 8)
.stroke(Color.gray, lineWidth: 1)
)
.frame(minHeight: 200, maxHeight: .infinity)
.frame(width: geometry.size.width * 0.9) // Set width to 90% of the available width
.padding(.leading, geometry.size.width * 0.05) // Indent from the left side
}
.frame(height: geometry.size.height * 0.7)
HStack {
Circle()
.fill(circleColor)
.frame(width: 10, height: 10)
.padding()
Button(action: {
if (!recorder.isRecording) {
circleColor = .red
recorder.startRecording()
self.recorder.setRecord() // Updated to call setRecord()
} else {
circleColor = .black
recorder.stopRecording()
// No need to call setPlayback() if we're just stopping
}
}) {
Text(!recorder.isRecording ? "Start Listening" : "Stop Listening")
.foregroundColor(colorScheme == .light ? Color.white : Color.black)
.padding()
.background(
(recorder.hasMicrophoneAccess && recorder.isSpeechRecognizerAvailable) ?
Color.primary :
Color.gray.opacity(0.6)
)
.overlay(
RoundedRectangle(cornerRadius: 8)
.stroke(colorScheme == .dark ? Color.white.opacity(0.2) : Color.black.opacity(0.2), lineWidth: 1)
)
.cornerRadius(10)
}
.contentShape(Rectangle())
.disabled(
!recorder.hasMicrophoneAccess
|| !recorder.isSpeechRecognizerAvailable
)
Button(action: {
// Clear UserDefaults when clearing the session
let clearText = "Session started \(formattedDate())\n\n"
UserDefaults.standard.set(clearText, forKey: userDefaultsKey)
consoleText = clearText
}) {
Text("Clear")
.foregroundColor(colorScheme == .light ? Color.white : Color.black)
.padding()
.background(Color.primary)
.overlay(
RoundedRectangle(cornerRadius: 8)
.stroke(colorScheme == .dark ? Color.white.opacity(0.2) : Color.black.opacity(0.2), lineWidth: 1)
)
.cornerRadius(10)
}
.contentShape(Rectangle())
}
.frame(width: geometry.size.width * 0.9) // Set width to 90% of the screen width
Spacer()
}
.padding(.top, 75)
}
.onAppear {
if let savedText = UserDefaults.standard.string(forKey: userDefaultsKey) {
print("UserDefaults content at onAppear for key '\(userDefaultsKey)': \(savedText)")
} else {
print("No content found in UserDefaults at onAppear for key '\(userDefaultsKey)'.")
}
// Update `onRecognisedText` to handle both the recognized text and the isFinal flag
self.recorder.onRecognisedText = { [self] text, isFinal in
print("DEBUG:Recognized text received: '\(text)', Final: \(isFinal)") // Log the recognized text
if text.isEmpty {
return
}
if isFinal {
// For final results, append them with a newline (or other separator)
consoleText += "\n" + text
print("Used isFinal")
} else {
// For partial results, replace the current line in `consoleText`
// This version appends the text in progress (partial result)
consoleText = text
print("Used Console")
}
// Update UserDefaults with the latest consoleText
UserDefaults.standard.set(consoleText, forKey: userDefaultsKey)
// Print UserDefaults content
if let savedText = UserDefaults.standard.string(forKey: userDefaultsKey) {
print("UserDefaults content for key '\(userDefaultsKey)': \(savedText)")
} else {
print("No content found in UserDefaults for key '\(userDefaultsKey)'.")
}
}
self.speechManager.onFinishSpeaking = {
self.recorder.setRecord() // Updated to call setRecord()
}
self.recorder.requestPermission()
}
.alert(isPresented: $recorder.showAlert) {
Alert(title: Text(recorder.alertTitle), message: Text(recorder.alertMessage), dismissButton: .default(Text("OK")))
}
}
}
struct ContentView_Previews: PreviewProvider {
static var previews: some View {
ContentView()
}
}
func formattedDate() -> String {
let formatter = DateFormatter()
formatter.dateStyle = .medium
formatter.timeStyle = .short
return formatter.string(from: Date())
}
Recorder.swift
//
// Recorder.swift
// ChattyMarv
//
// Created by Daniel Platt on 16/09/2023.
//
import SwiftUI
import AVFoundation
import Speech
class Recorder: ObservableObject {
@Published var showAlert = false
@Published var alertTitle = ""
@Published var alertMessage = ""
@Published var isRecording: Bool = false
@Published var hasMicrophoneAccess: Bool = false
@Published var alert: Alert?
private var speechRecognizer = SFSpeechRecognizer()
private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var recognitionTask: SFSpeechRecognitionTask?
private let audioEngine = AVAudioEngine()
private let audioSession = AVAudioSession.sharedInstance()
var onRecognisedText: ((String, Bool) -> Void)? // Updated to include the isFinal flag
var onRecognisedSound: (() -> Void)?
init() {
// Configure the audio session
print("recorder")
//Ghulam:self.setRecord()
startRecording()
}
func requestPermission() {
if #available(iOS 17.0, *) {
AVAudioApplication.requestRecordPermission { (hasPermission) in
DispatchQueue.main.async {
self.hasMicrophoneAccess = hasPermission
if !self.isSpeechRecognizerAvailable {
self.alert = Alert(title: Text("Speech Recognition Unavailable"),
message: Text("Please try again later."),
dismissButton: .default(Text("OK")))
}
}
}
} else {
audioSession.requestRecordPermission { (hasPermission) in
DispatchQueue.main.async {
self.hasMicrophoneAccess = hasPermission
if !self.isSpeechRecognizerAvailable {
self.alert = Alert(title: Text("Speech Recognition Unavailable"),
message: Text("Please try again later."),
dismissButton: .default(Text("OK")))
}
}
}
}
SFSpeechRecognizer.requestAuthorization { authStatus in
OperationQueue.main.addOperation {
switch authStatus {
case .denied:
self.updateAlert(title: "Access Denied", message: "User denied access to speech recognition")
case .restricted:
self.updateAlert(title: "Access Restricted", message: "Speech recognition restricted on this device")
case .notDetermined:
self.updateAlert(title: "Authorization Needed", message: "Speech recognition not yet authorized")
default:
break
}
}
}
}
func setRecord() {
print("Entered the setRecord Function")
do {
try self.audioSession.setCategory(.record, mode: .default, options: [])
} catch {
print("Failed to set audio session category: \(error)")
}
}
var isSpeechRecognizerAvailable: Bool {
return speechRecognizer?.isAvailable ?? false
}
func startRecording() {
// Request microphone access
if (!self.hasMicrophoneAccess) {
print("Microphone access denied")
return
}
print("Start recording")
do {
self.setRecord() // Use the record category instead
try self.audioSession.setActive(true, options: .notifyOthersOnDeactivation)
} catch {
print("Failed to set audio session category: \(error)")
}
print("Start recording - reset")
// Reset the audio engine and the recognition task
DispatchQueue.main.async {
self.audioEngine.stop()
self.recognitionTask?.cancel()
// Change the UI state
self.isRecording = true
self.recognitionTask = nil
self.recognitionRequest = nil
// Create and configure the recognition request
self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let recognitionRequest = self.recognitionRequest else {
fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
}
recognitionRequest.shouldReportPartialResults = true
if self.speechRecognizer?.supportsOnDeviceRecognition == true {
// Set requiresOnDeviceRecognition to true to enforce on-device recognition
recognitionRequest.requiresOnDeviceRecognition = true
} else {
// Handle the case where on-device recognition is not supported
print("On-device recognition not supported for the current language or device configuration.")
}
// Install the tap on the audio engine's input node
print("Install the tap on the audio engine's input node")
let recordingFormat = self.audioEngine.inputNode.outputFormat(forBus: 0)
self.audioEngine.inputNode.installTap(onBus: 0, bufferSize: 4096, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
DispatchQueue.main.async {
if !self.isRecording {
return
}
self.recognitionRequest?.append(buffer)
}
}
// Start the audio engine
do {
try self.audioEngine.start()
} catch {
print("There was a problem starting the audio engine.")
}
// Start the recognition task
self.recognitionTask = self.speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
DispatchQueue.main.async {
if (!self.isRecording) {
return
}
if let result = result {
// Pass both the recognized text and the isFinal flag to the closure
self.onRecognisedText?(result.bestTranscription.formattedString, result.isFinal)
print("Recognition: \(result.bestTranscription.formattedString), Final: \(result.isFinal)")
} else if let error = error {
// Handle any errors here
print("Error during recognition: \(error.localizedDescription)")
}
}
})
}
}
func stopRecording() {
print("Stop recording")
if !self.isRecording {
return
}
DispatchQueue.main.async {
self.recognitionTask?.cancel()
self.recognitionRequest?.endAudio()
self.audioEngine.inputNode.removeTap(onBus: 0)
self.audioEngine.stop()
do {
try self.audioSession.setActive(false)
} catch {
print("There was a problem stopping the audio engine.")
}
// Reset recognition-related properties
self.recognitionRequest = nil
self.recognitionTask = nil
self.isRecording = false
}
}
private func updateAlert(title: String, message: String) {
self.showAlert = true
self.alertTitle = title
self.alertMessage = message
}
}
Please also find below the output of my console log:
Speech Recognition(25104,0x1f04dfec0) malloc: Unable to set up reclaim buffer (46) - disabling large cache
recorder
Microphone access denied
No content found in UserDefaults at INIT for key 'RecognizedText'.
No content found in UserDefaults at onAppear for key 'RecognizedText'.
#FactoryInstall Unable to query results, error: 5
Unable to list voice folder
Unable to list voice folder
Unable to list voice folder
Start recording
Entered the setRecord Function
Unable to list voice folder
Unable to list voice folder
Start recording - reset
Entered the setRecord Function
Install the tap on the audio engine's input node
DEBUG:Recognized text received: 'I', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I
Recognition: I, Final: false
DEBUG:Recognized text received: 'I will', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I will
Recognition: I will, Final: false
DEBUG:Recognized text received: 'I will talk', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I will talk
Recognition: I will talk, Final: false
DEBUG:Recognized text received: 'I will talk now', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I will talk now
Recognition: I will talk now, Final: false
DEBUG:Recognized text received: 'I will talk now pause', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I will talk now pause
Recognition: I will talk now pause, Final: false
DEBUG:Recognized text received: 'I will talk now pause', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I will talk now pause
Recognition: I will talk now pause, Final: false
DEBUG:Recognized text received: 'I', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I
Recognition: I, Final: false
DEBUG:Recognized text received: 'I am', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I am
Recognition: I am, Final: false
DEBUG:Recognized text received: 'I am now', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I am now
Recognition: I am now, Final: false
DEBUG:Recognized text received: 'I am now continu', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I am now continu
Recognition: I am now continu, Final: false
DEBUG:Recognized text received: 'I am now continuing', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I am now continuing
Recognition: I am now continuing, Final: false
DEBUG:Recognized text received: 'I am now continuing my', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I am now continuing my
Recognition: I am now continuing my, Final: false
DEBUG:Recognized text received: 'I am now continuing my new', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I am now continuing my new
Recognition: I am now continuing my new, Final: false
DEBUG:Recognized text received: 'I am now continuing my new', Final: false
Used Console
UserDefaults content for key 'RecognizedText': I am now continuing my new
Recognition: I am now continuing my new, Final: false
Stop recording
Hi Daniel,
I really hope you can spare some time to help me. I have spent an embarrassingly large amount of time trying to hack your code so that it will work with the TextEditor view. I am able to successfully get spoken words displayed as text in the TextEditor view, and I managed this even after enabling recognitionRequest.shouldReportPartialResults = true in the recorder.swift file, so that text appears in near real time.
The issue I have is that, when I pause from speaking (for a couple of seconds or so), the TextEditor window appears to clear all of its contents, and when I resume talking the TextEditor view starts to display my speech text again. I have tried many code permutations to get it to work but have failed miserably. Can you please help?
My current ContentView.swift code is: