yashugupta786 opened this issue 3 years ago
Can you share some logs with 3.tcp.ngrok.io? Also, remember the bot is not designed to use ngrok; ngrok is only a workaround to develop a bot. I would say you should rather contact ngrok about this problem.
And what if you create a CNAME to 3.tcp.ngrok.io in your own DNS and create the certificate for your own DNS entry? It will resolve to the same origin, but the certificate will be valid for your own domain.
Sorry, I didn't understand what you are talking about. You can see the first screenshot for the ngrok mapping. @1fabi0
Yes, that looks fine, but create a CNAME on your own domain, let's say 3.bot.dev.domain.com, that points to 3.tcp.ngrok.io, create a certificate for 3.bot.dev.domain.com, and use that instead of ngrok directly.
What do I have to do for that? Is there any documentation for it? I am very new to this bot recording and badly stuck. @1fabi0
The getting started guide of the HueBot explains this approach with CNAME entries for local development, because as far as I know you can't create non-self-signed certificates for ngrok.
I got that, but if I want to use it on the dev environment, why does an address starting with 1.tcp.ngrok.io work while an address starting with 3.tcp.ngrok.io does not? The bot is not joining the meeting when I use the 3.tcp.ngrok configuration for the US region. I am really stuck on this.
@1fabi0 @JasonTheDeveloper
@yashugupta786 I am aware of someone who had the same issue as you and was able to generate a certificate using Let's Encrypt with a reserved domain and TCP port in a region other than US. I haven't personally tried it myself, but it should be possible as long as both your reserved domain and reserved TCP port are in the same region.
Have a look at https://github.com/jakkaj/sslngrokdevcontiner and have a play around with this line in host.sh to generate a certificate. You'll need to update the -host-header URL to match the region your domain is reserved in. Also make sure you run the script from inside the devcontainer for best results.
Note: the .devcontainer is already defined for you. All you need to do is launch the project in Visual Studio Code with the Remote - Containers extension installed and Docker running.
@zihzhan @omkarchoudhary123 This is the reply we got from @JasonTheDeveloper for generating a certificate in a different region. @JasonTheDeveloper Thanks, we will try it and get back to you.
Hi @JasonTheDeveloper @1fabi0, we were able to resolve the issue: we generated the certificate for the India region using certbot, and now the bot is joining the meeting. But now we are stuck on capturing or recording the audio. Can you tell us in which variable or location the audio from the unmixed audio buffer is saved? We are not able to figure it out from the documentation, as it is very limited on capturing audio. Thanks in advance.
To capture audio you will have to use the ICall object. It provides a method called GetLocalMediaSession() that gives you an AudioSocket, where you can add an event handler to the AudioMediaReceived event. Your event handler is then called 50 times a second and provides 20 ms of audio data each time. For that you have the AudioMediaReceivedEventArgs, which has an AudioMediaBuffer Buffer property. In there you have an array UnmixedAudioBuffer (that can be null) that contains the unmixed audio data buffers, or you can use the Data property, which is an IntPtr to the data; Length provides the length of this data in bytes, so you can use the Marshal.Copy method to copy from the pointer to a byte array. After copying, or when you finish the audio handling, you have to call Dispose on the AudioMediaBuffer to free the memory where the 20 ms of audio were saved. If you have any questions, do not hesitate to ask me.
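A minimal sketch of that flow, assuming you already have the ICall object (call) from call creation; the handler and variable names here are just illustrative, not from the sample:
using System.Runtime.InteropServices;
using Microsoft.Skype.Bots.Media;
// Subscribe once after the call is established.
call.GetLocalMediaSession().AudioSocket.AudioMediaReceived += OnAudioMediaReceived;
private void OnAudioMediaReceived(object sender, AudioMediaReceivedEventArgs e)
{
    try
    {
        // 20 ms of 16 kHz, 16-bit, mono PCM per packet.
        var pcm = new byte[e.Buffer.Length];
        Marshal.Copy(e.Buffer.Data, pcm, 0, (int)e.Buffer.Length);
        // hand "pcm" off to your own processing here
    }
    finally
    {
        // Always dispose the buffer so the media SDK can free the native memory.
        e.Buffer.Dispose();
    }
}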
Make sure you've granted your bot registration the right permissions to access media. If you do not, you will not be able to access any audio or video at all.
Hi @JasonTheDeveloper @1fabi0, we are able to capture the unmixed audio in SerializableUnmixedAudioBuffers[i]. Now we want to pass those bytes to the Azure Speech push audio stream, but the bytes are not forming speech. How can we do speech to text in real time from the bytes we are getting in the SerializableUnmixedAudioBuffers variable?
We just want to do speech to text from the bytes we are getting from the SerializableUnmixedAudioBuffers variable.
https://github.com/microsoftgraph/microsoft-graph-comms-samples/blob/master/Samples/V1.0Samples/AksSamples/teams-recording-bot/src/RecordingBot.Services/Media/SerializableAudioMediaBuffer.cs. It seems the bytes are not in the correct form to pass into the Azure push audio stream input.
@yashugupta786 just push the raw audio bytes into the audio stream of the speech-to-text recognizer, or use a model for speech to text inside the bot.
Hi @1fabi0, it is not working; speech to text using the audio push stream is failing. I am directly passing the bytes received from the unmixed audio buffer to the audio push stream. Do we need to resample the bytes or something like that to make them acceptable to the audio push stream? Can you please share a snippet of how this can be achieved, assuming the bytes received from the unmixed audio buffer go into the audio push stream of Azure Cognitive Services?
@yashugupta786 you don't need to change the format; it's fine to push raw PCM data into that stream, we do that as well. But you need a SpeechRecognizer:
_audioConfig = AudioConfig.FromStreamInput(_audioStream);
_recognizer = new Microsoft.CognitiveServices.Speech.SpeechRecognizer(speechConfig, _audioConfig);
await _recognizer.StartContinuousRecognitionAsync();
and then you can do
_audioStream.Write(audioData);
(_audioStream is _audioStream = AudioInputStream.CreatePushStream())
For more information, look into the documentation of the SpeechRecognizer service.
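Putting those fragments together, a rough sketch might look like this; speechConfig is assumed to be created via SpeechConfig.FromSubscription with your key and region, and _audioStream, _audioConfig and _recognizer are fields created once, not per packet:
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
// Create the push stream and recognizer once.
_audioStream = AudioInputStream.CreatePushStream();
_audioConfig = AudioConfig.FromStreamInput(_audioStream);
_recognizer = new SpeechRecognizer(speechConfig, _audioConfig);
_recognizer.Recognized += (s, e) => Console.WriteLine($"RECOGNIZED: {e.Result.Text}");
await _recognizer.StartContinuousRecognitionAsync();
// Then, for every 20 ms audio packet you receive from the bot:
_audioStream.Write(audioData);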
Hi @1fabi0, we are doing the same thing but we are failing to pass the bytes. Please check below and correct it if we are not passing the bytes to the push stream in the correct manner.
var unmixedAudioBuffer = new SerializableUnmixedAudioBuffer(buffer.UnmixedAudioBuffers[i], _getParticipantFromMSI(speakerId));
SerializableUnmixedAudioBuffers[i] = unmixedAudioBuffer;
SpeechTranslation(unmixedAudioBuffer.Buffer).Wait();
//TranslationWithFileAsync().Wait();
}
}
}
}
public async Task SpeechTranslation(byte[] Buffer)
{
byte channels = 1;
byte bitsPerSample = 16;
int samplesPerSecond = 16000;
var audioFormat = AudioStreamFormat.GetWaveFormatPCM((uint)samplesPerSecond, bitsPerSample, channels);
string fromLanguage = "en-US";
var config = SpeechTranslationConfig.FromSubscription(key, region);
config.SpeechRecognitionLanguage = fromLanguage;
config.AddTargetLanguage("de");
config.AddTargetLanguage("fr");
config.AddTargetLanguage("en");
var stopTranslation = new TaskCompletionSource<int>();
using var audioInputStream = AudioInputStream.CreatePushStream(audioFormat);
using (var audioInput = AudioConfig.FromStreamInput(audioInputStream))
{
using (var recognizer = new TranslationRecognizer(config, audioInput))
{
// Subscribes to events.
recognizer.Recognizing += (s, e) =>
{
Console.WriteLine($"RECOGNIZING in '{fromLanguage}': Text={e.Result.Text}");
foreach (var element in e.Result.Translations)
{
Console.WriteLine($" TRANSLATING into '{element.Key}': {element.Value}");
}
};
recognizer.Recognized += (s, e) => {
if (e.Result.Reason == ResultReason.TranslatedSpeech)
{
Console.WriteLine($"RECOGNIZED in '{fromLanguage}': Text={e.Result.Text}");
foreach (var element in e.Result.Translations)
{
Console.WriteLine($" TRANSLATED into '{element.Key}': {element.Value}");
}
}
else if (e.Result.Reason == ResultReason.RecognizedSpeech)
{
Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
Console.WriteLine($" Speech not translated.");
}
else if (e.Result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
};
recognizer.Canceled += (s, e) =>
{
Console.WriteLine($"CANCELED: Reason={e.Reason}");
if (e.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you update the subscription info?");
}
stopTranslation.TrySetResult(0);
};
recognizer.SpeechStartDetected += (s, e) => {
Console.WriteLine("\nSpeech start detected event.");
};
recognizer.SpeechEndDetected += (s, e) => {
Console.WriteLine("\nSpeech end detected event.");
};
recognizer.SessionStarted += (s, e) => {
Console.WriteLine("\nSession started event.");
};
recognizer.SessionStopped += (s, e) => {
Console.WriteLine("\nSession stopped event.");
Console.WriteLine($"\nStop translation.");
stopTranslation.TrySetResult(0);
};
Console.WriteLine("Start translation...");
await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
Task.WaitAny(new[] { stopTranslation.Task });
await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
}
}
}
@yashugupta786 you are creating a recognizer every 20 ms. You have to create the recognizer once and then always push the bytes into the audio stream, not recreate the recognizer, and you actually never push any audio into it.
@1fabi0 Can you please correct it in the above code? Azure and this whole thing are very new to me and I am not getting how it works internally.
Ok, but I feel very uncomfortable doing your job and writing code for you.
public class CallHandler : ....
{
private readonly ICall _call;
//a class as value in this dictionary would make more sense
private readonly ConcurrentDictionary<uint,(AudioInputStream, SpeechTranslationConfig,AudioStreamFormat,TranslationRecognizer,...(what ever belongs to the recognizer stuff))> _recognizers = new ConcurrentDictionary<...>();
public CallHandler(...) : base(...)
{
_call = call; //call from creation
_call.GetLocalMediaSession().AudioSocket.AudioMediaReceived += OnAudioMediaReceived;
_call.Participants.OnUpdated += OnParticipantsUpdated;
//do whatever else is needed here
}
private void OnParticipantsUpdated(IParticipantCollection sender, CollectionEventArgs<IParticipant> args)
{
foreach(var added in args.AddedResources)
{
Task.Run( async () =>
{
var audioSourceId = uint.Parse(added.Resource.MediaStreams.FirstOrDefault(stream => stream.MediaType == Modality.Audio).SourceId);
byte channels = 1;
byte bitsPerSample = 16;
int samplesPerSecond = 16000;
var audioFormat = AudioStreamFormat.GetWaveFormatPCM((uint)samplesPerSecond, bitsPerSample, channels);
string fromLanguage = "en-US";
var config = SpeechTranslationConfig.FromSubscription(key, region);
config.SpeechRecognitionLanguage = fromLanguage;
config.AddTargetLanguage("de");
config.AddTargetLanguage("fr");
config.AddTargetLanguage("en");
//don't use a using here, the stream is kept in the dictionary for the lifetime of the participant
var audioInputStream = AudioInputStream.CreatePushStream(audioFormat);
var audioInput = AudioConfig.FromStreamInput(audioInputStream);
var recognizer = new TranslationRecognizer(config, audioInput);
// Subscribes to events.
recognizer.Recognizing += (s, e) =>
{
Console.WriteLine($"RECOGNIZING in '{fromLanguage}': Text={e.Result.Text}");
foreach (var element in e.Result.Translations)
{
Console.WriteLine($" TRANSLATING into '{element.Key}': {element.Value}");
}
};
recognizer.Recognized += (s, e) => {
if (e.Result.Reason == ResultReason.TranslatedSpeech)
{
Console.WriteLine($"RECOGNIZED in '{fromLanguage}': Text={e.Result.Text}");
foreach (var element in e.Result.Translations)
{
Console.WriteLine($" TRANSLATED into '{element.Key}': {element.Value}");
}
}
else if (e.Result.Reason == ResultReason.RecognizedSpeech)
{
Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
Console.WriteLine($" Speech not translated.");
}
else if (e.Result.Reason == ResultReason.NoMatch)
{
Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
};
recognizer.Canceled += (s, e) =>
{
Console.WriteLine($"CANCELED: Reason={e.Reason}");
if (e.Reason == CancellationReason.Error)
{
Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
Console.WriteLine($"CANCELED: Did you update the subscription info?");
}
//no stopTranslation task exists in this class; handle the error / stop this recognizer here
};
recognizer.SpeechStartDetected += (s, e) => {
Console.WriteLine("\nSpeech start detected event.");
};
recognizer.SpeechEndDetected += (s, e) => {
Console.WriteLine("\nSpeech end detected event.");
};
recognizer.SessionStarted += (s, e) => {
Console.WriteLine("\nSession started event.");
};
recognizer.SessionStopped += (s, e) => {
Console.WriteLine("\nSession stopped event.");
Console.WriteLine($"\nStop translation.");
//no stopTranslation task exists in this class; stop/dispose this recognizer here if needed
};
Console.WriteLine("Start translation...");
await recognizer.StartContinuousRecognitionAsync();
_recognizers.TryAdd(audioSourceId, (audioInputStream,...));
});
}
foreach(var removed in args.RemovedResources)
{
Task.Run(async () =>
{
var audioSourceId = uint.Parse(removed.Resource.MediaStreams.FirstOrDefault(stream => stream.MediaType == Modality.Audio).SourceId);
if(_recognizers.TryRemove(audioSourceId, out var removedRecognizer)){
//the recognizer must be accessed by its tuple position, e.g. Item4 for the tuple declared above
await removedRecognizer.Item4.StopContinuousRecognitionAsync();
}
});
}
}
private void OnAudioMediaReceived(object sender, AudioMediaReceivedEventArgs args)
{
try
{
//first check that it's not silence, that UnmixedAudioBuffers is not null, and whether the mixed buffer still contains audio data that needs to be processed
var buffer = args.Buffer;
var unmixedAudioBuffers = new List<SerializableUnmixedAudioBuffer>();
//copying from the pointer has to happen synchronously, then we can process asynchronously
for (var i = 0; i < buffer.UnmixedAudioBuffers.Length; i++)
{
    unmixedAudioBuffers.Add(new SerializableUnmixedAudioBuffer(buffer.UnmixedAudioBuffers[i]));
}
//fire'n'forget so this method returns fast, as the Skype media SDK wants it to finish quickly
Task.Run(() =>
{
    //I assume your SerializableUnmixedAudioBuffer exposes ActiveSpeakerId and the copied bytes (Buffer)
    //honestly I don't know the exact name of the push method; the stream in the tuple has to be the PushAudioInputStream so Write is available
    foreach (var unmixed in unmixedAudioBuffers)
    {
        _recognizers[unmixed.ActiveSpeakerId].Item1.Write(unmixed.Buffer);
    }
});
}
catch
{
}
finally
{
args.Buffer.Dispose();
}
}
}
Sorry for the bad code, but I just wrote it in this GitHub text box, so don't expect too much and enhance and change it yourself. If you have any questions, please do not hesitate to ask me.
Best regards, Fabian
Hi @1fabi0
In the recording and meeting bot I am getting the display name as null and the adid as null; I am only getting the speaker ID. How do I get the display name for the corresponding bytes I am receiving in the unmixed audio buffer variable?
if (buffer.UnmixedAudioBuffers != null)
{
SerializableUnmixedAudioBuffers = new SerializableUnmixedAudioBuffer[buffer.UnmixedAudioBuffers.Length];
for (var i = 0; i < buffer.UnmixedAudioBuffers.Length; i++)
{
if (buffer.UnmixedAudioBuffers[i].Length > 0)
{
var speakerId = buffer.UnmixedAudioBuffers[i].ActiveSpeakerId;
var unmixedAudioBuffer = new SerializableUnmixedAudioBuffer(buffer.UnmixedAudioBuffers[i], _getParticipantFromMSI(speakerId));
SerializableUnmixedAudioBuffers[i] = unmixedAudioBuffer;
}
}
}
Please help with how to get the display name; as of now it is null.
It's probably null because you have a guest speaker (or similar) there, whose speaker info is saved in the additional data. The participants carry their source ID in their own streams resource, so with the code above you probably already get the correct participant, but the username is in the additional data because they are a dial-in, guest, or some other kind of user.
@1fabi0 @JasonTheDeveloper please pardon me, I am not able to understand. Please show me how to get the display name for guest users, or for all the users in the meeting, with the same code as above. It is difficult to distinguish the users within the meeting; adid and display name are null in the stream. How do I correct this?
So in the participant you normally do IParticipant.Resource.Info.Identity.User to get the identity of the user. If you have a guest, they have a key called 'guest' in IParticipant.Resource.Info.Identity.AdditionalData; you can handle this, for example, by checking for the key 'guest' and then doing IParticipant.Resource.Info.Identity.AdditionalData["guest"] as Identity.
Another approach can be using an overload that automatically determines this, IParticipant.Resource.Info.Identity.GetPrimaryIdentity[WithType](), which will give you the primary identity; if you choose the WithType variant it will also return the type, i.e. whether it is a guest, phone, etc. But please keep in mind that you can also get application instances (including yours) with this, and users that use dial-in don't have a display name, so you will have to distinguish those cases.
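As a rough sketch of that lookup (ResolveIdentity is a made-up helper name, not part of the SDK):
using Microsoft.Graph;
using Microsoft.Graph.Communications.Calls;
// Returns a usable Identity for a participant, falling back to the guest entry in AdditionalData.
private static Identity ResolveIdentity(IParticipant participant)
{
    var identitySet = participant.Resource.Info.Identity;
    if (identitySet.User != null)
    {
        return identitySet.User;
    }
    // Guests (and similar) are not in the User property but carried in AdditionalData.
    if (identitySet.AdditionalData != null
        && identitySet.AdditionalData.TryGetValue("guest", out var guest))
    {
        return guest as Identity;
    }
    return null;
}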
Hi @1fabi0, when we are sending the bytes received from the unmixed audio buffer to Azure speech to text and performing continuous speech recognition, we are getting "Speech could not be recognized" as the result. Do we need to make any changes, or do we need to resample the bytes received from the unmixed audio buffer?
No, the sound is received as PCM data. Normally the audio is re-encoded by the speech-to-text client, but the speech-to-text service documentation should give you more information about how to set it up to upstream PCM audio data.
@1fabi0 how do we convert the bytes received from the unmixed audio buffer to PCM so that they can be passed to the speech-to-text client? Is there a setting on the unmixed audio buffer, or how can it be achieved? Please help us, we are really stuck.
Do we need to change the channels or something?
Any documentation on this would be much appreciated.
The bytes you are receiving are already PCM; you don't need to convert anything. If you need a short array of PCM samples, just use Marshal.Copy into a short array with a length of 320.
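For example, converting one 20 ms unmixed buffer into 16-bit samples would look roughly like this (unmixedBuffer stands for one entry of buffer.UnmixedAudioBuffers):
using System.Runtime.InteropServices;
// 20 ms at 16 kHz, 16-bit mono = 320 samples (640 bytes).
var samples = new short[320];
Marshal.Copy(unmixedBuffer.Data, samples, 0, samples.Length);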
@1fabi0 I have checked the Azure documentation and it also expects PCM format. If the bytes I am receiving from the unmixed audio buffer are PCM and the speech-to-text service expects that same format, then why is it giving me the error "Speech could not be recognized"? Where is the problem actually, and how can it be resolved?
Hi @1fabi0, we are able to get the speech now. I have one question: how can we map the display name and speaker ID to the recognized speech coming from the Azure service?
Participants have a streams array in their Resource; the streams array defines the streaming directions and modalities of the user and the source ID, which is the ID you get in the audio packets.
How do I fetch this source ID? this.participants.SingleOrDefault(x => x.Resource.IsInLobby == false && x.Resource.MediaStreams.Any(y => y.SourceId == msi.ToString()));
Is this the same thing you are referring to, @1fabi0?
I just wrote you where you can find these IDs in the Resource of the participants. Don't expect that I will code this thing or do your work; most things are pretty standardized, all Graph resources are documented, and the communications stuff is also documented, not in the docs but on this page: https://microsoftgraph.github.io/microsoft-graph-comms-samples/docs/index.html
@1fabi0 The documentation is very limited and we don't have the Teams knowledge or its API background, as we work externally from Microsoft, so these things are very new to us and the documentation is also very limited. That's why I approached you, as most of the things are not working.
Actually, combined with the normal Graph documentation it's quite well documented. If you then try a few things and check how things are done in the samples, it doesn't need any Microsoft internals; I also started on the bot without being in touch with Microsoft or having any help here. I agree that a few things are hard to find out, but as I also said, try things out to see how exactly they work, because it's always hard (and takes longer) to explain exactly how something needs to be done in detail/in code to reach the goal.
@1fabi0 Can you refer me to some examples or links where I can create a recognizer object for every participant and do speech to text with name mapping?
I have already integrated speech to text in the file below: https://github.com/microsoftgraph/microsoft-graph-comms-samples/blob/master/Samples/V1.0Samples/AksSamples/teams-recording-bot/src/RecordingBot.Services/Media/SerializableAudioMediaBuffer.cs. But I am failing at mapping names to the speech-to-text output.
I would create a class which contains the upstreaming and all the speech-to-text things, and put instances of it in a ConcurrentDictionary with the source ID as key and this class as value. If a user joins, you create an instance for the participant and add it to the dictionary; when you receive audio, you push it into that instance. It's then just a matter of combining and extending existing code, which makes it fairly easy, because there are samples for C# speech to text and the recording bot is already receiving audio.
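A rough sketch of such a wrapper class, assuming the default 16 kHz, 16-bit mono push-stream format; all names here are illustrative and not from the sample:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
// Holds the push stream and recognizer for one audio source (participant).
public class ParticipantRecognizer : IDisposable
{
    private readonly PushAudioInputStream _stream;
    private readonly SpeechRecognizer _recognizer;

    public ParticipantRecognizer(SpeechConfig speechConfig, string displayName)
    {
        _stream = AudioInputStream.CreatePushStream();
        _recognizer = new SpeechRecognizer(speechConfig, AudioConfig.FromStreamInput(_stream));
        // Tag every recognition result with the participant's display name.
        _recognizer.Recognized += (s, e) => Console.WriteLine($"{displayName}: {e.Result.Text}");
    }

    public Task StartAsync() => _recognizer.StartContinuousRecognitionAsync();

    public void Push(byte[] pcm) => _stream.Write(pcm);

    public void Dispose()
    {
        _recognizer.Dispose();
        _stream.Dispose();
    }
}
// In the call handler / BotStream, keyed by the participant's audio source ID.
private readonly ConcurrentDictionary<uint, ParticipantRecognizer> _recognizers
    = new ConcurrentDictionary<uint, ParticipantRecognizer>();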
@1fabi0 Thanks for the input. Just one question: is the source ID the speaker ID for the participants? Also, can I create the dictionary in BotStream.cs?
Yes exactly
@1fabi0 Thanks for your support, finally we are able to map the transcription to the names / speaker IDs. Thanks a lot for the support, it really means a lot.
Hi @1fabi0, hope you are doing well.
A few things I need to ask:
RECOGNIZING: Text={"Duration":4300000,"Id":"33661131e4b54d5cb5954ac005c712ee","Offset":79600000,"Text":"What is"}
RECOGNIZING: Text={"Duration":6600000,"Id":"b24420e71a8b4b8a8276414c0a78edd8","Offset":79600000,"Text":"What is the weather"}
RECOGNIZING: Text={"Duration":17400000,"Id":"199cc81c096e4e108451b494f0b99313","Offset":79600000,"Text":"What is the weather of Seattle"}
RECOGNIZING: Text={"Duration":17600000,"Id":"199cc81c096e4e108451b494f0b99313","Offset":79600000,"Text":"What is the weather of Seattle today ?"}
RECOGNIZED: Text={"DisplayText":"What is the weather of Seattle today ?","Duration":17600000,"Id":"7e02086a833542d79d435ac9abf4b7dc","Offset":79600000,"RecognitionStatus":"Success"}
1. Is there a need for doing byte channels = 1; byte bitsPerSample = 16; int samplesPerSecond = 16000; var audioFormat = AudioStreamFormat.GetWaveFormatPCM((uint)samplesPerSecond, bitsPerSample, channels); ?
2. How do we send the translations from the backend back to the Teams meeting app and show them to users? Is there any demo or document we can refer to?
Thanks, I'm fine, I hope you're fine too.
Hi @1fabi0, can you point me to an example where I can create apps in a meeting in MS Teams, take input there either from the admin or from users who have joined the call, and process that input in the backend?
https://docs.microsoft.com/en-us/microsoftteams/platform/tabs/what-are-tabs This is the official documentation of Teams tabs.
Hi @1fabi0, hope you are doing great. How do I get the email address and user ID of a participant in the meeting? In added.Resource.Info.Identity.User only the display name is coming through, and we also need the participant's email ID and other details.
Hi, thanks, I'm fine. You can get the user ID as you found out, and use the user ID with the Graph user endpoint to fetch more information.
Also, the meeting ID (if I understand you correctly, you want the meeting ID for the online meeting Graph endpoint) is always a bit tricky, because it's only possible to query the online meeting Graph endpoint with the join URL.
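For example, with a GraphServiceClient that has User.Read.All application permission, a lookup could look roughly like this (older Microsoft.Graph SDK syntax with .Request() assumed; graphServiceClient and userId are placeholders):
// Fetch email and other profile details for a participant's user ID.
var user = await graphServiceClient.Users[userId].Request()
    .Select(u => new { u.DisplayName, u.Mail, u.UserPrincipalName })
    .GetAsync();
Console.WriteLine($"{user.DisplayName} <{user.Mail ?? user.UserPrincipalName}>");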
Hi @1fabi0, I am not able to find the user ID, I am only getting the display name in participants > resource > identity > info in the CallHandler.cs file. My question is how to get the email address or user ID of the user, so that I can call the Graph API to get the complete info of the user joined in a meeting.
The Id in the Identity object is already the user ID in the MS tenant.
I am trying to capture the audio with the media SDK within Teams. ngrok has issues with 3.tcp.ngrok.io for the US region: when I try to execute the code with all the configuration, the bot is not coming up in the meeting. However, if I get a 1.tcp.ngrok.io reserved TCP address, the bot does join the meeting.