thevasudevgupta / gsoc-wav2vec2

GSoC'2021 | TensorFlow implementation of Wav2Vec2
https://thevasudevgupta.github.io/gsoc-wav2vec2/assets/final_report
Apache License 2.0

Questions about processor #19

Closed ahmedlone127 closed 3 years ago

ahmedlone127 commented 3 years ago

What does this code do?

def _normalize(self, x):
        """You must call this before padding."""
        # -> (1, seqlen)
        mean = tf.reduce_mean(x, axis=-1, keepdims=True)
        var = tf.math.reduce_variance(x, axis=-1, keepdims=True)
        return tf.squeeze((x - mean) / tf.sqrt(var + 1e-5))

My other question: on what basis are numbers assigned to the entries in the vocab list? By that I mean this: image

I understand the code in the picture: it basically collects all the characters from the text. My question is about the step where it turns the characters into a dictionary with their indices as values. Does it matter which character is at which index, and if so, how does the right character end up at the right index? I was trying to test my own version of your tokenizer and had trouble producing the right outputs with your vocab.json, so I took the one here instead, which worked fine. Also, I was using a fine-tuned model for making predictions, which was associated with this tokenizer via Hugging Face.

thevasudevgupta commented 3 years ago

Hi @ahmedlone127,

Thanks for your interest in this project!!

def _normalize(self, x):
        """You must call this before padding."""
        # -> (1, seqlen)
        mean = tf.reduce_mean(x, axis=-1, keepdims=True)
        var = tf.math.reduce_variance(x, axis=-1, keepdims=True)
        return tf.squeeze((x - mean) / tf.sqrt(var + 1e-5))

Wav2Vec2 was trained on speech normalised along the time axis, so this code provides that functionality. In my repository, Wav2Vec2Processor has two different roles: one handles preprocessing of speech (when is_tokenizer=False) and the other handles post-processing of model outputs, i.e. decoding logits into a string (when is_tokenizer=True). So the code above is relevant to an instance created with is_tokenizer=False. You can refer to this notebook for a better understanding.
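To illustrate why the docstring says the normalisation must happen before padding, here is a minimal pure-Python sketch (not the repository's code). The `normalize` helper and the sample values are hypothetical; it only mimics the arithmetic of `tf.reduce_mean` and `tf.math.reduce_variance` on a single waveform so that it runs without TensorFlow.

```python
import math


def normalize(samples, eps=1e-5):
    """Zero-mean, unit-variance scaling of one waveform (hypothetical helper)."""
    mean = sum(samples) / len(samples)
    # Population variance (divide by N), which is what tf.math.reduce_variance computes.
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return [(s - mean) / math.sqrt(var + eps) for s in samples]


speech = [0.1, 0.4, -0.2, 0.3]   # made-up raw samples
padded = speech + [0.0] * 4      # same clip, zero-padded to length 8

before_padding = normalize(speech)
after_padding = normalize(padded)[: len(speech)]

# The padding zeros shift the mean and variance, so the two results differ.
print(before_padding)
print(after_padding)
```

Normalising first keeps the statistics tied to the real speech samples only; padding can then be applied to the already normalised signal.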

My other question: on what basis are numbers assigned to the entries in the vocab list? By that I mean this:

This vocabulary file (https://github.com/vasudevgupta7/gsoc-wav2vec2/blob/main/data/vocab.json) is used for de-tokenizing. It was taken directly from the pre-trained Wav2Vec2 model.

Hope this helps!!
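On the question of whether the index of each character matters: a model's output positions are tied to whatever token-to-index mapping was used when it was trained, so decoding with a different mapping yields different text. The sketch below shows this with a toy vocabulary. All names and values are hypothetical (the real vocab.json has different entries), and CTC details such as collapsing repeated tokens are left out.

```python
# Hypothetical toy vocabulary, standing in for the mapping a model was trained with.
vocab = {"<pad>": 0, "a": 1, "b": 2, "c": 3}

# Same characters, but assigned to different indices.
shuffled_vocab = {"<pad>": 0, "b": 1, "c": 2, "a": 3}


def decode(ids, token_to_id):
    """Map predicted ids back to characters, skipping the padding token."""
    id_to_token = {idx: tok for tok, idx in token_to_id.items()}
    return "".join(id_to_token[i] for i in ids if id_to_token[i] != "<pad>")


predicted_ids = [3, 1, 2, 0]  # pretend these came from an argmax over the logits

print(decode(predicted_ids, vocab))           # cab
print(decode(predicted_ids, shuffled_vocab))  # abc
```

This is why a vocabulary rebuilt from scratch can give wrong outputs with a fine-tuned checkpoint, while the vocabulary file shipped with that checkpoint works.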

ahmedlone127 commented 3 years ago

Hey, thanks for the answer. I just ran the notebook you attached, and it looks like some of the stuff needs to be updated.

thevasudevgupta commented 3 years ago

I just fixed it now. Can you try running that notebook again?

ahmedlone127 commented 3 years ago

Yeah, looks good! Thanks. Also, why do you specify axis=-1 and keepdims=True?

I was trying to port this to Scala, and this is what I have got so far:

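  // Mean of a sequence; returns 0.0 for an empty input.
  def mean(xs: Seq[Double]): Double = if (xs.isEmpty) 0.0 else xs.sum / xs.size
  // Population variance (divides by N), like tf.math.reduce_variance.
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    mean(xs.map(x => math.pow(x - m, 2)))
  }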

That covers the first two lines. Do they look good to you? I am anxious because I don't understand what keepdims=True and axis=-1 mean, so I am probably not including their functionality in these functions.

thevasudevgupta commented 3 years ago

I am using axis=-1 to make sure normalization happens along the time dimension (the last axis). keepdims=True keeps the reduced axis with size 1, so the output has the same number of dimensions as the input and can be subtracted from x directly.

I would encourage you to print the outputs of these statements to understand them better. Since I am not familiar with Scala, I am not sure whether your code is correct.
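As a rough illustration of what axis=-1 and keepdims=True change, here is a pure-Python sketch on a nested list of shape (batch, time). The helper `reduce_mean_last_axis` is hypothetical and only mimics the shape behaviour of `tf.reduce_mean(x, axis=-1, keepdims=...)`; it is not the TensorFlow implementation.

```python
def reduce_mean_last_axis(batch, keepdims):
    """Mean over the last axis of a 2-D nested list (hypothetical helper)."""
    means = [sum(row) / len(row) for row in batch]
    # keepdims=True keeps the reduced axis as length 1: shape (batch, 1) instead of (batch,).
    return [[m] for m in means] if keepdims else means


x = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 30.0]]  # shape (2, 3): two clips, three time steps each

flat = reduce_mean_last_axis(x, keepdims=False)
kept = reduce_mean_last_axis(x, keepdims=True)

print(flat)  # [2.0, 20.0]
print(kept)  # [[2.0], [20.0]]

# With the axis kept, each row lines up with its own mean, as in `x - mean`.
centered = [[v - m[0] for v in row] for row, m in zip(x, kept)]
print(centered)  # [[-1.0, 0.0, 1.0], [-10.0, 0.0, 10.0]]
```

For a single 1-D waveform, as in the Scala port, there is only one axis, so reducing over axis=-1 is the same as taking the mean of the whole sequence.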

ahmedlone127 commented 3 years ago

okay thanks !

ahmedlone127 commented 3 years ago

Okay, so I am pretty much done with verifying the outputs. Even though I couldn't implement axis=-1, the results looked identical, with a lot more precision. I want to ask: why do we call tf.transpose here, even though the output before and after calling it is pretty much the same?

image

thevasudevgupta commented 3 years ago

Hey, sorry for the late reply. You can skip tf.transpose if everything looks alright without it.

thevasudevgupta commented 3 years ago

Closing this issue as everything is resolved. Please create a new issue in case you wanna discuss something.