Add audio resampling to `PunctuationRestorer`

Hi @patrickvonplaten 👋,

As discussed in https://github.com/huggingface/speechbox/issues/4, this PR:

- [x] Adds resampling (using SciPy) to the `__call__` method of `PunctuationRestorer`
- [x] Adds SciPy as a soft dependency of `PunctuationRestorer`
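A minimal sketch of what the resampling step inside `__call__` could look like. The PR text only says "scipy's version", so using `scipy.signal.resample` (FFT-based) is an assumption, as are the hypothetical helper name `maybe_resample` and the `TARGET_SAMPLING_RATE` constant (16 kHz, the rate Whisper's feature extractor expects). SciPy is imported lazily so it only becomes required when resampling is actually needed, which is what keeps it a soft dependency.

```python
import numpy as np

# Assumption: Whisper's feature extractor expects 16 kHz input.
TARGET_SAMPLING_RATE = 16_000


def maybe_resample(audio: np.ndarray, sampling_rate: int, target_rate: int = TARGET_SAMPLING_RATE) -> np.ndarray:
    """Hypothetical helper: resample `audio` to `target_rate` only when the rates differ."""
    if sampling_rate == target_rate:
        # Nothing to do; return the input unchanged.
        return audio
    try:
        # Import lazily so SciPy stays a soft dependency: it is only needed
        # when resampling is actually required.
        from scipy.signal import resample
    except ImportError as err:
        raise ImportError(
            "Resampling audio requires SciPy; install it with `pip install scipy`."
        ) from err
    # Scale the number of samples by the ratio of the two sampling rates.
    num_samples = round(len(audio) * target_rate / sampling_rate)
    return resample(audio, num_samples)


if __name__ == "__main__":
    one_second_at_48k = np.zeros(48_000, dtype=np.float32)
    print(maybe_resample(one_second_at_48k, sampling_rate=48_000).shape)  # (16000,)
```

In this sketch the audio array would pass through such a helper before feature extraction. FFT-based resampling implicitly treats the signal as periodic; `scipy.signal.resample_poly` is a common polyphase alternative if edge artifacts matter.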

Below is a snippet for testing the change:

```python
import re
import string

from datasets import load_dataset
from speechbox import PunctuationRestorer

streamed_dataset = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="validation", streaming=True)

# get the first sample
sample = next(iter(streamed_dataset))

# print the original transcript
print(sample["sentence"])
# => "It is from Westport, above the villages of Murrisk and Lecanvey."

# normalize the transcript: strip punctuation and lowercase
sentence = re.sub(rf"[{re.escape(string.punctuation)}]", "", sample["sentence"]).lower()
print(sentence)
# => "it is from westport above the villages of murrisk and lecanvey"

# load the restoring class
restorer = PunctuationRestorer.from_pretrained("openai/whisper-tiny.en")
restorer.to("cuda")

# pass the audio's native sampling rate so the restorer can resample it
restored_text, log_probs = restorer(
    sample["audio"]["array"],
    sentence,
    sampling_rate=sample["audio"]["sampling_rate"],
    num_beams=1,
)

print("Restored text:\n", restored_text)
# Restored text:
# It is from Westport above the villages of MURRISK and LECANVEY.
```
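To exercise the resampling step without a GPU or a Common Voice download, a self-contained sanity check on a synthetic tone can help. This sketch is not part of the PR and assumes `scipy.signal.resample` is the routine under test: a 440 Hz sine sampled at 48 kHz should still have its spectral peak at 440 Hz after resampling to 16 kHz.

```python
import numpy as np
from scipy.signal import resample

source_rate, target_rate = 48_000, 16_000
freq_hz = 440.0

# One second of a 440 Hz tone at the source rate.
t = np.arange(source_rate) / source_rate
tone = np.sin(2 * np.pi * freq_hz * t)

# FFT-based resampling to the target rate.
num_samples = round(len(tone) * target_rate / source_rate)
resampled = resample(tone, num_samples)

# Locate the dominant frequency of the resampled signal.
spectrum = np.abs(np.fft.rfft(resampled))
freqs = np.fft.rfftfreq(len(resampled), d=1 / target_rate)
peak_hz = freqs[np.argmax(spectrum)]

print(len(resampled), f"{peak_hz:.1f}")  # 16000 440.0
```

Because the tone completes a whole number of cycles in one second, its energy falls exactly on one FFT bin, which makes the peak check exact rather than approximate.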

huggingface/speechbox#5