file loss and usage information

Thank you for your interest in the project and for bringing these issues to our attention. I apologize for any inconvenience caused. Let me address each of your points:

Missing requirements.txt: You're right, and I apologize for this oversight. I'll create and add a requirements.txt file to the repository. For now, the main dependencies are:
```
Cython==3.0.10
numpy==1.23.5
torch==2.1.0
```
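Once the file has been added, you can install everything with pip install -r requirements.txt. Until then, you can install the pinned versions directly with pip install Cython==3.0.10 numpy==1.23.5 torch==2.1.0.

Missing setup.py: This file is not missing. It is located at monotonic_align/setup.py.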

Obtaining text_mask and mel_embeddings:

- text_mask is a boolean tensor indicating which positions in the text sequence are valid (True) and which are padding (False). You can create it from the length of each input text.
- mel_embeddings are typically extracted from your mel spectrogram using a pre-processing step or a neural network. The exact method depends on your TTS pipeline.

Here's a simple example:

```python
import torch

# Assuming batch_size = 1, text length = 10, mel length = 100,
# text embedding dim = 256, mel embedding dim = 80
text_embeddings = torch.randn(1, 10, 256)  # Replace with your actual text embeddings
mel_embeddings = torch.randn(1, 100, 80)   # Replace with your actual mel embeddings

text_mask = torch.ones(1, 10, dtype=torch.bool)   # Adjust based on your actual text length
mel_mask = torch.ones(1, 100, dtype=torch.bool)   # Adjust based on your actual mel length

# `aligner` is your already-constructed aligner model instance
alignment = aligner(text_embeddings, mel_embeddings, text_mask, mel_mask)
```
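
All-ones masks are only correct when no sequence in the batch is padded. For batches with sequences of different lengths, the masks can be derived from the per-sample lengths. Here is a minimal sketch, assuming you already know each sample's text and mel length. Note that lengths_to_mask is a hypothetical helper written for this reply, not a function from the repository:

```python
from typing import Optional

import torch


def lengths_to_mask(lengths: torch.Tensor, max_len: Optional[int] = None) -> torch.Tensor:
    """Return a (batch, max_len) boolean mask that is True at valid positions.

    Hypothetical helper for illustration; not part of the repository.
    """
    if max_len is None:
        max_len = int(lengths.max())
    # positions has shape (max_len,); comparing against lengths with shape
    # (batch, 1) broadcasts to (batch, max_len).
    positions = torch.arange(max_len, device=lengths.device)
    return positions.unsqueeze(0) < lengths.unsqueeze(1)


# Example: a batch of two samples padded to the longest one.
text_lengths = torch.tensor([10, 7])
mel_lengths = torch.tensor([100, 64])

text_mask = lengths_to_mask(text_lengths)  # shape (2, 10)
mel_mask = lengths_to_mask(mel_lengths)    # shape (2, 100)
```

The resulting masks can be passed in place of the all-ones masks in the example, provided the embeddings are padded to the same maximum lengths.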

Mismatched wav and text: Using mismatched wav and text for alignment is not recommended as it will produce incorrect alignments. The aligner assumes that the input text and audio correspond to each other. If they don't match:
- The alignment process might still complete, but the results will be meaningless.
- You might encounter errors if the lengths are significantly different.
- The quality of any TTS system using these alignments will be severely compromised.
Always ensure that your wav files and text inputs correspond correctly to each other.
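
If you want to catch obvious mismatches before running the aligner, a cheap length check can help. The sketch below is only a heuristic under stated assumptions: the function name and the ratio thresholds are hypothetical and should be tuned for your data, and it assumes the aligner needs at least one mel frame per text token (typical for monotonic alignment, but please verify for your setup). It cannot detect a wrong transcript whose length happens to be plausible.

```python
def looks_mismatched(
    num_text_tokens: int,
    num_mel_frames: int,
    min_ratio: float = 2.0,   # hypothetical lower bound on frames per token
    max_ratio: float = 30.0,  # hypothetical upper bound on frames per token
) -> bool:
    """Flag a (text, audio) pair whose lengths are implausible for each other."""
    if num_text_tokens <= 0 or num_mel_frames <= 0:
        return True
    # Assumption: every token must be assigned at least one frame.
    if num_mel_frames < num_text_tokens:
        return True
    ratio = num_mel_frames / num_text_tokens
    return not (min_ratio <= ratio <= max_ratio)


# Example usage with (num_text_tokens, num_mel_frames) pairs.
pairs = [(10, 100), (50, 20), (5, 1000)]
flags = [looks_mismatched(t, m) for t, m in pairs]
```

Pairs that are flagged are worth inspecting by hand before they are used for alignment or training.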

I hope this helps clarify things. I'll update the repository with the requirements.txt file and improve the documentation to make the usage clearer. If you have any more questions, please don't hesitate to ask!