open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
7.55k stars 561 forks

[Feature]: Could you please release a version on Google Colab? #103

Closed songweionline closed 9 months ago

songweionline commented 10 months ago

Could you please release a version on Google Colab? This repository is too challenging for beginners, from environment setup to usage instructions. I have tried many times, but without success.

yuantuo666 commented 10 months ago

We are currently building a Docker image to simplify the environment setup for beginners. Although it is still a work in progress, you are welcome to test it: https://github.com/open-mmlab/Amphion/issues/99#issuecomment-1888860380

RMSnow commented 10 months ago

Hi @songweionline, could you describe your problem more specifically, for example by attaching the commands you ran and the error reports you got?

Thanks for your suggestion about Colab. We are in fact planning to integrate some typical models into Colab/Jupyter Notebooks for teaching and education. However, no specific date has been decided, since it will take considerable labor and testing effort. We hope to release some of them this summer :)

songweionline commented 10 months ago

@yuantuo666 I've used the Docker image with this URL: "https://huggingface.co/spaces/amphion/singing_voice_conversion/tree/main", but it's not working. I'm getting an error: "exposing port TCP 0.0.0.0:7860 -> 0.0.0.0:0: listen tcp 0.0.0.0:7860: bind: An attempt was made to access a socket in a way forbidden by its access permissions." I'm quite sure that port 7860 is not in use, and I have administrator privileges. Other images, such as MySQL, Ubuntu, Redis, and even the NVIDIA driver, are working fine. My operating system is Windows 10. I'm not sure where the issue lies, and even if I manage to resolve this problem, this version of the demo doesn't seem to provide training methods.

songweionline commented 10 months ago

@RMSnow I only have a Windows environment, and I've tried deploying it on VMware following the repository's README. However, possibly because of the Great Firewall (GFW), the model downloads are too slow and the process fails. I then attempted to deploy it on Google Colab following the instructions from this URL: "https://huggingface.co/amphion/singing_voice_conversion". After several attempts, I recall encountering an error about halfway through the model loading process.

yuantuo666 commented 10 months ago

> @yuantuo666 I've used the Docker image with this URL: "https://huggingface.co/spaces/amphion/singing_voice_conversion/tree/main", but it's not working. I'm getting an error: "exposing port TCP 0.0.0.0:7860 -> 0.0.0.0:0: listen tcp 0.0.0.0:7860: bind: An attempt was made to access a socket in a way forbidden by its access permissions." I'm quite sure that port 7860 is not in use, and I have administrator privileges. Other images, such as MySQL, Ubuntu, Redis, and even the NVIDIA driver, are working fine. My operating system is Windows 10. I'm not sure where the issue lies, and even if I manage to resolve this problem, this version of the demo doesn't seem to provide training methods.

I am confused by this error, since the Docker image we built does not bind any port and does not require port 7860.
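
For anyone hitting the same bind error, one way to narrow it down is to test, outside Docker, whether the host lets a process bind the port at all. The following is only a minimal sketch using Python's standard `socket` module; the helper name `try_bind` is hypothetical and not part of Amphion or Docker. If the bind fails with a permission error on Windows even though nothing is listening, the port may fall inside a range reserved by the system, which `netsh interface ipv4 show excludedportrange protocol=tcp` should list. This is an assumption about the cause, not a confirmed diagnosis of the report above.

```python
import socket


def try_bind(port, host="0.0.0.0"):
    """Try to bind a TCP socket to host:port.

    Returns (True, None) if the bind succeeds, otherwise (False, exc),
    where exc is the OSError raised by the operating system.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True, None
    except OSError as exc:
        return False, exc
    finally:
        # Always release the socket so the check itself does not hold the port.
        sock.close()


if __name__ == "__main__":
    port = 7860  # the port named in the error message above
    ok, err = try_bind(port)
    if ok:
        print(f"Port {port} can be bound on this host.")
    elif isinstance(err, PermissionError):
        print(f"Port {port} is blocked by the OS (permission error): {err}")
    else:
        print(f"Port {port} could not be bound: {err}")
```

If the port turns out to be blocked, publishing the container on a different host port is one thing to try.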

The steps to use our Docker image are as follows:

  1. Install Docker, NVIDIA Driver, NVIDIA Container Toolkit, and CUDA.
  2. Run the following commands:

    git clone https://github.com/open-mmlab/Amphion.git
    cd Amphion
    # This is only a test Docker image; remember to switch to the newest one once we release it
    docker run --runtime=nvidia --gpus all -it -v .:/app yuantuo666/amphion

  3. Download the needed dataset and mount it to the Docker container when running it: [Guide](https://github.com/yuantuo666/Amphion/blob/add-docker/egs/datasets/docker.md)
  4. Follow the usage guide to configure the dataset path and run the code:
     - [Text to Speech (TTS)](../tree/main/egs/tts/README.md)
     - [Singing Voice Conversion (SVC)](../tree/main/egs/svc/README.md)
     - [Text to Audio (TTA)](../tree/main/egs/tta/README.md)
     - [Vocoder](../tree/main/egs/vocoder/README.md)
     - [Evaluation](../tree/main/egs/metrics/README.md)
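
Before running the commands in step 2, it can save time to confirm that the command-line tools from step 1 are actually on your `PATH`. The sketch below is a hedged example, not part of Amphion: the helper name `missing_tools` is hypothetical, and the executable names `git`, `docker`, and `nvidia-smi` are assumptions about how those tools are installed on a typical system. It only checks that the executables exist; it does not verify versions or that the NVIDIA Container Toolkit is configured.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names for the prerequisites listed in step 1.
    required = ["git", "docker", "nvidia-smi"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Found on PATH:", ", ".join(required))
```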

If you encounter any problem using the Docker image, please feel free to post the command you ran and a screenshot showing how far you got. To keep each issue on topic, it would be better to report Docker image problems in this issue: https://github.com/open-mmlab/Amphion/issues/99

HarryHe11 commented 9 months ago

Hi @songweionline, thank you for your advice! We're in the process of preparing our Colab release, although a specific launch date is yet to be set. We're also open to pull requests from the community to help advance this feature for educational use.

Should you have any more questions, don't hesitate to re-open this issue. We're happy to provide further assistance!