nttcslab / m2d

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
https://ieeexplore.ieee.org/document/10502167

environment details? #4

Closed thuster closed 11 months ago

thuster commented 11 months ago

I'm trying to replicate the experiments in this paper. It looks like a cool framework, but installing and testing m2d+evar is pretty challenging. After pip-installing the requirements, I hit version problems (e.g., np.float, urllib.request). After debugging a few obvious things, I'm now getting a segfault.

Can you provide a conda environment or pip requirements with pinned versions? It would also be nice if the installation were a bit more streamlined, although the instructions are pretty accurate if carefully followed.

Thanks!

daisukelab commented 11 months ago

Hi @thuster,

Thank you for your interest. I should first mention that I have updated M2D to add M2D for Speech and have also updated EVAR. My fixes should cover the np.float issue, but I will confirm that later.
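For context on the np.float problem: NumPy deprecated the aliases np.float, np.int, np.bool, np.object, np.complex, and np.str in version 1.20 and removed them in 1.24, so code that still uses them fails on a recent NumPy. The usual replacement is the Python builtin (float, int, bool) or an explicit dtype such as np.float64. The sketch below is only a rough helper for spotting leftover uses in source text; the function name find_removed_aliases is made up for illustration and is not part of M2D or EVAR.

```python
import re

# Aliases deprecated in NumPy 1.20 and removed in NumPy 1.24.
_REMOVED = ("float", "int", "bool", "object", "complex", "str")

# The trailing \b keeps valid names such as np.float32 or np.bool_ from matching.
_PATTERN = re.compile(r"\b(?:np|numpy)\.(?:" + "|".join(_REMOVED) + r")\b")


def find_removed_aliases(source):
    """Return (line_number, alias) pairs for each removed alias found in source.

    Hypothetical helper: a plain text scan, so it also flags matches that
    appear inside comments or string literals.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Each reported line can then be edited by hand, for example changing np.float(x) to float(x).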

Here are my environmental details.

The following are my test scripts, which I ran successfully yesterday.

Create a new Python environment

conda create -n ar python==3.8
conda activate ar
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt

You can remove the environment with conda remove -n ar --all anytime later.
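To confirm that the active environment actually matches these pins, something like the following sketch may help. It uses only the standard library (importlib.metadata, available from Python 3.8). The function name check_versions is made up for illustration, and it assumes the conda pytorch package is registered under the distribution name torch.

```python
from importlib import metadata

# Pins taken from the conda install command in this thread.
# Assumption: the conda "pytorch" package reports itself as "torch".
EXPECTED = {"torch": "1.12.1", "torchvision": "0.13.1", "torchaudio": "0.12.1"}


def check_versions(expected):
    """Map each missing or mismatched package to (expected, installed or None)."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu116" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means all three packages are installed at the pinned versions.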

Setting M2D up

git clone https://github.com/nttcslab/m2d.git
cd m2d
curl -o util/lars.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/util/lars.py
curl -o util/lr_decay.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/util/lr_decay.py
curl -o util/lr_sched.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/util/lr_sched.py
curl -o util/misc.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/util/misc.py
curl -o m2d/pos_embed.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/util/pos_embed.py
curl -o train_audio.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/main_pretrain.py
curl -o train_speech.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/main_pretrain.py
curl -o mae_train_audio.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/main_pretrain.py
curl -o m2d/engine_pretrain_m2d.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/engine_pretrain.py
curl -o m2d/models_mae.py https://raw.githubusercontent.com/facebookresearch/mae/efb2a8062c206524e35e47d04501ed4f544c0ae8/models_mae.py
curl -o m2d/timm_layers_pos_embed.py https://raw.githubusercontent.com/huggingface/pytorch-image-models/e9373b1b925b2546706d78d25294de596bad4bfe/timm/layers/pos_embed.py
patch -p1 < patch_m2d.diff
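If the repeated curl lines are inconvenient, the MAE downloads could be scripted roughly as below. This is only a sketch under the assumption that it runs from the m2d folder; the names raw_url, plan_downloads, and download_all are made up for illustration. It covers only the files from facebookresearch/mae, so the timm pos_embed.py download and the patch step still need to be run as shown above.

```python
import urllib.request
from pathlib import Path

# Commit pinned in the curl commands above.
MAE_COMMIT = "efb2a8062c206524e35e47d04501ed4f544c0ae8"

# Destination in the m2d checkout -> source path in facebookresearch/mae.
MAE_FILES = {
    "util/lars.py": "util/lars.py",
    "util/lr_decay.py": "util/lr_decay.py",
    "util/lr_sched.py": "util/lr_sched.py",
    "util/misc.py": "util/misc.py",
    "m2d/pos_embed.py": "util/pos_embed.py",
    "train_audio.py": "main_pretrain.py",
    "train_speech.py": "main_pretrain.py",
    "mae_train_audio.py": "main_pretrain.py",
    "m2d/engine_pretrain_m2d.py": "engine_pretrain.py",
    "m2d/models_mae.py": "models_mae.py",
}


def raw_url(repo, commit, path):
    """Build a raw.githubusercontent.com URL for a file at a pinned commit."""
    return f"https://raw.githubusercontent.com/{repo}/{commit}/{path}"


def plan_downloads():
    """Return (destination, url) pairs without touching the network."""
    return [
        (dest, raw_url("facebookresearch/mae", MAE_COMMIT, src))
        for dest, src in MAE_FILES.items()
    ]


def download_all(root="."):
    """Fetch every planned file into root, creating folders as needed."""
    for dest, url in plan_downloads():
        target = Path(root) / dest
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
```

Calling plan_downloads() first prints nothing and downloads nothing, which makes it easy to compare the generated URLs against the curl commands before running download_all().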

Setting EVAR up

(in the m2d folder)
git clone https://github.com/nttcslab/eval-audio-repr.git evar
cd evar
curl https://raw.githubusercontent.com/daisukelab/general-learning/master/MLP/torch_mlp_clf2.py -o evar/utils/torch_mlp_clf2.py
curl https://raw.githubusercontent.com/daisukelab/sound-clf-pytorch/master/for_evar/sampler.py -o evar/sampler.py
curl https://raw.githubusercontent.com/daisukelab/sound-clf-pytorch/master/for_evar/cnn14_decoupled.py -o evar/cnn14_decoupled.py
ln -s /my_lab/common/evar/work .   # site-specific path; please create your preprocessed data under work instead
cp /my_lab/common/evar/evar/metadata/* evar/metadata   # site-specific path; please create your own metadata instead
cd ..

I hope it works for you too. However, if you still face any issues, please feel free to ask me again without any hesitation.

thuster commented 11 months ago

Thanks for the response! Those updates helped. I think my segfault was a hardware issue on my side. Both linear evaluation and fine-tuning appear to be working now. The only remaining environment error came up when I tried to create the metadata. The issue was with this line:

https://github.com/nttcslab/eval-audio-repr/blob/main/evar/utils/make_metadata.py#L20

I replaced this line with import urllib.request and everything worked.
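For anyone hitting the same error: urllib is a package, and its request submodule is only guaranteed to be available as urllib.request after it has been imported explicitly. The snippet below is a minimal sketch of the working import. The original contents of the linked line are not shown in this thread, and fetch_bytes is a made-up helper name for illustration, not a function from EVAR.

```python
import urllib.request  # explicit submodule import; a bare "import urllib" may not provide it


def fetch_bytes(url):
    """Return the raw bytes behind a URL (hypothetical helper for illustration)."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

The same call works for http(s) and file URLs, so it can be tried against a local file before pointing it at a dataset server.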

Thanks for the help - looking forward to trying it out.

daisukelab commented 11 months ago

Hi @thuster,

Thanks for the quick confirmation and for pointing out the urllib issue. I have fixed it, so you should now be able to create the GTZAN and VC1 metadata files. I'm sorry that I was not aware of the problem and that it took up your time.

I hope our resources work for your progress. Thanks again!