pre_init in model.py contains the init tokens for genre (first column), key (unused, second column) and instrument (third column). In your gen_midi_conditional.py you define their embedding sizes as init_n_token = [1, 1, 1] in line 48, so pre_init is out of range.
You can fix it in one of two ways:
- set init_n_token = [7, 1, 6] in gen_midi_conditional.py (if the init_n_token of your trained model is this), or
- set pre_init in model.py to an empty array, np.array([]) (if the init_n_token of your trained model is [1, 1, 1]).
Do you mean I should set the pre_init variable (pre_init = np.array([[5, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 0, 3], [0, 0, 4], [0, 0, 5]])) to pre_init = np.array([])? I see in train.py that the value of init_n_token is [1, 1, 1].
Yes, set pre_init to np.array([]). You can also try pre_init = np.array([[0, 0, 0]]) if np.array([]) doesn't work well. Note that init_n_token is not the token itself, but the number of embedding classes for genre, key and instrument.
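To illustrate what "number of embedding classes" means here, a minimal standalone sketch (not the repo's actual code; the sizes just mirror init_n_token = [7, 1, 6] and the pre_init rows quoted above):

```python
import torch
import torch.nn as nn

# init_n_token gives the number of embedding classes per init column:
# genre, key (unused), instrument.
init_n_token = [7, 1, 6]
emb_genre = nn.Embedding(init_n_token[0], 512)  # accepts indices 0..6

pre_init = torch.tensor([[5, 0, 0]])            # genre token 5, key 0, instrument 0
print(emb_genre(pre_init[..., 0]).shape)        # OK: torch.Size([1, 512])

# With init_n_token = [1, 1, 1] the genre table only has index 0, so the
# same lookup fails with "IndexError: index out of range in self".
tiny_genre = nn.Embedding(1, 512)
try:
    tiny_genre(pre_init[..., 0])
except IndexError as err:
    print(err)
```

This is the mismatch the error message points at: every token value in pre_init must stay below the corresponding entry of init_n_token.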
Thanks for your quick reply. After I set pre_init to np.array([[0, 0, 0]]), the inference program runs with no more error messages (setting pre_init to np.array([]) still triggers an error). But what seems strange is that the inference program does not appear to stop: it has been running for about 8 hours since launch for the 2-minute input video. Is this normal? By the way, I haven't seen a MIDI output yet. Will the MIDI file be generated in the src/ folder? Thanks again for your patience.
That seems weird. Normally it runs for several minutes for a short video and stops generating automatically with Beat Timing Encoding, or it breaks out of the loop if the music length exceeds the video length (see this).
I am not quite sure about your model setting, but I guess the video2npz pipeline has some problem. You can check the npz file (or vlog in model.py) to see whether its length matches the video length.
For Beat Timing Encoding you can also check the pbeat attribute (see the output when running inference; pbeat is the second column from the right): it should be monotonically increasing from 0 to 99.
As stated in README / Directory Structure, the generated MIDI files will be stored in the inference/ folder.
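If it helps, the contents of the generated npz can be inspected directly with numpy (a generic sketch; the file path is hypothetical and the actual key names depend on what video2npz writes):

```python
import numpy as np

# Hypothetical path; substitute the npz produced by video2npz.sh.
data = np.load("../inference/my_video.npz", allow_pickle=True)

# Print every array stored in the file and its shape, so its length can be
# compared against the video duration / number of beats.
for key in data.files:
    arr = data[key]
    print(key, getattr(arr, "shape", type(arr)))
```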
I followed the README instruction and it runs normally. Here are some of the generated and intermediate files: https://drive.google.com/drive/folders/1UtZXXLiY9PNFo-p3lIslQxlEKCQzIcnU?usp=sharing.
Hi, Shangzhe, it seems that the link needs access permission; I have already sent an access request. BTW, I did follow the detailed instructions in README.md, but as I stated, the inference program could not stop (it seems to run into an infinite loop) after I corrected the pre_init variable as advised by Zhaokai. Did you use the video data, trained model and inference code in this link https://drive.google.com/drive/folders/1Ch3jjxZrztKAtEvuEhGjxPk2-G0NSYe0?usp=sharing and successfully generate MIDI files?
I used your video, our model, and inference code in this repo without any modification. Perhaps your inference code or model has problems.
Hi, Zhaokai, could you be more specific about what might be wrong when I use the video2npz pipeline? I followed the inference instructions in README.md. There are three sub-steps in the video2npz.sh script: the first sub-step, optical_flow.py, generated the optical-flow npz file; the second sub-step, video2metadata.py, generated a json file; and the last sub-step, metadata2numpy_mix.py, generated an npz data file from that json file.
Then I used this npz data file together with my self-trained model and gen_midi_conditional.py, in which the decoder_n_class and init_n_token variables were changed in line with the training data (as output by train.py). After all this, the inference program gen_midi_conditional.py does run; the only problem is that it seems to get stuck in an infinite loop.
For the points you mentioned:
1) "I am not quite sure about your model setting, but I guess the video2npz pipeline has some problem. You can check the npz file (or vlog in model.py) to see whether its length matches the video length."
I am not quite sure about the video length you mentioned. Do you mean the number of video frames? Or the dimension of the vlog_npz variable in gen_midi_conditional.py?
2) "For beat timing encoding you can also check the pbeat attribute (see the output when running inference, pbeat is the second column from the right), it should be monotonically increasing from 0 to 99."
Could you clarify which line (or which variable) in the source code you are referring to?
Again, many thanks for your patience and kindness. I really appreciate it.
I used your video, our model, and inference code in this repo without any modification. Perhaps your inference code or model has problems.
Thanks for your clarification.
- You can check the values of n_beat and len(vlog). Also trace the value of cur_vlog to see why this break condition isn't executed.
- See the output when running inference; it should look like this:
[ 9 1 6 0 0 3 4 35 216]
[ 3 1 10 0 0 5 1 36 226]
[ 0 2 0 74 16 5 0 36 226]
The second column from the right (35, 36, 36) indicates pbeat.
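If you want to check a whole run at once, the monotonicity of that column can also be verified from a saved stdout log (a small sketch; the log file name is hypothetical, and it simply parses rows of nine integers like the ones above and reads the second column from the right):

```python
import re

pbeats = []
with open("inference_stdout.txt") as f:     # hypothetical log of the inference stdout
    for line in f:
        nums = re.findall(r"-?\d+", line)
        if len(nums) == 9:                  # token rows print nine integer fields
            pbeats.append(int(nums[-2]))    # second column from the right = pbeat

# pbeat should climb monotonically from 0 toward 99.
print("rows parsed:", len(pbeats))
print("non-decreasing:", all(a <= b for a, b in zip(pbeats, pbeats[1:])))
print("last / max pbeat:", pbeats[-1] if pbeats else None, max(pbeats, default=None))
```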
Thanks for your detailed explanation, I will continue to check.
you can check the value of n_beat and len(vlog). And also trace the value of cur_vlog to see why this break condition isn't executed
I checked the values of n_beat and len(vlog). They are not equal: n_beat=940 > len(vlog)=166. And the value of cur_vlog gets stuck at 14 and never proceeds. Does this mean the input npz file for the inference code gen_midi_conditional.py is corrupted?
n_beat > len(vlog) is normal; the former represents the total number of beats, the latter the number of Bar and Beat tokens. Can you provide the standard output of the inference run?
I put the newly generated standard output (stdout_new.txt) in this link: https://drive.google.com/drive/folders/1Ch3jjxZrztKAtEvuEhGjxPk2-G0NSYe0
For beat timing encoding you can also check the pbeat attribute (see the output when running inference, pbeat is the second column from the right), it should be monotonically increasing from 0 to 99.
Hi, Zhaokai, I observe that my pbeat attribute gets stuck at a number (say 5 or 14) and does not increase any further during inference. I think this is why the loop cannot stop. Do you have any idea why this happens?
It seems that this is due to an inconsistency of the init tokens between training and generation, which shows up when using another training set. This should be fixed by 8f7922930aa219aa605246ed67a6f98c5c8df0e1.
Thanks, will try it.
I tried the modified version; now it gives the following error. It seems there is still a dimension problem.
Traceback (most recent call last):
  File "train.py", line 226, in <module>
    train_dp()
  File "train.py", line 169, in train_dp
    losses = net(is_train=True, x=batch_x, target=batch_y, loss_mask=batch_mask, init_token=batch_init)
  File "/data/miniconda3/envs/mm21_py3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/miniconda3/envs/mm21_py3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/data/miniconda3/envs/mm21_py3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate_new3/src/model.py", line 482, in forward
    return self.train_forward(**kwargs)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate_new3/src/model.py", line 450, in train_forward
    h, y_type = self.forward_hidden(x, memory=None, is_training=True, init_token=init_token)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate_new3/src/model.py", line 213, in forward_hidden
    encoder_pos_emb = torch.cat([init_emb_linear, encoder_pos_emb], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 2 and 3
I just downloaded the newest version of this repo and directly used the train.py there without further modification.
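As a side note on that error message: torch.cat only concatenates tensors with the same number of dimensions, so a 2-D init embedding cannot be joined to a 3-D positional embedding along dim=1 unless a batch dimension is added first. A generic illustration of the failure and of one way such shapes can be reconciled (not the repo's actual fix):

```python
import torch

init_emb_linear = torch.randn(1, 512)       # 2-D: (num_init_tokens, d_model)
encoder_pos_emb = torch.randn(4, 10, 512)   # 3-D: (batch, seq_len, d_model)

try:
    torch.cat([init_emb_linear, encoder_pos_emb], dim=1)
except RuntimeError as err:
    print(err)  # Tensors must have same number of dimensions: got 2 and 3

# Giving the 2-D tensor a matching batch dimension makes the concatenation legal.
init_3d = init_emb_linear.unsqueeze(0).expand(encoder_pos_emb.size(0), -1, -1)
print(torch.cat([init_3d, encoder_pos_emb], dim=1).shape)  # torch.Size([4, 11, 512])
```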
A typo; just fix it with d4a6c33dbd6e1a6f001ce1ba405d09050cb0df2f. You can try the latest version.
It can run now. Thanks. Will check the inference later.
Hi, I encountered some bugs while using the "gen_midi_conditional.py" code to generate MIDI files for a given video. I installed the Python 2 environment according to the requirements file "py2_requirements.txt" and then used "video2npz.sh" to produce an "xxx.npz" file for the given video. But I ran into problems when running the "gen_midi_conditional.py" code; the program output and error report are pasted below:
Command I used: python3 gen_midi_conditional.py -f ../inference/LGpwmBqJF1Q_HarryPotter2ChamberOfSecrets.npz -c ../exp/train_exp/loss_70_params.pt
Code standard output:
inference
D_MODEL 512 N_LAYER 12 N_HEAD 8
DECODER ATTN causal-linear
[18, 3, 18, 129, 18, 6, 27, 102, 5025]
[*] load model from: ../exp/train_exp/loss_70_params.pt
new song
[vlog_npz matrix print here]
------ initiate ------
tensor([[[17, 1, 10, 0, 0, 0, 0, 1, 0]]])
Error report:
Traceback (most recent call last):
  File "gen_midi_conditional.py", line 104, in <module>
    generate()
  File "gen_midi_conditional.py", line 85, in generate
    res, err_note_number_list, err_beat_number_list = net(is_train=False, vlog=vlog_npz, C=0.7)
  File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs, **kwargs)
  File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 483, in forward
    return self.inference_from_scratch(**kwargs)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 341, in inference_from_scratch
    h, y_type = self.forward_hidden(input, is_training=False, init_token=pre_init)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 216, in forward_hidden
    init_emb_linear = self.forward_init_token(init_token)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 162, in forward_init_token
    emb_genre = self.init_emb_genre(x[..., 0])
  File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/utils.py", line 80, in forward
    return self.lut(x) * math.sqrt(self.d_model)
  File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/functional.py", line 2183, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
The inference code, trained model and data (including original video and processed .npz file) are attached in Google drive. Here is the link: https://drive.google.com/drive/folders/1Ch3jjxZrztKAtEvuEhGjxPk2-G0NSYe0?usp=sharing
Could you help me check this? Really appreciate it.
Best regards,