xandramax opened 4 years ago
Opera, Andean Music, Sufi, Baroque, Kirtan, Canterbury, Operatic Pop, Mystic Folk, Anime, Poetry, Ragtime, Appalachian Folk, Religious, Sea Shanties, Christian Hymns, Spirituals, Barbershop, Choral, Gregorian Chant, and Boogie Woogie also fail to load.
Loading artist IDs from ~/jukebox/jukebox/data/ids/v3_artist_ids.txt
Loading artist IDs from ~/jukebox/jukebox/data/ids/v3_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:786432
Downloading from gce
Restored from ~/.cache/jukebox-assets/models/1b_lyrics/prior_level_2.pth.tar
0: Loading prior in eval mode
Traceback (most recent call last):
  File "jukebox/sample.py", line 366, in <module>
    fire.Fire(run)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 363, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 327, in save_samples
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "jukebox/sample.py", line 327, in <listcomp>
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "~/jukebox/jukebox/data/labels.py", line 60, in get_batch_labels
    label = self.get_label(**meta)
  File "~/jukebox/jukebox/data/labels.py", line 33, in get_label
    genre_ids = self.ag_processor.get_genre_ids(genre)
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in get_genre_ids
    return [self.genre_ids[word] for word in genres]
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in <listcomp>
    return [self.genre_ids[word] for word in genres]
KeyError: 'sea'
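The last frames show why the error names 'sea' rather than the full genre: get_genre_ids looks up each word of the genre separately (`[self.genre_ids[word] for word in genres]`), so a multi-word genre such as "Sea Shanties" fails on its first word that is missing from the vocabulary. The sketch below reproduces that behaviour. `split_genre_words` is a hypothetical stand-in for jukebox's own normalization (lowercase, non-alphanumerics to "_", split on "_"), and the two-entry vocabulary is made up for illustration.

```python
import re
from typing import Dict, List


def split_genre_words(genre: str) -> List[str]:
    # Hypothetical stand-in for jukebox's norm(): lowercase, turn every
    # character that is not a-z or 0-9 into "_", collapse repeats, trim.
    s = re.sub(r"[^a-z0-9]", "_", genre.lower())
    s = re.sub(r"_+", "_", s).strip("_")
    return s.split("_")


def get_genre_ids_strict(genre: str, genre_ids: Dict[str, int]) -> List[int]:
    # Mirrors the list comprehension in the traceback: any word that is
    # missing from the vocabulary raises KeyError.
    return [genre_ids[word] for word in split_genre_words(genre)]


# Made-up vocabulary that does not contain "sea".
vocab = {"pop": 0, "rock": 1}
try:
    get_genre_ids_strict("Sea Shanties", vocab)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'sea'
```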
Thanks. Looks like this is a historical artifact: we trained the 1B and 5B models separately with different genre lists, but in the merge, the 1B model is using the 5B's genres for its upsamplers. I'll adjust things so the upsamplers won't complain if they see unfamiliar genre words.
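One way an upsampler could "not complain" about unfamiliar genre words is to substitute a fallback id and warn instead of raising. The following is a minimal sketch under that assumption; it is not the actual change made in jukebox, and `get_genre_ids_lenient` and the fallback id of 0 are hypothetical.

```python
import warnings
from typing import Dict, List


def get_genre_ids_lenient(words: List[str], genre_ids: Dict[str, int],
                          unknown_id: int = 0) -> List[int]:
    """Look up each genre word; for words missing from the vocabulary,
    emit a warning and use unknown_id (a hypothetical fallback) instead
    of raising KeyError."""
    ids = []
    for word in words:
        if word in genre_ids:
            ids.append(genre_ids[word])
        else:
            warnings.warn(f"genre word {word!r} not in vocabulary; using id {unknown_id}")
            ids.append(unknown_id)
    return ids


print(get_genre_ids_lenient(["sea", "pop"], {"pop": 1}))  # [0, 1]
```

Mapping to a fallback keeps the label tensor the same length as before; simply dropping unknown words would be another reasonable choice.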
@mcleavey, is there a related issue here with the Colab notebook? When I use the Colab notebook to load 5b_lyrics and then specify a genre that exists in version 3 (v3_genre_ids.txt) but not in version 2 (v2_genre_ids.txt), the cell where you specify your metas throws an error.
For example, if you try:
# hps and top_prior come from earlier cells of the Colab notebook
metas = [dict(artist="barry white",
              genre="coldwave",
              total_length=hps.sample_length,
              offset=0,
              lyrics="""Some lyrics.
""",
              ),
         ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]
This throws a KeyError:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-5-e02795cb531e> in <module>()
16 ),
17 ] * hps.n_samples
---> 18 labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]
3 frames
/usr/local/lib/python3.6/dist-packages/jukebox/data/artist_genre_processor.py in <listcomp>(.0)
51 # In v2, we convert genre into a bag of words
52 genres = norm(genre).split("_")
---> 53 return [self.genre_ids[word] for word in genres]
54
55 # get_artist/genre throw error if we ask for non-present values
KeyError: 'coldwave'
@kcrosley-leisurelabs Yes, the 5B model was trained with the v2 genres (historically, the 5B-without-lyrics model came first, so it used v2; we then branched out to experiment with a 1B model with lyrics, which became v3). I'm wrapped up with other work this afternoon, but I will update names and comments to make this clearer and more intuitive.
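Since 5b_lyrics uses the v2 genre vocabulary, one workaround is to check a genre's words against v2_genre_ids.txt before building metas. The sketch below assumes each line of the IDs file is `name;id` and reuses the same hypothetical word-splitting as above; both are assumptions rather than jukebox's documented format or API. The demo writes a throwaway file so it runs anywhere.

```python
import os
import re
import tempfile
from typing import Dict, List


def load_ids(path: str) -> Dict[str, int]:
    # Assumption: each non-empty line is "name;id".
    ids: Dict[str, int] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            name, idx = line.rsplit(";", 1)
            ids[name] = int(idx)
    return ids


def missing_genre_words(genre: str, genre_ids: Dict[str, int]) -> List[str]:
    # Hypothetical v2-style bag of words: lowercase, non-alphanumerics
    # to "_", collapse repeats, then split on "_".
    s = re.sub(r"_+", "_", re.sub(r"[^a-z0-9]", "_", genre.lower())).strip("_")
    return [w for w in s.split("_") if w not in genre_ids]


# Demo with a throwaway file standing in for v2_genre_ids.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("unknown;0\npop;1\nrock;2\n")
demo_ids = load_ids(tmp.name)
os.remove(tmp.name)

print(missing_genre_words("coldwave", demo_ids))  # ['coldwave']
print(missing_genre_words("Pop", demo_ids))       # []
```

In the notebook, a non-empty result from `missing_genre_words` signals that the genre would hit the KeyError above when used with 5b_lyrics.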
@mcleavey thanks so much for the clarification.
So, I'm still kind of confused about this. The 1B model is smaller but has larger numbers of genres and artists? (Can that really be true?)
I notice that the latest commit now complains if one specifies a V3 artist when using 5b_lyrics, whereas it didn't before; it notes that the artist will be mapped to "unknown". (Again, this occurs in the Colab notebook. BTW, the notebook shared by @SMarioMan in https://github.com/openai/jukebox/issues/40 is vastly superior to the one in the current distro, as it uses Google Drive to store generated samples rather than volatile session storage, and it also demonstrates how to prime the model.)
Final question: before the latest updates, I had been able to specify artists from the V3 list with the 5b_lyrics model without any errors or warnings. Under the hood, were previous builds simply mapping them to "unknown" silently?
(Sorry for what might be derpy questions. I'm pretty novice with the AI rocket surgery stuff. ;) )
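The behaviour described for the latest commit (an unrecognized artist is mapped to "unknown" with a notice) can be sketched as a lookup with a fallback. This is a hypothetical illustration, not jukebox's code: the lowercase normalization, the `warn` flag, and the use of id 0 when no "unknown" entry exists are all assumptions. Passing `warn=False` shows what a silent fallback would look like; the thread does not confirm whether earlier builds behaved that way.

```python
import warnings
from typing import Dict


def get_artist_id(artist: str, artist_ids: Dict[str, int], warn: bool = True) -> int:
    # Hypothetical normalization: trim and lowercase.
    key = artist.strip().lower()
    if key in artist_ids:
        return artist_ids[key]
    if warn:
        warnings.warn(f"artist {artist!r} not in this model's artist list; mapping to 'unknown'")
    # Fall back to the id stored for "unknown", or 0 if the table has none (an assumption).
    return artist_ids.get("unknown", 0)
```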
Also, this genre is duplicated in v3_genre_ids.txt at id 107 and id 295.
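Duplicates like this one can be found mechanically. A minimal sketch, assuming each line of the IDs file is `name;id`; names are compared exactly as written, with no case folding. It returns every name that appears on more than one line, along with all of its ids.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_names(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return {name: [ids...]} for each name listed more than once.

    Assumes each non-empty line is "name;id".
    """
    seen: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, idx = line.rsplit(";", 1)
        seen[name].append(int(idx))
    return {name: ids for name, ids in seen.items() if len(ids) > 1}


print(find_duplicate_names(["pop;0", "rock;1", "pop;2"]))  # {'pop': [0, 2]}
```

Because an open file iterates line by line, `find_duplicate_names(open("v3_genre_ids.txt", encoding="utf-8"))` would scan the real file directly.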