v3 Genre ID "jazz fusion" is non-functional

xandramax commented 4 years ago

Also, this genre is duplicated in v3_genre_ids.txt at id 107 and id 295.

Traceback (most recent call last):
  File "jukebox/sample.py", line 307, in <module>
    fire.Fire(run)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 304, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 268, in save_samples
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "jukebox/sample.py", line 268, in <listcomp>
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "~/jukebox/jukebox/data/labels.py", line 60, in get_batch_labels
    label = self.get_label(**meta)
  File "~/jukebox/jukebox/data/labels.py", line 33, in get_label
    genre_ids = self.ag_processor.get_genre_ids(genre)
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in get_genre_ids
    return [self.genre_ids[word] for word in genres]
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in <listcomp>
    return [self.genre_ids[word] for word in genres]
KeyError: 'fusion'
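
For anyone who wants to check the ids file for repeated names (like the two "jazz fusion" entries mentioned above), here is a minimal sketch. It assumes each line has the form `name;id`, which is not confirmed in this thread. The function name `find_duplicate_names` and the sample lines are hypothetical, and only ids 107 and 295 come from the report.

```python
from collections import defaultdict


def find_duplicate_names(lines, sep=";"):
    """Return {name: [ids]} for every name that is listed with more than one id.

    Assumption: each non-empty line looks like "<name><sep><integer id>".
    """
    ids_by_name = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last separator so names containing the separator survive.
        name, raw_id = line.rsplit(sep, 1)
        ids_by_name[name.lower()].append(int(raw_id))
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical excerpt; only the two "jazz fusion" ids are taken from the report.
sample = ["jazz;12", "jazz fusion;107", "opera;200", "jazz fusion;295"]
duplicates = find_duplicate_names(sample)
print(duplicates)
```

To run it against a real file, pass an open file handle instead of the sample list, since iterating a file yields its lines.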

xandramax commented 4 years ago

Opera, Andean Music, Sufi, Baroque, Kirtan, Canterbury, Operatic Pop, Mystic Folk, Anime, Poetry, Ragtime, Appalachian Folk, Religious, Sea Shanties, Christian Hymns, Spirituals, Barbershop, Choral, Gregorian Chant, and Boogie Woogie also fail to load.

Loading artist IDs from ~/jukebox/jukebox/data/ids/v3_artist_ids.txt
Loading artist IDs from ~/jukebox/jukebox/data/ids/v3_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:786432
Downloading from gce
Restored from ~/.cache/jukebox-assets/models/1b_lyrics/prior_level_2.pth.tar
0: Loading prior in eval mode
Traceback (most recent call last):
  File "jukebox/sample.py", line 366, in <module>
    fire.Fire(run)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "~/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 363, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 327, in save_samples
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "jukebox/sample.py", line 327, in <listcomp>
    labels = [prior.labeller.get_batch_labels(metas, 'cuda') for prior in priors]
  File "~/jukebox/jukebox/data/labels.py", line 60, in get_batch_labels
    label = self.get_label(**meta)
  File "~/jukebox/jukebox/data/labels.py", line 33, in get_label
    genre_ids = self.ag_processor.get_genre_ids(genre)
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in get_genre_ids
    return [self.genre_ids[word] for word in genres]
  File "~/jukebox/jukebox/data/artist_genre_processor.py", line 53, in <listcomp>
    return [self.genre_ids[word] for word in genres]
KeyError: 'sea'

mcleavey commented 4 years ago

Thanks. This looks like a historical artifact: we trained the 1B and 5B models separately with different genre sets, but after the merge the 1B model uses the 5B's genres for the upsamplers. I'll adjust the upsamplers so they don't fail when they see unexpected genre words.
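
One way such a lenient lookup could work is sketched below. This is an illustration only, not the change that was actually committed. `normalize` is a simplified stand-in for the repository's `norm()` helper, `lenient_genre_ids` is a hypothetical name, and the vocabulary ids are made up.

```python
import re


def normalize(genre):
    # Assumption: a simplified stand-in for the repository's norm() helper.
    return re.sub(r"[^a-z0-9]+", "_", genre.lower()).strip("_")


def lenient_genre_ids(genre, genre_ids):
    """Look up each word of the genre, skipping words absent from the vocabulary."""
    words = normalize(genre).split("_")
    return [genre_ids[word] for word in words if word in genre_ids]


# Hypothetical vocabulary that lacks "fusion", "sea" and "shanties".
vocab = {"jazz": 1, "folk": 2}
ids_fusion = lenient_genre_ids("jazz fusion", vocab)     # only "jazz" survives
ids_shanties = lenient_genre_ids("Sea Shanties", vocab)  # no word is known
print(ids_fusion, ids_shanties)
```

Skipping unknown words avoids the KeyError but also conditions the model on less information, so a real fix might prefer to log what was dropped.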

kcrosley-leisurelabs commented 4 years ago

@mcleavey, is there a related issue with the Colab notebook? When I use the Colab notebook to load 5b_lyrics and then specify a genre that exists in version 3 (v3_genre_ids.txt) but not in version 2 (v2_genre_ids.txt), the cell where you specify your metas throws an error.

For example, if you try:

metas = [dict(artist = "barry white",
            genre = "coldwave",
            total_length = hps.sample_length,
            offset = 0,
            lyrics = """Some lyrics.
            """,
            ),
          ] * hps.n_samples
labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

This will throw a KeyError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-e02795cb531e> in <module>()
     16             ),
     17           ] * hps.n_samples
---> 18 labels = [None, None, top_prior.labeller.get_batch_labels(metas, 'cuda')]

3 frames
/usr/local/lib/python3.6/dist-packages/jukebox/data/artist_genre_processor.py in <listcomp>(.0)
     51             # In v2, we convert genre into a bag of words
     52             genres = norm(genre).split("_")
---> 53         return [self.genre_ids[word] for word in genres]
     54 
     55     # get_artist/genre throw error if we ask for non-present values

KeyError: 'coldwave'
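
The traceback shows the genre being split into a bag of words, with each word looked up directly, so a single missing word is enough to raise. A small pre-check can list the offending words before the labels are built. The sketch below uses a simplified normalizer in place of `norm()`, and `bag_of_words`, `missing_genre_words`, and the vocabulary are hypothetical.

```python
import re


def bag_of_words(genre):
    # Assumption: approximates norm(genre).split("_") from the traceback above.
    return re.sub(r"[^a-z0-9]+", "_", genre.lower()).strip("_").split("_")


def missing_genre_words(genre, genre_ids):
    """Return the genre words that have no entry in the given vocabulary."""
    return [word for word in bag_of_words(genre) if word not in genre_ids]


# Hypothetical stand-in for the vocabulary the loaded model was trained with.
v2_like_vocab = {"jazz": 0, "rock": 1}

missing = missing_genre_words("coldwave", v2_like_vocab)
if missing:
    print(f"Genre words not in this model's vocabulary: {missing}")
```

In practice the vocabulary would be whatever mapping the loaded prior's labeller holds, so the check reflects the model actually in use rather than a particular ids file.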

mcleavey commented 4 years ago

@kcrosley-leisurelabs Yes, the 5B model was trained with the v2 genres (historically, the 5B-without-lyrics model came first, so it was v2, and then we branched out to experiment with a 1B model with lyrics, which became v3). I'm wrapped up with other work this afternoon, but I will update the names and comments to make this clearer and more intuitive.

kcrosley-leisurelabs commented 4 years ago

@mcleavey thanks so much for the clarification.

kcrosley-leisurelabs commented 4 years ago

So, I'm still kind of confused about this. The 1B model is smaller but has more genres and artists? (Can that really be true?)

I notice that the latest commit now complains if one specifies a V3 artist when using 5b_lyrics, whereas it didn't before: it notes that the artist will be mapped to "unknown". (Again, this occurs in the Colab notebook. BTW, the notebook shared by @SMarioMan in https://github.com/openai/jukebox/issues/40 is vastly superior to the one in the current distro, as it uses Google Drive to store generated samples rather than volatile session storage and also demonstrates how to prime the model.)

Final question: Before the latest updates, I'd been able to specify artists from the V3 list with the 5b_lyrics model, and it didn't throw any errors or warnings. Under the hood, was this simply silently mapping them to "unknown" in previous builds?

(Sorry for what might be derpy questions. I'm pretty novice with the AI rocket surgery stuff. ;) )
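
To make the difference between a silent and a reported fallback concrete, here is a minimal sketch. It does not confirm what earlier builds did. The id 0 for "unknown", the lowercase normalization, and the name `artist_id_or_unknown` are all assumptions for illustration.

```python
import warnings

UNKNOWN_ARTIST_ID = 0  # assumption: the id reserved for "unknown"


def artist_id_or_unknown(artist, artist_ids, warn=True):
    """Return the artist's id, falling back to the unknown id when it is absent.

    With warn=False the fallback is silent; with warn=True it is reported.
    """
    key = artist.lower()
    if key not in artist_ids:
        if warn:
            warnings.warn(f"artist {artist!r} not in vocabulary; using 'unknown'")
        return UNKNOWN_ARTIST_ID
    return artist_ids[key]


# Hypothetical vocabulary with a made-up id.
vocab = {"barry white": 7}
known_id = artist_id_or_unknown("Barry White", vocab)
silent_id = artist_id_or_unknown("Some V3 Artist", vocab, warn=False)
print(known_id, silent_id)
```

Both modes return the same id for a missing artist. The only difference is whether the user is told, which is consistent with samples being produced without errors in either case.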
