jskripchuk / music-analysis

Teaching basic computer science principles and concepts through the lens of music. Hooktheory music analysis toolkit for MUSC106: Computational Thinking in Music.

Hooktheory Latent Statistics Toolkit (HoLST)

HoLST is a novice-friendly toolkit for completing corpus analysis of songs featured on Hooktheory, a user-contributed website where users transcribe the theoretical backbone of popular songs. This toolkit can read and parse Hooktheory's .hkt file format and extract meaningful information and statistics from the melodies and harmonies inside.

Motivation

The main motivation for HoLST was the creation of a new course at the University of Delaware, MUSC106: Computational Thinking in Music. At the time, the University had recently received an NSF grant to integrate computational thinking into general education courses. We wanted a piece of software that would easily parse data from Hooktheory (one of the largest music theory databases currently available) and produce relevant statistics.

As we quickly learned, Hooktheory's own API is (as of this writing) seriously lacking in utility. It does not support searching by artist, song name, key, or mode. The only tool provided is a search for songs that share the same progression (and analysis of the entire Hooktheory corpus has been done many times before). This meant that our users had to download the .hkt files themselves, which unfortunately requires a Hooktheory Plus membership.

One of our main goals was for the toolkit to be friendly to beginning programmers, since the purpose of the course was to introduce students to the main concepts of programming and computer science through the lens of music theory.

Features

How to Use

Requires Markovify and Plotly, so please install those packages first.

Initialization

At the very core of HoLST is the CorpusAnalysis object. If you give it a directory of Hooktheory .json files, it will perform most of the statistical modeling on the corpus and package the results into a single object.

We will use the example folder full of Adele songs.

import analysis_model as analysis

obj = analysis.CorpusAnalysis("./adele")

.generate_progression()

HoLST builds a Markov chain from the chord progressions in the corpus. generate_progression() walks through that chain and returns the result as a string of chords in roman numeral notation. The theory is that chord progressions generated this way are (very rough) approximations of what typical chord progressions by the artist would be.

Markov chains are stochastic, so you'll get different results each time. But if your artist is consistent, you'll begin to notice patterns over larger sample sizes.

# Generate five progressions; each call is an independent random walk.
for _ in range(5):
  print(obj.generate_progression())

"""
I V vi IV iii vi I I V vi viio V I I IV V vi IV iii vi vi I
i i III VII
I I IV iii vi I I IV V7/vi IV I V vi ii ii ii ii ii ii IV V I IV V I
i VI III VI III iv III VI
VII v VI VI VII
"""
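
To make the idea of a "walk through the Markov chain" concrete, here is a minimal sketch of a first-order chain over roman numeral chords, using only the Python standard library. It is not HoLST's actual implementation (which relies on Markovify): the names build_chain and walk, the START/END markers, and the three-song toy corpus are all hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict

# Toy corpus (made-up data): each song is a list of roman numeral chords.
SONGS = [
    ["I", "V", "vi", "IV"],
    ["I", "IV", "V", "I"],
    ["vi", "IV", "I", "V"],
]

# Hypothetical markers for the beginning and end of a progression.
START, END = "<s>", "</s>"


def build_chain(songs):
    """Map each chord to the list of chords observed right after it."""
    chain = defaultdict(list)
    for song in songs:
        states = [START, *song, END]
        for current, following in zip(states, states[1:]):
            chain[current].append(following)
    return chain


def walk(chain, rng, max_len=32):
    """Random walk from START until END is drawn or max_len chords are emitted."""
    chords = []
    state = rng.choice(chain[START])
    while state != END and len(chords) < max_len:
        chords.append(state)
        # Repeated entries in the list make frequent transitions more likely.
        state = rng.choice(chain[state])
    return " ".join(chords)


if __name__ == "__main__":
    chain = build_chain(SONGS)
    rng = random.Random()  # unseeded, so every run differs
    for _ in range(5):
        print(walk(chain, rng))
```

Because each next chord is drawn only from chords that actually followed the current one in the corpus, every adjacent pair in the output has been seen in some song, even though the progression as a whole may be new. Passing a seeded random.Random (for example random.Random(0)) makes the walk repeatable, which can help when comparing results in class.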

Issues