veekaybee / swedish-house-ml

A project examining the relationship between nudity in cover art and social media response to music

Preliminary thoughts #1

jbn opened this issue 6 years ago

jbn commented 6 years ago

I don't know anything about sound. I don't think it's terribly difficult to use something like an FFT or an RNN to measure sample similarity, but it's so far outside my knowledge that I'd be uncomfortable with any results.

So, how do we measure quality?

I assume metrics like up-votes, down-votes, and plays as a function of time since upload would work well. Popularity is sigmoidal, though, and we don't have access to the time series to tell where in that cycle a song may be. If we have enough data, I think it should be fine, but I'm not sure.
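To make the "where in the cycle" problem concrete, here is a minimal sketch assuming cumulative plays follow a logistic curve. The function names and parameters (`capacity`, `rate`, `midpoint_days`) are hypothetical and not fitted to any data; the point is only that a single snapshot normalized by age can mislead.

```python
import math


def logistic_plays(days_since_upload: float, capacity: float,
                   rate: float, midpoint_days: float) -> float:
    """Cumulative plays under an assumed logistic (sigmoidal) popularity curve."""
    return capacity / (1.0 + math.exp(-rate * (days_since_upload - midpoint_days)))


def plays_per_day(plays: float, days_since_upload: float) -> float:
    """Naive age normalization: average plays per day since upload."""
    return plays / max(days_since_upload, 1.0)


# Two snapshots of the *same* hypothetical curve, taken at different ages.
young = logistic_plays(10, capacity=100_000, rate=0.2, midpoint_days=30)
old = logistic_plays(120, capacity=100_000, rate=0.2, midpoint_days=30)
print(round(plays_per_day(young, 10)), round(plays_per_day(old, 120)))  # prints: 180 833
```

Same song, same curve, yet the per-day rate differs by more than 4x purely because of when we happened to look.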

I think the real advantage comes from artist cross-sections. I think this mostly precludes time-matching, though, because two songs uploaded at the same time probably have the same cover. (Is that not as true for SoundCloud? I don't have user experience with it.)

The basic hypothesis would be that, on social platforms that present cover art with songs during play, covers with nudity would propagate further, as reflected in the social metrics. I'm not sure it's causal, though: the artist may pair such imagery with their better works because they want to highlight them over others they think are less interesting.

(Also, I wish we had demographic information on plays to decompose the statistics. I'm sure there is a gender and age relationship that would be pretty interesting to quantify.)

As for measuring the image, DNN classifiers work well for this. However, I'd like to research continuous classifiers instead. That is, rather than classifying "Sexualized" / "Not sexualized", I'd like an accurate scale that captures more information. I don't know how to do that for images. Using the prediction probabilities from the logistic output is a start, but I think that will end up optimizing the wrong loss function.
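As a minimal sketch of the "start" mentioned above: take a binary classifier's raw logit and map it through the sigmoid to get a score in [0, 1] instead of a hard label. The logits here are hypothetical, not outputs of any real model.

```python
import math


def sigmoid(logit: float) -> float:
    """Logistic function: maps a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def hard_label(logit: float, threshold: float = 0.5) -> str:
    """Binary decision, discarding everything except which side of the threshold we're on."""
    return "Sexualized" if sigmoid(logit) >= threshold else "Not sexualized"


def continuous_score(logit: float) -> float:
    """Keep the probability itself as a (crude) continuous score."""
    return sigmoid(logit)


# Hypothetical logits for three covers.
for logit in (-3.0, 0.2, 4.0):
    print(f"{logit:+.1f}  {hard_label(logit):15s}  {continuous_score(logit):.3f}")
```

The caveat above still applies: a model trained with a binary cross-entropy loss is optimized to separate the two classes, so this score reflects the model's confidence rather than a calibrated "degree" of sexualization. One alternative would be a regression head trained on graded labels.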

jbn commented 6 years ago

Actually, at least on YouTube, I know people repackage (i.e. pirate) EDM to inflate their channel views and steal revenue. That may offer a means of statistical control, since we can't get experimental control. That is, this song has a/b up/down votes with this cover but c/d with that one. It's still hard, because they are different subscribers and viewers, but I just had that thought in the shower...
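The a/b vs. c/d comparison could be sketched as a two-proportion z-test on like ratios for the same song uploaded with two different covers. This is a swapped-in standard technique, not something settled in the thread; it assumes votes are independent draws, which the "different subscribers and viewers" caveat already undermines.

```python
import math


def like_ratio_z(a: int, b: int, c: int, d: int) -> float:
    """Two-proportion z statistic comparing a/(a+b) with c/(c+d).

    a, b: up/down votes for upload 1; c, d: up/down votes for upload 2.
    """
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)  # like ratio under "cover makes no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # all up or all down in both uploads: no evidence either way
    return (p1 - p2) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: same song, cover A vs. cover B.
z = like_ratio_z(900, 100, 800, 200)
print(f"z = {z:.2f}, p = {two_sided_p(z):.2g}")
```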

veekaybee commented 6 years ago

Let's define some of the basic questions we have and work up to a hypothesis, data sources, etc., from there:

  1. As you said: do songs with nudity in the cover art get more plays, more likes, and more social engagement than songs without nudity? <-- I think this is probably the main question
  2. Some of my questions: is there a genre/subgenre of EDM that's more likely to post nude cover art? Why is EDM like that? (This might be more of a sociological investigation.)
  3. Can we somehow relate the quality of a song to its cover art and whether it has more or less nudity? (Probably builds on question 1.)

Would you agree that 1 is what we want to explore first?

jbn commented 6 years ago

I do agree that (1) is the best point of entry. However, I also think it may get tangled up with the sociological aspects of the service. Does SoundCloud propagate social engagement, such that your friends see a stream of your likes? Does YouTube? Given nude art, that could inhibit one type of action or favor another. (I'm gonna look more into YouTube today and I'll start another thread.) Even if this is the case, according to this web page, which links to a study I didn't even skim and which I found while walking to WeWork, the demographics of EDM are 55/45 men/women. So I think the effect we're hypothesizing should be visible even without being able to condition on gender, because there are more men. (But, of course, it would be really good to be able to condition on gender.)
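A back-of-the-envelope version of that argument, assuming (hypothetically) that the cover effect is $\Delta_m$ among men and $\Delta_w$ among women, and that engagement splits 55/45 like the audience:

$$\Delta_{\text{overall}} = 0.55\,\Delta_m + 0.45\,\Delta_w$$

Even if $\Delta_w = 0$, the pooled effect is $0.55\,\Delta_m$: diluted but nonzero, so it should still be detectable without conditioning on gender, just with less statistical power.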

The sociological question is really cool. Actually, I'm interested in that more! I hope we get there.

veekaybee commented 6 years ago

That's a great question: does increased social engagement set off a chain of further engagement that is driven not by the actual art but by the amount of engagement that comes with it? I'm not sure how we could measure that.

Here's how I see the next steps:

  1. Picking a social network we want to focus on
  2. Analyzing how that social network ecosystem works
  3. Gathering a collection of referential literature/blog posts around this
  4. Scraping/downloading images from the API
  5. Running them through a classifier to see which ones have nudity. The Caffe one you linked to looks pretty promising. I think we could potentially load it on an AWS EC2 instance (or, if not, freeze an EMR instance that we can spin up and down).
  6. Creating a predictive model to see whether we can figure out what nudity (or its absence) impacts, or whether songs with nudity in the cover art are subjectively better, based on both social metrics and, maybe, comparison to the Billboard charts?
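Before building a real predictive model for step 6, a minimal baseline could compare mean log-plays between covers the classifier flags and the rest. The function name and the 0.8 cut-off are hypothetical; `log1p` is used so zero-play tracks don't break the log.

```python
import math
from statistics import mean
from typing import Iterable, Tuple


def log_engagement_gap(tracks: Iterable[Tuple[float, int]], threshold: float = 0.8) -> float:
    """Difference in mean log(1 + plays): flagged-as-nude covers minus the rest.

    tracks: (nudity_score, plays) pairs; threshold is a hypothetical cut-off
    applied to the classifier's score.
    """
    tracks = list(tracks)
    flagged = [math.log1p(plays) for score, plays in tracks if score >= threshold]
    others = [math.log1p(plays) for score, plays in tracks if score < threshold]
    if not flagged or not others:
        raise ValueError("need at least one track in each group")
    return mean(flagged) - mean(others)
```

A positive gap alone wouldn't be causal (an artist may pair such covers with their better works), which is why the artist cross-sections matter.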

I say we start by picking a social network, collecting (aka scraping or downloading via the API) a bunch of art, and seeing which network seems like it could work best and what kinds of features we can draw out of it:

  1. Social engagement factors (likes, shares, reposts)
  2. Song metadata (genre, title, artist)
  3. Song artwork
  4. Anything else?
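One way to pin down those features is a per-track record. All field names here are hypothetical placeholders, not fields from any platform's API; `uploaded_at` is one possible "anything else", since time since upload matters for the engagement numbers.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TrackRecord:
    # Social engagement factors
    likes: int
    shares: int
    reposts: int
    # Song metadata
    genre: str
    title: str
    artist: str
    # Song artwork: where the cover lives, plus a score filled in after classification
    artwork_url: str
    nudity_score: Optional[float] = None
    # Anything else: e.g. upload date (ISO 8601 string) to normalize for age
    uploaded_at: Optional[str] = None


# Made-up example values.
rec = TrackRecord(likes=120, shares=4, reposts=9, genre="progressive house",
                  title="Example Track", artist="Example Artist",
                  artwork_url="https://example.com/cover.jpg")
print(asdict(rec))
```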

So our three candidates right now are:

  1. YouTube - I don't know that this will work, because I have no idea how to pull artwork from a .mov file, particularly if it changes over the course of the song. We'd have to work out which "videos" are really stills.
  2. SoundCloud - Seems to have an API with a Python library wrapper (that's pretty out of date).
  3. Spotify - Seems to also have an API with a Python library wrapper that's a little more active, but I feel like SoundCloud might have more "real" cover art, in that they don't censor? I don't know. Could be another hypothesis to test.

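For working out which "videos" are really stills, a minimal sketch: sample a few frames, compute the mean absolute pixel difference between consecutive frames, and call the video a still if every difference stays under a threshold. It assumes frames are already decoded into equal-size grayscale 2-D lists (e.g. extracted with a tool such as ffmpeg or OpenCV); the threshold of 2.0 is a hypothetical value to tune.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale pixel values 0-255, rows x cols


def mean_abs_diff(f1: Frame, f2: Frame) -> float:
    """Average absolute per-pixel difference between two equal-size frames."""
    total = 0
    count = 0
    for row1, row2 in zip(f1, f2):
        for p1, p2 in zip(row1, row2):
            total += abs(p1 - p2)
            count += 1
    return total / count if count else 0.0


def is_still(frames: List[Frame], threshold: float = 2.0) -> bool:
    """True if every pair of consecutive sampled frames is nearly identical."""
    return all(mean_abs_diff(a, b) <= threshold for a, b in zip(frames, frames[1:]))
```

If a video passes, any one sampled frame can stand in as its cover art.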
veekaybee commented 6 years ago

So I think we should also now start a Trello for this: https://trello.com/b/HEiH4oRt/swedish-house-ml

jbn commented 6 years ago

(Quick reply before a lengthier one later.)

I think you know the networks better than I do. Shit, I'm still looking for a copy of Resonance's Exhale from like 2003! But, unless you have any strong objections, I'd prefer to use YouTube. Mostly, I have designs on doing propaganda analysis in the coming year, and while YouTube isn't exactly ground-zero, it's not too far away. I know they have a Python API, too. I haven't looked into it yet, but this cursory search suggests preferred covers are readily accessible.
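If we go through the YouTube Data API v3, a `videos.list` call with `part=snippet` returns `snippet.thumbnails`, a dict keyed by size names (`default`, `medium`, `high`, and, when available, `standard` and `maxres`), each with `url`, `width`, and `height`. A minimal sketch that picks the largest thumbnail from an already-parsed response; the helper name is hypothetical and the sample dict is made up.

```python
from typing import Any, Dict, Optional


def best_thumbnail_url(snippet: Dict[str, Any]) -> Optional[str]:
    """Return the URL of the largest thumbnail in a parsed `snippet` object."""
    thumbs = snippet.get("thumbnails", {})
    if not thumbs:
        return None
    # Rank by pixel area; treat a missing width/height as 0.
    best = max(thumbs.values(), key=lambda t: t.get("width", 0) * t.get("height", 0))
    return best.get("url")


# Made-up example shaped like an API response; not real data.
example_snippet = {
    "title": "Example upload",
    "thumbnails": {
        "default": {"url": "https://example.com/default.jpg", "width": 120, "height": 90},
        "high": {"url": "https://example.com/high.jpg", "width": 480, "height": 360},
    },
}
print(best_thumbnail_url(example_snippet))
```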

As for computing resources, I have several cubic fuck-tons literally in my closet. (A Lambda Labs Quad workstation; it heats my house.) My dissertation work right now is mostly math-y/proof-y, so it's just burning electrons mining ether. This would get priority.

veekaybee commented 6 years ago

You're right. A quick glance says it should be easy (famous last words). I'll see what I can find from YouTube network research. We can also always look at the other ones later.

Picking a social network we want to focus on --->

  1. Analyzing how that social network ecosystem works
  2. Gathering a collection of referential literature/blog posts around this
  3. Scraping/downloading images from the API
  4. Running them through a classifier to see which ones have nudity. The Caffe one you linked to looks pretty promising. I think we could potentially load it on an AWS EC2 instance (or, if not, freeze an EMR instance that we can spin up and down).
  5. Creating a predictive model to see whether we can figure out what nudity (or its absence) impacts, or whether songs with nudity in the cover art are subjectively better, based on both social metrics and, maybe, comparison to the Billboard charts?