veekaybee / swedish-house-ml

A project examining the relationship between nudity in cover art and social media response to music
6 stars 3 forks

Opening this project back up! #6

Open veekaybee opened 5 years ago

veekaybee commented 5 years ago

@jbn I have some free time and am opening this project back up, damnit! 🕸 😄 Feel free to respond at your own leisure.

Ok, so I was playing around with the data last night and had a couple of philosophical thoughts/questions:

Data Cleaning/Tagging

  1. When we get the data from YouTube, we get, as you may remember, something like this: Kygo - Remind Me To Forget (Lyrics / Lyric Video) ft. Miguel (0-CYcrmGBZU) where 0-CYcrmGBZU is the video ID, i.e. the <url> part of https://img.youtube.com/vi/<url>/0.jpg

I'm thinking that we'll probably want to keep all the images saved as 0-CYcrmGBZU.jpeg then create a JSON metadata repo of the images, their associated metadata, and the videoId. So the JSON structure I'm envisioning is:

{"kiEGIQaR1QQ":
{"hi_res_url": "https://i.ytimg.com/vi/kiEGIQaR1QQ/hqdefault.jpg", "video_title_and_artist": "Kygo - Think About You feat. Valerie Broussard (Cover Art) [Ultra Music]"}}
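To make the structure above concrete, here's a rough sketch of how the raw titles could be turned into that JSON. It assumes every raw title ends with the 11-character video ID in parentheses, as in the Kygo example; `TITLE_RE`, `build_entry`, and `build_metadata` are made-up names for illustration, not anything that exists in the repo yet.

```python
import json
import re

# Assumption: the raw title always ends with " (<11-character video ID>)".
TITLE_RE = re.compile(r"^(?P<title>.*)\s\((?P<video_id>[A-Za-z0-9_-]{11})\)$")


def build_entry(raw_title):
    """Split a raw title into (video_id, metadata dict)."""
    match = TITLE_RE.match(raw_title)
    if match is None:
        raise ValueError(f"no trailing video ID in: {raw_title!r}")
    video_id = match.group("video_id")
    entry = {
        "hi_res_url": f"https://i.ytimg.com/vi/{video_id}/hqdefault.jpg",
        "video_title_and_artist": match.group("title"),
    }
    return video_id, entry


def build_metadata(raw_titles):
    """Build the videoId -> metadata mapping for a list of raw titles."""
    return dict(build_entry(title) for title in raw_titles)


raw = ["Kygo - Remind Me To Forget (Lyrics / Lyric Video) ft. Miguel (0-CYcrmGBZU)"]
metadata = build_metadata(raw)
print(json.dumps(metadata, indent=2))
```

Keying on the video ID means the image file name (`<videoId>.jpeg`) can always be derived from the key, so it doesn't need its own field.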

We'll need to hit a separate API to get ratings data.

Model

I have $300 in free GCP credit that I was hoping to use before April, and GCP runs TensorFlow, unless we want to spin up an instance and install Caffe on it. I did find a TF implementation of the Yahoo model you linked to that I think we can use: https://github.com/mdietrichstein/tensorflow-open_nsfw

Way down the line, I'm having a hard time envisioning how we go from rating something as NSFW to interpolating quality - are you thinking we'd just do some simple correlation analysis to see if there is any?
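If we do go the simple-correlation route, a first pass could look something like the sketch below. It assumes we end up with one NSFW score and one engagement number per video; the values in the example are placeholders I made up to show the call, not real data, and `pearson`, `ranks`, and `spearman` are hand-rolled helpers rather than library functions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Placeholder values only, to show the call shape.
nsfw_scores = [0.05, 0.20, 0.40, 0.75, 0.90]
engagement = [120, 340, 90, 410, 560]
print("pearson:", pearson(nsfw_scores, engagement))
print("spearman:", spearman(nsfw_scores, engagement))
```

Spearman is probably the safer default here, since view and like counts tend to be heavily skewed and rank correlation doesn't assume a linear relationship.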

jbn commented 5 years ago

Ah shit! I can't wait!

For now, I also can't do anything :(. I've managed to get myself the maximally fucked PhD experience. My chair is retiring, and if I don't defend by May...

...I probably won't be able to finish at all.

But, soon for us!

On Sun, Feb 17, 2019 at 6:53 PM Vicki Boykis notifications@github.com wrote:
