Open jeremybmerrill opened 6 years ago
This sounds cool, what kinds of data and metadata do you have? We do ML for social science at my lab (it's hard!)
Hi @yinleon, thanks for your interest! We have about 54,000 ads; you can download them here. That page has the schema too. The text content of the ads (message) is probably the most predictive, but the targeting methods (parsed into targets; raw from Facebook in targetings) and any links in the raw HTML content of the ad (body) might also be predictive.
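As a rough illustration of the link idea, here is a minimal sketch that pulls link domains out of an ad's raw HTML using only the Python standard library. The sample HTML, the LinkCollector class and the link_domains helper are made up for illustration; only the field name body comes from the schema described above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_domains(body_html):
    """Return the lowercased domain of each absolute link in the HTML."""
    parser = LinkCollector()
    parser.feed(body_html)
    parser.close()
    domains = []
    for href in parser.hrefs:
        netloc = urlparse(href).netloc.lower()
        if netloc:  # skip relative links, which have no domain
            domains.append(netloc)
    return domains


# Hypothetical stand-in for the `body` field of one ad.
sample_body = (
    '<div><p>Take the survey</p>'
    '<a href="https://example.org/survey?src=fb">Sign up</a></div>'
)
print(link_domains(sample_body))
```

Domains like these could be used as categorical features alongside the message text, though whether they actually help is an open question.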
We have an image from each ad (either the main image or a still from the video). We don't have any data extracted from the images, whether by image recognition, text OCR or anything like that. There's likely-predictive data in here: often listbuilding ads contain a "survey" (e.g. this one) that's not actually collecting any data other than email addresses.
The biggest problem is that we don't have a labeled subset for training. The dataset is unbalanced; it's mostly fundraising and listbuilding ads, with fewer persuasive and mobilization ads.
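One possible way to bootstrap a labeled subset is to apply crude keyword rules to the message text and hand-label whatever the rules miss. The sketch below is only that: the keyword lists and the sample messages are invented for illustration and are not drawn from the dataset, and persuasive ads are deliberately left to hand labeling since they lack obvious trigger phrases.

```python
from collections import Counter

# Hypothetical keyword rules; these phrases are guesses, not dataset facts.
KEYWORDS = {
    "fundraising": ("donate", "chip in", "contribute"),
    "listbuilding": ("sign the petition", "take the survey", "add your name"),
    "mobilization": ("polling place", "precinct", "volunteer"),
}


def weak_label(message):
    """Return the first category whose phrase appears, or None if no rule fires."""
    text = message.lower()
    for label, phrases in KEYWORDS.items():
        if any(phrase in text for phrase in phrases):
            return label
    return None  # no rule matched; leave for hand labeling


# Invented example messages.
messages = [
    "Chip in $5 before midnight!",
    "Add your name: tell Congress to act",
    "Check your precinct before Election Day",
    "Our schools deserve better.",
]
labels = [weak_label(m) for m in messages]

# Counting labels gives a first look at how unbalanced the classes are.
print(Counter(labels))
```

Labels produced this way are noisy, so they are better treated as a starting point for manual review than as ground truth.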
Would love to hear your thoughts! I'm always looking to hear from folks with more experience doing ML... Let me know if you have more questions about the dataset or about my ontology.
Just for recordkeeping, here's an example of a mobilization ad: https://projects.propublica.org/facebook-ads/ad/23842873784130638. Danny O'Connor, a Dem special election candidate for US House in OH-12, is asking a custom audience to check his list of changed precincts for the election.
Political ads can have many different purposes, including
(I realize this is a somewhat simplified ontology. Ideas on how to come up with -- and operationalize -- a different ontology are totally welcome.)
It'd be amazing to build a machine learning model that could make a decent guess as to which category a given political ad falls into. You might be able to figure this out just from the text of the ad. (In a perfect world, we could also extract interesting features from the ad images/video, but that's out of scope.)
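To make the text-only idea concrete, here is a minimal sketch of a bag-of-words Naive Bayes classifier written with just the Python standard library. Naive Bayes is one common baseline and not necessarily the right model for this data; the training examples and labels below are invented, and a real attempt would need a hand-labeled sample of actual ads.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of ads
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples, two per category.
train = [
    ("Donate $5 now to help us hit our deadline", "fundraising"),
    ("Chip in before midnight to fund our campaign", "fundraising"),
    ("Sign the petition and add your name", "listbuilding"),
    ("Take our survey and tell us what you think", "listbuilding"),
    ("Our senator voted to cut health care", "persuasive"),
    ("This bill will hurt working families", "persuasive"),
    ("Find your polling place and vote on Tuesday", "mobilization"),
    ("Your precinct has changed, check before you vote", "mobilization"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("Please donate to our campaign today"))
print(model.predict("Check your polling place"))
```

With a real labeled sample, the same interface could be swapped for a stronger model, and class imbalance would need attention when evaluating it (per-class precision and recall rather than overall accuracy).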
I can talk endlessly about this idea. Let me know if you're interested. Reply here or email me at jeremy dot merrill at propublica dot org.