okfn-brasil / serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | **This repository does not receive frequent updates. Check out the README**
https://serenata.ai/en
MIT License

Get names of immediate relatives of each deputy and senator #15

Open Irio opened 8 years ago

Irio commented 8 years ago

Super useful for detecting different forms of nepotism. Even better if we can go up to the third degree of kinship.

jvsl commented 8 years ago

@Irio, where can I get this information?

cuducos commented 8 years ago

Maybe Wikipedia, maybe Facebook… we have to be creative on this point…

Irio commented 8 years ago

A good start is collecting who politicians declare on their own Facebook profiles.

Let's take Renato Molling, the RS federal deputy who spent the most from the Quota for Exercising Parliamentary Activity last year. He has a personal Facebook profile and lists two people as family members, Larissa Molling and Vinicius Molling. In this specific case, the listed relatives do not link directly to a Facebook profile, but searching for their names brings them up as the top results.

I believe the Facebook API has ways of searching for people by name and also returns the list of family members from a profile.

P.S. Even if this method yields just a short list of politicians and relatives, we can place a high level of trust in this information, since it's self-declared.

@jvsl

jvsl commented 8 years ago

@Irio @cuducos, I was looking for a free service like this: https://www.myheritage.com.br/, but I didn't find one. I think Wikipedia and Facebook are a good start, as cuducos said.

cuducos commented 8 years ago

Kudos for that @irio:

> P.S. Even if this method yields just a short list of politicians and relatives, we can place a high level of trust in this information, since it's self-declared.

jvsl commented 8 years ago

I've been trying to use the Graph API to get family members from deputy and senator profiles, but without success. The API requires an access token to get this information, and the user token is obtained by logging in to Facebook. So I'm only allowed to get my own information from Facebook. I'll try Wikipedia instead.

jvsl commented 8 years ago

If someone wants to try using Facebook, this link will help: https://developers.facebook.com/tools/explorer/

cuducos commented 8 years ago

> I've been trying to use graph API to get family from deputy and senator profiles, but I didn't have success.

I'm sorry to hear that, @jvsl. We already knew that, and it was documented in the README. I don't want to sound like a dick, or even like mommy saying "I told you so", but is there a way to make it clearer so people start off looking for alternatives?

I also want to apologize for being less clear than I could have been. When I said "maybe Facebook" earlier in this thread, I was assuming that the Graph API wouldn't work but that we could still try to get some Facebook data (for example, using PhantomJS or Selenium and reaching URLs like https://www.facebook.com/USERNAMEj/about?section=relationship).

And surely using Wikipedia is a good idea too! This message was just to say I'm sorry, and to say I'm happy for your support and enthusiasm ; )

jvsl commented 8 years ago

No worries. It was just a communication failure. I just think you all could have better control of "who is doing what", with a tool like Trello or something similar. It's just a suggestion. I'm interested in continuing with this task, but I don't know whether someone else has already picked it up. Do you know what I mean? :)

cuducos commented 8 years ago

@jvsl We are following what we see in many open-source communities here on GitHub. A comment in an issue saying "I got it" is enough. People interested in the topic of the issue usually read the thread and can get an overview of what's going on with the design and execution ; )

jvsl commented 8 years ago

Ok, I got it. I imagined that.

janosimas commented 8 years ago

Maybe include second- and third-degree relatives? Use a degree of proximity? For example, a son-in-law may not be a direct relative, but he is the daughter's husband.

allantorres commented 8 years ago

Here you can get the names of the mother and father! It's a start. Maybe from there you can work top-down to find brothers and sisters. http://www2.camara.leg.br/deputados/pesquisa/layouts_deputados_biografia?pk=73481

allantorres commented 8 years ago

And you can try to extract something from Wikipedia, like on this page. The information is there, but it is not easy to capture: https://pt.wikipedia.org/wiki/Fernando_Marroni

jvsl commented 8 years ago

Yes, I extracted relatives from Wikipedia and got good results. I'm about to finish the script. :)

allantorres commented 8 years ago

It would be great if we could check whether any relatives are working on the staff of another deputy or senator. That would give us information about nepotism.

Irio commented 8 years ago

@allantorres These 2 issues raise these ideas. https://github.com/datasciencebr/serenata-de-amor/issues/17 https://github.com/datasciencebr/serenata-de-amor/issues/18 Willing to help?

augusto-herrmann commented 8 years ago

Here are some other possible sources for this data.

**Text mine news that mentions family members of politicians**

Most family members won't have been mentioned in any news. However, among those who have, the coverage is more likely to have been about some previous suspicion of corruption, which makes it all the more important to consider them for analysis.

For an example of such information out in the open, open Google's news search, type in the name of a politician followed by the word "filha" (daughter) and, if she's been mentioned in the news, you're likely to find the name of the politician's daughter.

**Check the dataset of campaign donors**

The Superior Electoral Court of Brazil releases data about the campaign donors of each candidate. Starting with the 2016 local elections, only natural persons can contribute to campaigns. Close relatives are likely to contribute more money to the politician's campaign. This correlation could be explored by cross-referencing the donor data with other sources to find which donors are relatives.
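One possible first pass at that cross-referencing is to shortlist donors who share a surname with the candidate. The sketch below is only an illustration: the function names, the list of name particles, and the donor records are all made up, and a shared surname is a weak signal that would still need confirmation from another source.

```python
from __future__ import annotations

import unicodedata

# Portuguese name particles that should not count as a shared surname
# (an assumed, incomplete list).
PARTICLES = {"de", "da", "do", "das", "dos", "e"}


def normalize(name: str) -> str:
    """Lowercase a name and strip accents, so 'João' matches 'Joao'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def surnames(name: str) -> set[str]:
    """Return every name token after the first one, minus particles."""
    tokens = normalize(name).split()
    return {t for t in tokens[1:] if t not in PARTICLES}


def candidate_relatives(
    politician: str, donors: list[tuple[str, float]]
) -> list[tuple[str, float]]:
    """Keep donors sharing at least one surname, largest donation first."""
    target = surnames(politician)
    matches = [(name, amount) for name, amount in donors if surnames(name) & target]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Made-up records for illustration only; this is not real TSE data.
donors = [
    ("Maria da Silva Souza", 500.0),
    ("Pedro Almeida", 1200.0),
    ("Ana Souza Lima", 3000.0),
]
shortlist = candidate_relatives("João Souza", donors)
print(shortlist)
```

Common surnames will produce many false positives, so the shortlist is better treated as a list of names to verify than as a list of relatives.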

wfzyx commented 8 years ago

Hey,

I don't know if I'm a bit late, but a valid possibility is to build a "Relative Index" using the six degrees of separation theory.

All parliamentarians would have a factor of 0, direct relatives and past campaign donors a factor of 1, people related to factor-1 people a factor of 2, and so on.

This can be an efficient way to build a heat map of potential scapegoats circling the parliamentary core.

For those who aren't familiar with the idea, the Kevin Bacon game is a good example of it in action.
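A minimal sketch of how such an index could be computed, assuming the relationships are already available as an undirected graph: a breadth-first search started from every parliamentarian at once gives each person the distance to the nearest parliamentarian. The function names and the placeholder people below are hypothetical.

```python
from __future__ import annotations

from collections import deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Turn (person, person) pairs into an undirected adjacency map."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def relative_index(
    graph: dict[str, set[str]], parliamentarians: set[str]
) -> dict[str, int]:
    """Multi-source BFS: hops from each person to the nearest parliamentarian."""
    index = {p: 0 for p in parliamentarians}
    queue = deque(parliamentarians)
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, set()):
            if neighbor not in index:  # first visit is the shortest path in BFS
                index[neighbor] = index[person] + 1
                queue.append(neighbor)
    return index


# Placeholder relationships (relatives and donors), not real data.
edges = [
    ("deputy_a", "spouse_a"),
    ("deputy_a", "donor_x"),
    ("spouse_a", "sibling_of_spouse"),
    ("donor_x", "senator_b"),
]
index = relative_index(build_graph(edges), {"deputy_a", "senator_b"})
print(index)
```

People who cannot be reached from any parliamentarian simply do not appear in the result, which keeps the index limited to the connected "core".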

anaschwendler commented 7 years ago

@braunmagrin and I will use the congressperson biographies from the Câmara website to get each congressperson's parentage, and will export it to a CSV with the following columns: congressperson_id,relationship,relative_name
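A minimal sketch of writing that layout with Python's standard `csv` module. Only the three column names come from the comment above; the IDs, the `parent` relationship label, and the relative names are placeholders.

```python
import csv
import io

FIELDNAMES = ["congressperson_id", "relationship", "relative_name"]


def write_relatives(rows, handle):
    """Write one row per (congressperson, relative) pair, header included."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder rows; the IDs and names are invented for illustration.
rows = [
    {"congressperson_id": 1, "relationship": "parent", "relative_name": "Example Parent One"},
    {"congressperson_id": 1, "relationship": "parent", "relative_name": "Example Parent Two"},
]

buffer = io.StringIO()  # stands in for open("relatives.csv", "w", newline="")
write_relatives(rows, buffer)
output = buffer.getvalue()
print(output)
```

Keeping one relative per row, rather than one column per relative, means further degrees of relationship can be added later without changing the schema.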

jonasporto commented 7 years ago

Maybe my comment helps with this

jvsl commented 7 years ago

Unfortunately, I have not had as much time as I would like to dedicate to the project. But I did develop a script that gets the relatives of senators by finding each senator's Wikipedia page through Google and then reading the page itself. The script needs improvement and can be extended to get relatives of deputies as well. At least 30 senators have no family information on their Wikipedia page, so it was possible to generate a JSON file with family information for 51 senators.

Here's the link: https://github.com/jvsl/script-get-relatives-from-wikipedia

jvsl commented 7 years ago

The script needs improvements to remove repeated logic and apply some good practices.

talespaiva commented 7 years ago

There's also the DBpedia project, which maps Wikipedia infoboxes to an ontology and provides a queryable endpoint. This avoids some pitfalls of web scraping and is based on a graph data model.

For instance, this query:

select ?property ?object
where {
 <http://dbpedia.org/resource/Tasso_Jereissati> ?property ?object .
}

returns all facts available about Tasso Jereissati. A more specific query would be:

select distinct ?conjuge
where {
 <http://dbpedia.org/resource/Tasso_Jereissati> <http://dbpedia.org/property/conjuge> ?conjuge .
}

You can try them here: http://pt.dbpedia.org/sparql

To do this programmatically in Python, you can try rdflib and SPARQLWrapper.

The drawback of this approach is that the endpoint is not synced live with the Wikipedia database, so it depends on periodic data dumps. I personally think that Linked Data technologies (RDF, SPARQL, etc.) can be very helpful in the project.
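As a rough sketch of the programmatic route, the snippet below uses only the Python standard library rather than rdflib or SPARQLWrapper, to stay dependency-free. It builds the spouse query shown above, sends it with an `Accept` header asking for the standard SPARQL JSON results format, and extracts the bound values. The helper names and the sample response are assumptions for illustration, and whether the endpoint still answers as it did at the time is not verified here.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

ENDPOINT = "http://pt.dbpedia.org/sparql"  # endpoint mentioned in this thread


def spouse_query(resource_uri: str) -> str:
    """Build the 'conjuge' (spouse) query for a single DBpedia resource."""
    return (
        "select distinct ?conjuge where { "
        f"<{resource_uri}> <http://dbpedia.org/property/conjuge> ?conjuge . "
        "}"
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the query over HTTP GET and decode the SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def values(results: dict, variable: str) -> list[str]:
    """Collect the value bound to `variable` in every result row."""
    bindings = results["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# Hand-written sample in the SPARQL JSON results layout, not a real response.
sample = {
    "head": {"vars": ["conjuge"]},
    "results": {"bindings": [{"conjuge": {"type": "literal", "value": "Example Spouse"}}]},
}
query = spouse_query("http://dbpedia.org/resource/Tasso_Jereissati")
print(values(sample, "conjuge"))
# Against the live endpoint: values(run_query(query), "conjuge")
```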

augusto-herrmann commented 7 years ago

It is possible to retrieve all name pairs of politicians and their spouses from DBPedia directly by using SPARQL, like this:

select distinct ?nome_politico ?nome_conjuje where
{
    ?politico a dbpedia-owl:Politician .
    ?politico rdfs:label ?nome_politico .
    ?politico dbpprop:conjuge ?conjuje .
    ?conjuje dbpprop:nome ?nome_conjuje .
}

Unfortunately, the Portuguese DBPedia returns only 12 pairs for that query.

talespaiva commented 7 years ago

Indeed, the DBpedia datasets are not complete. I managed to get a few more results with these two queries:

The first one in the Portuguese DBpedia [http://pt.dbpedia.org/sparql]:

select distinct *
where {
 ?person dbpprop:título ?title .
 ?person dbpprop:conjuge ?spouse .
 FILTER regex(?title, "Senador|Deputad", "g")
}

and this one in the English DBpedia [http://dbpedia.org/sparql] (only for senators):

select *
where {
 ?person dct:subject dbc:Members_of_the_Federal_Senate .
 OPTIONAL { ?person dbp:spouse ?spouse . }
 OPTIONAL { ?person dbp:children ?children . }
}

I don't have much time now, but it's a matter of exploring the ontologies to discover the relationships.

braunmagrin commented 7 years ago

@anaschwendler and I created the script in PR #93. It gets the names of the congresspeople's parents. Unfortunately the information is not complete: only approx. 800 out of 1150 have it. Also, it covers only their parents' names, so I guess we still need to find a way to get more degrees of relationship.

anaschwendler commented 7 years ago

@braunmagrin remember that we were having trouble telling the mother's name from the father's? @turicas sent me a link to help us with this task: https://genderize.io/ We decided to keep all entries as "parent", but it could be useful in the future :)

lucasa commented 7 years ago

Hi! Maybe this could help: http://facebook4j.github.io/en/api-support.html But let's start by collecting the profiles of all the politicians, and look for their family connections later.

Do we already have a Facebook profile field?

cuducos commented 7 years ago

@lucasa have you tried it for real? As I mentioned earlier, the Facebook API only shares user data among users of the same Facebook app (I mean, we'd have an app and an API key for this app, and we'd only have access to users who sign in to our app).

cuducos commented 7 years ago

PR #93 partially covers this issue, but we believe there's more data to collect. It collects data from the Lower House website, namely the names of the parents of some (not all) congresspeople. We could still complement the dataset with the missing names and/or collect the names of other relatives (children, brothers and sisters, etc.).

wisner23 commented 7 years ago

I'm trying to work on it

vmesel commented 7 years ago

If we have the CPF or more detailed data like name, address and cellphone, I may have a contact who can help us find this data. I don't know how precise the data is for addresses and cellphones, but relative names are very easy to find.

henrique2601 commented 7 years ago

I don't know if anyone is still working on this, but I found this site where, given a CPF, you can get the mother's name by attempting to sign up. The problem is that it has some captchas. I think someone could "hack" this MEC system to check whether it's possible to use this API to get the mother's name. http://sistec.mec.gov.br/login/cadastrar

turicas commented 7 years ago

I've done a gender-based classification using the IBGE API (data based on the 2010 census). The script is available here: https://github.com/generonumero/logradouros

cuducos commented 6 years ago

Closed accidentally by unrelated commit from Rosie/Jarbas repos.

turicas commented 6 years ago

I've extracted the 100k most popular Brazilian names and classified them using the IBGE Nomes API, also grouped by "name group" (for example, "Thiago" and "Tiago" are in the same group). The dataset is available at Brasil.IO and may help with this issue.

willianpaixao commented 6 years ago

@Irio @cuducos what is the state of this issue? I see many related issues closed and some merged. Maybe someone can summarize what was done and what still needs to be implemented?

cuducos commented 6 years ago

I think that discussions like that are valid in the sense they might be the building block (together with #119 and #224, for example) for a new classifier. A rough roadmap would be:

DevLokCodes commented 4 years ago

Sorry if I am very very very late to the party, but I was thinking: if we are going to look into their social media presence, we could track when the account was created and which other accounts were added, friended or joined first. I think our tendency is to add people from the family circle and our closest contacts first, so there would be a higher chance of finding relatives that way.

just saw the timeframe of other comments