rOpenGov / psData

An R package to download regularly maintained political science data sets and make commonly used, but infrequently updated variables based on this data.
https://ropengov.github.io/psData/

Summer Hackathon #12

Open christophergandrud opened 10 years ago

christophergandrud commented 10 years ago

Sorry everyone, I've been kind of overwhelmed with other projects the past couple of weeks.

I was thinking that to get this thing started up again it might be good for interested parties to think of a few days to a week in the summer that they would be available to work together on this close to full time.

Any interest? Preferred times?

antagomir commented 10 years ago

I may be coming to Berlin for OKFest. I will decide by June. If yes, that could be an ideal time. Otherwise I am not sure I will have a chance to contribute to this particular project, as my hands are full with running rOpenGov infra issues.

briatte commented 10 years ago

I'll be in Berlin for OKFest, so this might work very nicely :+1:

christophergandrud commented 10 years ago

That's great. I'll be in Berlin as well. So we should definitely meet up to cover some ground on this.

briatte commented 10 years ago

Dear all,

OKFest 2014 Berlin (July 15-17) has published its provisional programme, and I'm thinking of buying train tickets as soon as possible to avoid the typically high fares on Franco-German lines.

May I ask who will be in Berlin this July, and when?

christophergandrud commented 10 years ago

I'll be in Berlin from 13 July. I'm clearing time in my schedule to work on it that week.

christophergandrud commented 10 years ago

I'm thinking of submitting a talk to the CSVConf which is a fringe event of the OKFest.

Any thoughts?

antagomir commented 10 years ago

I will let you know soon about my possible participation; it seems likely, and it would be great to meet.

I need to focus on pending issues with rOpenGov but we can hack together and strengthen the connections across these activities.

CSVconf talk is an option, too.

briatte commented 10 years ago

I agree, CSVconf sounds like a cool event.

christophergandrud commented 10 years ago

Great.

I quickly put together a talk description based on the original blog post I made introducing the package (It's supposed to be about a paragraph). Any comments are of course very welcome (especially for suggestions to slim it down). The deadline for submissions is 31 May:


Improving access to panel-series political science data with psData

There are many commonly used, electronically available panel-series data sets in political science. However, downloading, cleaning, and merging them together is time-consuming. For example, accessing and combining Reinhart and Rogoff's fiscal costs of financial crisis data involves downloading, cleaning, and merging 4 Excel files with over 70 individual sheets, one for each country’s data.

Researchers also regularly use variables that are combinations and/or transformations of indicators in regularly updated data sets, but which themselves aren’t regularly updated. For example, Bueno de Mesquita et al. (2003) devised two variables that they called the ‘winset’ and the ‘selectorate’. These are basically specific combinations of data in two other regularly maintained data sets. However, the winset and selectorate variables haven’t been updated alongside updates to the underlying data.

In this talk we introduce the psData R package developed under rOpenGov to solve two problems:

  1. Time wasted by political scientists (and their RAs) downloading, cleaning, and transforming commonly used data sets for their own research.
  2. Errors introduced each time custom data importation/transformation scripts are written to do what are in fact routine tasks across the community.

The psData package aims to address these problems by distributing easy-to-use R functions for downloading, cleaning, and merging political science panel-series data. The package is hosted on GitHub and can be easily added to and modified by the community. When an error is found in a data importation/transformation function, it can be fixed and the patch distributed to all users simultaneously, improving data quality across the entire community with minimal extra effort.
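
The routine task this abstract describes, merging several panel-series sources on a shared country-year key, might be sketched roughly as follows. This is only an illustration in plain Python, not psData's actual R interface: the function name `merge_panels`, the key names `iso3c` and `year`, the variable names, and all values are made-up placeholders.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, int]   # (country code, year); both key names are assumptions
Row = Dict[str, Any]

KEY_COLUMNS = ("iso3c", "year")


def merge_panels(left: List[Row], right: List[Row]) -> List[Row]:
    """Outer-merge two lists of rows on ('iso3c', 'year').

    A variable missing from one source is simply absent from the merged row.
    If both sources carry the same variable, the right-hand value wins.
    """
    merged: Dict[Key, Row] = {}
    for row in left + right:
        key = (str(row["iso3c"]), int(row["year"]))
        target = merged.setdefault(key, {"iso3c": key[0], "year": key[1]})
        # Copy only the non-key columns so the normalised keys are preserved.
        target.update({k: v for k, v in row.items() if k not in KEY_COLUMNS})
    # Return rows in a stable country-year order.
    return [merged[k] for k in sorted(merged)]


# Placeholder inputs: the values are invented purely to show the mechanics.
source_a = [
    {"iso3c": "DEU", "year": 2001, "var_a": 1.0},
    {"iso3c": "FRA", "year": 2001, "var_a": 2.0},
]
source_b = [
    {"iso3c": "DEU", "year": 2001, "var_b": 10.0},
    {"iso3c": "FIN", "year": 2002, "var_b": 20.0},
]

panel = merge_panels(source_a, source_b)
for row in panel:
    print(row)
```

Keeping the merge in one shared function is the point made above: a fix to the key handling would reach every user at once instead of being repeated in each researcher's private script.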

briatte commented 10 years ago

Hello,

Here's a remix of the abstract. I have removed the selectorate/winset example to trim it down, and tried to formulate the project in the most generic terms possible.

christophergandrud commented 10 years ago

I like it a lot!

If everyone else is ok with it, all we need are the number of rOpenGov OKFest attendees and we should be set.

antagomir commented 10 years ago

Hi, yes, cool! I think we can only give estimates of participant numbers so far, and I guess the exact numbers are not necessary, so how about this for the concluding sentence (feel free to modify in any way you see fit): "The team behind the rOpenGov/psData project currently includes contributors from universities in three [Finland, Germany, ...?] countries, and will be present in Berlin at the time of the conference."

christophergandrud commented 10 years ago

From the GitHub members list it looks like there are people based in at least: Finland, Germany, Denmark, US. @briatte where are you based these days?

leeper commented 10 years ago

This will be during my summer holiday and I'll be travelling. Wish I could be there, though.


briatte commented 10 years ago

I'm based in France (Paris and Lille). This project is almost eligible for EU funding :)

christophergandrud commented 10 years ago

Great, how about:

The team behind the rOpenGov/psData project currently includes contributors from universities in five countries, and many will be present in Berlin at the time of the conference.

christophergandrud commented 10 years ago

Sorry we'll miss you @leeper. Have good travels!

christophergandrud commented 10 years ago

Just submitted the talk idea.

Fingers crossed!

antagomir commented 10 years ago

Yess, good luck!

antagomir commented 10 years ago

@christophergandrud I will come to Berlin, and it seems some other Finnish rOpenGov contributors will too. We should certainly meet and discuss all ideas - see you soon!

christophergandrud commented 10 years ago

@antagomir Great! I'm all signed up for the conference

Maybe we all should set a time to meet up. When would be good for you guys?

jlehtoma commented 10 years ago

Great! I'm only 90% certain that I'll be coming, so I didn't submit anything. csv,conf does look good and I'll probably attend that, but otherwise I haven't checked the programme in detail yet. I haven't booked any flights yet either, so plus-minus 1 day from the conference might be doable as well.

antagomir commented 10 years ago

I'll also almost certainly be there on July 15-17, possibly +/- a day. During the conference most times are fine, I think; I have not really checked the program yet. Perhaps we should meet already on the first conference day, July 15 (over lunch perhaps?), and then continue during the conference when more discussion topics pop up?

briatte commented 10 years ago

Hi all,

I'll be in Berlin from July 15 to July 20, with a French mobile number that should work for texting. It will be a true pleasure to meet you all and get back to work on psData.


@antagomir unfortunately, my flight won't get me in central Berlin before 3-4pm on July 15th. Any chance we could meet slightly later that day? I land in TXL round 2.30pm, can be near Alexanderplatz one hour later.

See you there!

christophergandrud commented 10 years ago

I'm fine meeting later on the 15 if that works better. I live in the general area of the conference/Alexanderplatz so am pretty flexible.

christophergandrud commented 10 years ago

Hi everyone

We got the CSVconf proposal accepted! The presentation is on the 15th. I'm happy to give it/coordinate with anyone interested.

antagomir commented 10 years ago

Congratz! If you'd like comments on a draft or anything, just send a msg.

christophergandrud commented 10 years ago

Great. I think I'll just do a slide deck laying out our motivation, goals, and what we've done so far.

briatte commented 10 years ago

Great news!

Unfortunately, my plane probably lands too late for me to attend on the 15th.

Shall we pick a time and place to all meet on that day?

jlehtoma commented 10 years ago

I'm afraid I'll have to skip the CSV,conf after all. I just booked my flights and I'll land in Berlin 19:35 on the 15th. I'm guessing I could make it to around Alexanderplatz ~21:00 which is pretty late. Let me know if you're up for a latish dinner/beer etc. I'll be flying off afternoon of Friday the 18th, so I'm free that morning as well.

briatte commented 10 years ago

I can probably do beer or food on the 15th round 9pm, although I don't know what my friends from the Open Knowledge Foundation are planning that night.

Perhaps meeting during one of the OKF Festival sessions would be easier for everyone? During the keynotes on the day after, July 16th, 9am–11am?

antagomir commented 10 years ago

I agree it might be easiest to meet during the conference, either during the sessions and/or over a lunch break. The Wednesday July 16th would be fine, but how about 10am..? I would not mind skipping the first workshops after the keynotes either if we have a good conversation. Anyway, it seems most times during the conference are ok with me, very flexible, I will be there on July 15 - July 17.

My personal priority would be on advancing the general rOpenGov network, which involves many R package projects (and perhaps other languages later on). This has clear connections to the psData package that @christophergandrud is working on. Some people behind the related Bioconductor project have in fact said that developing standard data structures was one of the key driving forces in algorithm development in biosciences, so this is clearly very important for us, too. So perhaps we could consider the issue of data formats for open data more generally? We can even consider having a small hackathon to advance our overlapping projects (rOpenGov/psData/other?) together.

What are the topics we all wish to discuss together - shall we try to make an agenda for discussion topics, or just have a free meetup and see where it will take us?

briatte commented 10 years ago

Meeting on Wednesday round 10am works for me.

And we will need more on what the Bioconductor team is saying about data structures if we push the psData project to publication status. This is clearly something that social scientists should pay attention to in their routine work, not just when Reinhart and Rogoff mess up with Excel.

christophergandrud commented 10 years ago

Wednesday at 10 works for me also.

I'm going to work on the short talk for csvConf the week before. That will be a good opportunity for me to think through some of the topics to discuss.

antagomir commented 10 years ago

Great, let's fix it to Wednesday at 10am then. See you soon!

By the way, if anyone is interested, there will be a networking event on Tuesday at the Finnish German Institute, 2-5pm. You are welcome, but it requires registration in advance: http://fi.okfn.org/2014/06/26/networking-event-at-the-finnish-institute/

christophergandrud commented 10 years ago

Great. Looking forward to it.


christophergandrud commented 10 years ago

Hi everyone

Got some great feedback at csv,conf today. We can talk about it in more detail tomorrow at 10.

Oh, where do we want to meet?

antagomir commented 10 years ago

Perhaps in front of the lobby desk? I haven't visited the venue yet; other suggestions are welcome if someone knows the place.

christophergandrud commented 10 years ago

Yeah. Probably the lobby, wherever the registration table is, would be good.

jlehtoma commented 10 years ago

Good to hear about the feedback! I haven't been to the place either, but lobby/registration desk sounds good.

briatte commented 10 years ago

Lobby/registration tomorrow at 10am works for me. I'm in a nice flat just next door to the festival and my phone is +33 (0)6 43 -eight-6 64 0-eight-, if that might help at any stage.

christophergandrud commented 10 years ago

I'm stuck in the keynote. Can we meet at 10:30? On 15 Jul 2014 16:07, "François" wrote:

I'll be on site in five minutes :)

briatte commented 10 years ago

Yes, we're at the coffee shop, just below the 'Fassbier' sign :)

antagomir commented 10 years ago

Hi all, a related quote from the BioC developer R. Gentleman: "For Bioconductor, which provides tools in R for analyzing genomic data, interoperability was essential to its success. We defined a handful of data structures that we expected people to use. For instance, if everybody puts their gene expression data into the same kind of box, it doesn't matter how the data came about, but that box is the same and can be used by analytic tools. Really, I think it's data structures that drive interoperability." In: The anatomy of successful computational biology software http://www.nature.com/nbt/journal/v31/n10/full/nbt.2721.html
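
In the spirit of the "same kind of box" idea in this quote, a shared container for country-year data could look roughly like the sketch below. The class name `PanelData`, its fields, its checks, and the `coverage` helper are hypothetical choices made for illustration (in plain Python); none of them is taken from Bioconductor, rOpenGov, or psData, and the values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PanelData:
    """Hypothetical shared 'box': one value per (unit, year, variable)."""

    source: str  # free-text provenance label
    values: Dict[Tuple[str, int], Dict[str, float]] = field(default_factory=dict)

    def add(self, unit: str, year: int, variable: str, value: float) -> None:
        # Enforce the structure once, here, so downstream tools can rely on it.
        if not isinstance(year, int):
            raise TypeError("year must be an integer")
        self.values.setdefault((unit, year), {})[variable] = float(value)

    def variables(self) -> List[str]:
        """Sorted names of all variables stored in the box."""
        return sorted({name for row in self.values.values() for name in row})


def coverage(panel: PanelData) -> Dict[str, int]:
    """Example 'analytic tool': number of observations per variable.

    It depends only on the PanelData structure, not on where the data came from.
    """
    counts: Dict[str, int] = {}
    for row in panel.values.values():
        for name in row:
            counts[name] = counts.get(name, 0) + 1
    return counts


# Placeholder values, invented only to exercise the structure.
box = PanelData(source="placeholder source")
box.add("FIN", 2010, "x", 1)
box.add("FIN", 2011, "x", 2)
box.add("FIN", 2011, "y", 3)

print(box.variables())
print(coverage(box))
```

Any function written against `PanelData` would work unchanged for data loaded from a different source, which is the interoperability argument the quote makes.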

briatte commented 10 years ago

Quick notes from today's Berlin meeting on scaling up psData:

And also:

Next meeting Thursday noon under the Fassbier sign :)

antagomir commented 10 years ago

Hi both, we agreed to meet again at 12 today (Thu). Now I notice there's a session I would like to attend then, and it is followed by another interesting session. Could we meet later, like 16:30?

antagomir commented 10 years ago

No, in fact I can come at 12, see you there. I just need to go before 12:45.

antagomir commented 10 years ago

@briatte Hi we are in the Cafe with @christophergandrud so join us here!

briatte commented 10 years ago

@antagomir thanks :)

The SDMX stuff I mentioned today is there.

antagomir commented 10 years ago

Thanks !