queenvictoria / represent

Keep log files for getter #10

Open queenvictoria opened 12 years ago

queenvictoria commented 12 years ago

As getter finds files it would be handy to store the URL in a flat file for later. Either for retrieval by other mechanisms or reference.
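A minimal sketch of what such a flat file could look like, assuming one tab-separated line per URL. The names `log_url`, `read_logged_urls` and the log path are hypothetical and not part of getter as it stands:

```python
from datetime import datetime, timezone
from pathlib import Path


def log_url(url: str, log_path: Path) -> None:
    """Append one fetched URL to a tab-separated flat file, with a UTC timestamp."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t{url}\n")


def read_logged_urls(log_path: Path) -> list[str]:
    """Return the logged URLs in the order they were written."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        # Each line is "<timestamp>\t<url>"; keep only the URL part.
        return [line.rstrip("\n").split("\t", 1)[1] for line in fh if "\t" in line]
```

Appending one line per file keeps the log cheap to write while getter runs and easy to read back with other tools later.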

It might also be useful to build a record of the links returned by our searches as each XML file contains the original record of many of the individual search results. We might need to link these up at some point to provide references.
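One possible shape for that record, sketched under the assumption that a JSON-lines file is acceptable. The function names and the index file are hypothetical; nothing like this exists in the repo yet:

```python
import json
from pathlib import Path


def record_results(xml_file: str, result_urls: list[str], index_path: Path) -> None:
    """Append one JSON line tying an XML file to the result links it covers."""
    entry = {"xml_file": xml_file, "results": list(result_urls)}
    with index_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def xml_files_for(result_url: str, index_path: Path) -> list[str]:
    """Return every recorded XML file whose entry lists the given result link."""
    matches: list[str] = []
    if not index_path.exists():
        return matches
    with index_path.open(encoding="utf-8") as fh:
        for line in fh:
            entry = json.loads(line)
            if result_url in entry["results"]:
                matches.append(entry["xml_file"])
    return matches
```

Looking up a search result link then gives back the XML file (or files) holding its original record, which is the link-up we might need for references.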

It's beginning to feel like we are duplicating a lot of the work of OpenAustralia.

curioushuman commented 12 years ago

Some of which we are. However, not all, and then we'll be adding to it.

Are you saying at this point, without the time constraints of the weekend, it would be better to go back and attempt to install the OpenAustralia project and build on top of it?

queenvictoria commented 12 years ago

I think openaus might have too much overhead. Also I don't speak ruby. But it does have a chef recipe. Anyone feel like running up a vagrant or slice somewhere?

curioushuman commented 12 years ago

When you sent the email I was half way to having it installed on my local vagrant. In hindsight I have come to realise that may have been a bad idea but... Never mind now.

So far today I have :

  1. Obtained Represent! Hansard data in about 45 minutes
  2. Spent the rest of the day messing with OpenAustralia

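The results of which are (no. 2 only) :

  • Win - Member data (via parse-members.rb)
  • Fail - Member link data (via parse-member-links.rb)
    • It would seem the URLs they have for aph.gov.au are old
  • Fail - Member images (via member-images.rb)
    • Could be the same problem, I haven't delved in as yet
  • Win / fail - Hansard speeches (via member-speeches.rb)
    • Hmmm I got some data, I don't know how useful it is
    • Also didn't like being passed dates that included no speeches

So... end result being that :

  1. It would seem that we are replicating some of OpenAustralia's functionality
    • And are a little bit behind in functionality
  2. The code is old, and there doesn't seem to be much movement to update it
    • Member links haven't been updated in 2 years
    • And the rest isn't much better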

In summary - it is gutting that the OpenAustralia stuff isn't more useful. It would be ideal to just pick it up, add to the database, add our own code around it (if necessary, i.e. if we can't pick up Ruby) and then implement our own API and front end. At this time I do not think this is possible.

I'll dump the DB in DropBox for your perusal as I haven't time to restart the vagrant process - apologies.

Did you guys draw a data model out at any point?

queenvictoria commented 12 years ago

I'll look at the db.

Yes, our models are in a spreadsheet in Google Docs. You can see most of it in the Python too. All the classes that inherit Base are models that enter the db.

https://docs.google.com/a/houseoflaudanum.com/spreadsheet/ccc?key=0ArFMXLynDwEzdFlORTE1XzFhMVNuNjZRQThEY3N4SlE

  • Bill https://github.com/queenvictoria/represent/blob/master/hansard-parser/main.py#L190
  • Division https://github.com/queenvictoria/represent/blob/master/hansard-parser/main.py#L223
  • Vote https://github.com/queenvictoria/represent/blob/master/hansard-parser/main.py#L270
  • Speaker (we should rename to Member) https://github.com/queenvictoria/represent/blob/master/hansard-parser/main.py#L328
  • Speech https://github.com/queenvictoria/represent/blob/master/hansard-parser/main.py#L366
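For anyone wanting to list the models without opening the spreadsheet, here is a rough sketch. It assumes a plain Python `Base` class as a stand-in; the real `Base` in main.py may well be an ORM base with its own registry, and the empty classes below are placeholders only:

```python
class Base:
    """Stand-in for the project's Base class (hypothetical, simplified)."""


class Bill(Base):
    pass


class Division(Base):
    pass


class Vote(Base):
    pass


class Speaker(Base):  # the thread suggests renaming this to Member
    pass


class Speech(Base):
    pass


def model_classes(base=Base):
    """Return every class that inherits from base, directly or indirectly."""
    found = []
    queue = list(base.__subclasses__())
    while queue:
        cls = queue.pop(0)
        if cls not in found:
            found.append(cls)
            queue.extend(cls.__subclasses__())
    return found


if __name__ == "__main__":
    for cls in model_classes():
        print(cls.__name__)
```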

Our importer is complete for debates from 10 May 2011 onwards, I think. One thing we could do is match their schema (though there was a fair amount of debate around that).

nsewell has forked OA (as have others) to fix some of the errors. Maybe we check in with what's going on there. https://github.com/openaustralia/openaustralia-parser/network


curioushuman commented 12 years ago

Thanks for the heads up RE model - I should have guessed that's where you put it.

I think our need is based more around recent debates so am happy to deprioritise older schema formats in favour of sentiment, topic and other such items.

How many hours remain do you think? Is this something I could dive in on? Or the API, or continue the front end?

queenvictoria commented 12 years ago

There are only a couple of things missing from the import, notably bill sponsors. I scraped and imported 10/5/2011 (10 May 2011) to present successfully, I think. There are some TODOs mentioned in the code.

Then it's on to exposing a JSON endpoint. My first thought is to move the models out to models.py and reuse them. Pete mentioned a router to handle the incoming requests but I can't remember what it was. Let's get an issue up around that and continue on. I can help out if you have a quick recce on the right way to do this.
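To make the idea concrete, here is a rough sketch of a JSON endpoint with a dictionary-based router, using only the standard library. It is not the router Pete mentioned (that one is still unidentified), and `Bill`, `BILLS`, `ROUTES` and the `/bills` path are hypothetical placeholders rather than the real models:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Bill:
    """Hypothetical stand-in for a model that would live in models.py."""
    id: int
    title: str


# Placeholder data; a real endpoint would query the database instead.
BILLS = [Bill(1, "Example Bill 2012")]


def list_bills():
    """Return all bills as plain dictionaries ready for JSON encoding."""
    return [asdict(bill) for bill in BILLS]


# A tiny router: URL path -> function returning JSON-serialisable data.
ROUTES = {"/bills": list_bills}


def app(environ, start_response):
    """WSGI application that dispatches on PATH_INFO and answers in JSON."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        status, payload = "404 Not Found", {"error": "not found"}
    else:
        status, payload = "200 OK", handler()
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

For a quick local try, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` would serve it. Keeping the models in their own module means the importer and the endpoint could share them.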

curioushuman commented 12 years ago

Also, the bug fixes the others have done look important, but I don't think they solve all of the issues. We'd still have to do some Ruby.