hackersatbrown / api-morning-mail


Write the transformer from the MM feed to our desired output JSON #7

Closed. jonahkagan closed this issue 11 years ago.

jonahkagan commented 11 years ago

Note this convo: https://github.com/hackersatbrown/api-morning-mail/pull/5#issuecomment-11409269

sumnerwarren commented 11 years ago

Do you have a suggestion for a date library in node? I've found a couple, just wondering if you've had success with one.

jonahkagan commented 11 years ago

I think I used one called moment.
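
Rough sketch of the kind of date normalization moment handles when turning feed dates into the output JSON; the input format and function name here are placeholders rather than the actual feed fields:

```js
// Sketch only: normalize an RSS-style pubDate string into a consistent
// ISO 8601 UTC timestamp for the output JSON.
var moment = require('moment');

function toIsoDate(pubDate) {
  // Let the Date constructor handle the RFC 2822 parsing, then use moment
  // to format consistently in UTC.
  return moment(new Date(pubDate)).utc().format();
}

// toIsoDate('Fri, 21 Dec 2012 09:00:00 -0500')
//   -> something like '2012-12-21T14:00:00+00:00'
```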

sumnerwarren commented 11 years ago

Alright, I'll try that. Seems like it does everything we need, so it should be fine.

sumnerwarren commented 11 years ago

So I have all of the GET /v1/posts tests passing. I was going to do GET /v1/posts/:id next. node.io seems to be a popular scraping package. Any objections/suggestions?
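
For reference, a bare-bones sketch of the route shape under discussion; Express and the lookupPost helper are assumptions here, not necessarily what the project actually uses:

```js
// Sketch only: the GET /v1/posts/:id endpoint shape.
var express = require('express');
var app = express();

// Hypothetical stand-in; the real version would scrape or search the feed.
function lookupPost(id, callback) {
  callback(null, null);
}

app.get('/v1/posts/:id', function (req, res) {
  lookupPost(req.params.id, function (err, post) {
    if (err) return res.status(500).json({ error: 'lookup failed' });
    if (!post) return res.status(404).json({ error: 'post not found' });
    res.json(post);
  });
});

app.listen(3000);
```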

jonahkagan commented 11 years ago

Sounds fine to me. I don't know much about scraping. One suggestion would be to download the pages so you can test the scraping locally.
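
Rough sketch of that local-fixture idea; cheerio stands in here for whatever scraping library gets picked, and the file name and selectors are made up:

```js
// Sketch only: run the scraper against a downloaded copy of an archive
// page instead of hitting morningmail.brown.edu on every test run.
var fs = require('fs');
var cheerio = require('cheerio');

function parsePost(html) {
  var $ = cheerio.load(html);
  return {
    // Placeholder selectors; they would need to match the real markup.
    title: $('.post-title').text().trim(),
    body: $('.post-body').text().trim()
  };
}

// In a test, read the saved page from disk:
var html = fs.readFileSync('test/fixtures/post.html', 'utf8');
console.log(parsePost(html));
```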

sumnerwarren commented 11 years ago

Problem with our scraping approach to single posts: for some reason, Morning Mail archive pages require a Shibboleth login. I'm sure there is a way to log in while scraping, but do we want to do that? That means someone's account will have to be used every time a single post is requested. Or we could maybe get an account for our group? What do you think about this? Maybe caching is the way to go.

For example, you have to log in to access post 43743 (http://morningmail.brown.edu/archive?id=43743).

jonahkagan commented 11 years ago

Oh man that's ridiculous. So annoying. I think you should spend <= 1 hour trying to get the scraper to log in with your account (or mine) as a temporary solution. If this works, email CIS about getting an account for the project. If it doesn't work, then we have three options:

  1. Caching (this seems dumb since they already have them archived)
  2. Only letting posts be retrieved by id in a certain date range (say, the last week) and then doing a linear-time search
  3. Scrapping this part of the API for now and pushing it back to v2

I vote for 2, since I can't really think of many use cases for getting a post by id except to do a detail view for one post within a larger Morning Mail reader, in which case having a date limit shouldn't be too much of an issue (since most people probably don't want to read really old posts).
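
Rough sketch of what option 2 could look like; getRecentPosts is a hypothetical helper that returns already-parsed posts newer than a cutoff date:

```js
// Sketch of option 2: only resolve ids for posts from the last week,
// then do a linear scan over that small window.
function findPostById(id, getRecentPosts, callback) {
  var oneWeekAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
  getRecentPosts(oneWeekAgo, function (err, posts) {
    if (err) return callback(err);
    for (var i = 0; i < posts.length; i++) {
      if (String(posts[i].id) === String(id)) return callback(null, posts[i]);
    }
    callback(null, null); // outside the window or nonexistent -> 404 upstream
  });
}
```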

jonahkagan commented 11 years ago

How's this coming?

sumnerwarren commented 11 years ago

It's not really. I've been trying to get the scraping set up, but I haven't had any success yet. The problem with 2 is that we don't know which feed a post was sent to. So even if it was posted in the last week, we'd need to check all of the feeds until we find it. That seems time-consuming and unnecessary.

Presumably, if someone asks for a specific post, it would be because they found it in a full list, which would most likely be for a specific feed. So we could have the developer specify which feed the post is coming from, but that seems wrong. They should just be able to ask for an id and get it, I think. I really can't believe these are behind Shibboleth.

jonahkagan commented 11 years ago

Can we just check the "all" feed?

sumnerwarren commented 11 years ago

Oh, I was interpreting that as events which were sent to all feeds. But you're right, it's simply all of the events. Alright, I'll do that.