Closed: jonahkagan closed this issue 11 years ago.
Do you have a suggestion for a date library in node? I've found a couple, just wondering if you've had success with one.
I think I used one called moment.
Alright, I'll try that. Seems like it does everything we need, so it should be fine.
So I have all of the GET /v1/posts tests passing. I was going to do GET /v1/posts/:id next. node.io seems to be a popular scraping package. Any objections/suggestions?
Sounds fine to me. I don't know much about scraping. One suggestion would be to download the pages so you can test the scraping locally.
Problem with our scraping approach to single posts: for some reason, Morning Mail archive pages require a Shibboleth login. I'm sure there's a way to log in while scraping, but do we want to do that? It means someone's account would have to be used every time a single post is requested. Or maybe we could get an account for our group? What do you think? Maybe caching is the way to go.
For example, you have to log in to access post 43743: http://morningmail.brown.edu/archive?id=43743
Oh man that's ridiculous. So annoying. I think you should spend <= 1 hour trying to get the scraper to log in with your account (or mine) as a temporary solution. If this works, email CIS about getting an account for the project. If it doesn't work, then we have three options:
I vote for 2, since I can't really think of many use cases for getting a post by id except to do a detail view for one post within a larger Morning Mail reader, in which case having a date limit shouldn't be too much of an issue (since most people probably don't want to read really old posts).
How's this coming?
It's not, really. I've been trying to get the scraping set up, but I haven't had any success yet. The problem with 2 is that we don't know which feed a post was sent to. So even if it was posted in the last week, we'd need to check all of the feeds until we find it. That seems time-consuming and unnecessary.
Presumably, if someone asks for a specific post, it's because they found it in a full list, which would most likely be for a specific feed. So we could have the developer specify which feed the post came from, but that seems wrong. They should just be able to ask for an id and get the post, I think. I really can't believe these are behind Shibboleth.
Can we just check the "all" feed?
Oh, I was interpreting that as events which were sent to all feeds. But you're right, it's simply all of the events. Alright, I'll do that.
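Roughly what checking the "all" feed would look like once we have its posts as JSON: just pick out the matching id. The `{ id, title }` post shape is an assumption about our own API's output, and the date-limit caveat from option 2 still applies:

```javascript
// Hedged sketch of the "check the all feed" approach to GET /v1/posts/:id.
// posts is assumed to be the parsed JSON array from the "all" feed.
function findPostById(posts, id) {
  for (let i = 0; i < posts.length; i++) {
    if (posts[i].id === id) return posts[i];
  }
  return null; // not in the feed, e.g. older than the archive window
}
```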
Note this convo: https://github.com/hackersatbrown/api-morning-mail/pull/5#issuecomment-11409269