praw-dev / praw

PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.
http://praw.readthedocs.io/
BSD 2-Clause "Simplified" License
3.39k stars 453 forks source link

Name and parameter consistency #162

Closed bboe closed 11 years ago

bboe commented 11 years ago
bboe commented 11 years ago

@Damgaard I just pushed the main fixes I wanted to add here. I want to get your feedback before I close this. Are there any items that you think should be renamed, or adjusted as part of this release? If not, then I'll close this and we'll pretty much be ready for the 2.0 release.

bboe commented 11 years ago

I should write that at some point I plan to combine helper._request and the Memoize decorator since those two ideas rely on each other. However, that's a completely internal change that can be done transparently. Thus I have no intention to include it in 2.0.

Damgaard commented 11 years ago

Actually yes. I would like to break how comments work. It doesn't make much sense that after a all_comments it and comments return the same thing. Plus you can only do it once. The only way to get comments after all_comments have been executed is to do subreddit.refresh() and then all_comments. But that extracts a lot of already extracted comments, i.e. wastes api calls.

I would like to change it so comments and comments_flat return the current known comments. A new method get_new_comments() returns previously unknown comments. So the equiavalent of all_comments would be first calling get_new_comments possibly wihtout saving the results and then comments.

bboe commented 11 years ago

Can you explain more in detail? The comments attribute is just that an attribute that is part of the submission. I could see the point of a get new comments though. On Jan 14, 2013 12:54 PM, "Andreas Damgaard Pedersen" < notifications@github.com> wrote:

Actually yes. I would like to break how comments work. It doesn't make much sense that after a all_comments it and comments return the same thing. Plus you can only do it once. The only way to get comments after all_comments have been executed is to do subreddit.refresh() and then all_comments. But that extracts a lot of already extracted comments, i.e. wastes api calls.

I would like to change it so comments and comments_flat return the current known comments. A new method get_new_comments() returns previously unknown comments. So the equiavalent of all_comments would be first calling get_new_comments possibly wihtout saving the results and then comments.

— Reply to this email directly or view it on GitHubhttps://github.com/praw-dev/praw/issues/162#issuecomment-12239136.

bboe commented 11 years ago

Okay, now that I'm back from lunch I can go into a bit more detail. I agree there probably needs to be some work here. It makes sense that the comments property return whatever comments were returned with the submission. The question is, once we know about additional comments, why not update the built-in comment tree? This is precisely why the comments attribute is a dynamic property.

Then the question is, what is the interface for converting the more comments objects? I think this is where the optimization can come into play. Rather than have the all_ methods, perhaps the interface can be to explicitly call a new public-facing function replace_more_comments and here we can have an argument which is the number of additional requests to make, and/or a threshold for how many comments there must be in order to make the request (that value is available now). After calling replace_more_comments then the comments property will return the updated tree. The question is what do we do with the existing "more" objects? Do we ignore them, or leave them in the tree? Leaving them in the tree is not a good solution if the user expects only Comment objects at this point.

Regarding getting new comments: there isn't an API command for that. Doing so would require fetching the object again and comparing with what was there previously. While it's do-able this sort of thing should be left to anyone that needs that functionality, not provided as part of PRAW. Anyone that wants to update existing objects should know about the refresh function and should have to explicitly call it (as is done with all the other refreshable objects).

The _flat attributes I am now in favor of removing for the exact same reason I don't want to include a get_new_comments function. This isn't functionality provided via a reddit_api call. When I originally added it I thought it was useful for others to have, which it very well may be, but continuously adding features we think might be useful (or are) will result in a lot of non-necessary code.

So I'm for refactoring how the "more" comment objects get replaced, but aside from that I'm not too keen on additional changes here. Thoughts?

Damgaard commented 11 years ago

Regarding getting new comments: there isn't an API command for that.

True. I was thinking of requesting the comments sorted by 'new' format and add them until a comment with a new id is met. The point over using this compared to refreshing first is that it would mean a lot fewer api calls for threads with a lot of comments.

Speaking of that. Since we've increased the content call to request 100 things every time, then we should probably do that same for comments and just get the most possible every time.

The question is, once we know about additional comments, why not update the built-in comment tree?

The problem is inconsistency and obvious behaviour. The built-in comment tree is updated automatically, up to the comment limit for the single call. This is quite unobvious behaviour and can cause mysterious bugs. Say we wanted to study the relative mentioning of cats to dogs in some r/aww submissions.

mentions_of_cat = sum('cat' in comment.body for comment in submission.comments) # submission.comments is pulled from cache
# submission.cache are timed out
mentions_of_dogs = sum('cat' in comment.body for comment in submission.comments) # submission.comments are dynamically updated

In this case the result would be wrong although nothing would look wrong in the code.

It's unobvious because it requires that the user knows that comments and comments_flat are the only attributes that automatically update. For everything else the update has to be forced with refresh. Other things change, but they are actual methods not (property decorated) attributes. It means the behavior of client programs behave differently depending on whether the logged-in users has gold. Because the max comments with a single call is 1500. The behavior will also be confusing for those who do not know it dynamically updates to this limit. This is why I prefer an explicit call to refresh the comments.

I prefer keeping _flat for now. I know it's stricly not part of the API, but it seems very useful for beginners and those who simply do not care about the tree structure. I wrote a script to find and parse Q&A's for r/iama once, became much simpler by not having to think / implement tree traversel.

Damgaard commented 11 years ago

clear_all_flair should also be removed. link It is not part of the API and only deletes 25 user flair. This seems like something users should implement rather than be part of PRAW.

bboe commented 11 years ago

I think you have a slight misunderstanding about how the comments work. First, there is only a single request to fetch all the comments of a submission. Actually there is no request because the comments come for-free as part of the submission. Of course, this may result in 'more' comment objects, but each of those are all retrieved with the single request. Subsequent calls to comments will always return the same data and never make additional requests. Only when a user wants to replace more objects with the comments the more object represents are additional requests made and new items added (happens automatically with the all_ functions.

Speaking of that. Since we've increased the content call to request 100 things every time, then we should probably do that same for comments and just get the most possible every time.

I think that already occurs when the requested limit is > the maximum for the user class. But you're right that we can do away with the hard-set limits and just try a big number when limit=None. Again, if limits aren't set we should use whatever the reddit default is.

In your dog/cat example the fact that the submission expired is of no consequence. Both requests for the comments attribute will return the same data unless an explicit submission.refresh call is made.

I agree with keeping _flat though maybe we should provide a helper that performs the conversion rather than comments_flat?

clear_all_flair should also be removed.

Touché. If we add a praw-extras package then I'm for moving it to there. Until then, it just needs to be fixed to use limit=None (good catch). That or reddit should expose this functionality.

Damgaard commented 11 years ago

I think you have a slight misunderstanding about how the comments work.

Indeed. I should probably have verified that comments worked precisely like I thought they did.

But you're right that we can do away with the hard-set limits and just try a big number when limit=None. Again, if limits aren't set we should use whatever the reddit default is.

I meant that instead of using the reddit default when no limit is given, we could take the user group maximum.

I agree with keeping _flat though maybe we should provide a helper that performs the conversion rather than comments_flat?

That sounds like a good solution.

Touché. If we add a praw-extras package then I'm for moving it to there. Until then, it just needs to be fixed to use limit=None (good catch). That or reddit should expose this functionality.

If there are more than 1000 users with flair, then it won't actually clear all flair (which is what it says it does). To actually do that, we would have to first evict the cache of user flair, then requery it to see how many people now have flair. If it's zero, return the results else do it all again. The name also implies it clear's link flair.

bboe commented 11 years ago

I meant that instead of using the reddit default when no limit is given, we could take the user group maximum.

I'm going to stick with using defaults unless the client explicitly asks for something else. I don't think there is anywhere that we attempt to get the most we can as the result of a non-explicit action.

Okay, so I'm going to add a flat_list function to praw.helpers. Thus s.comments_flat will need to be replaced by flat_list(s.comments).

If there are more than 1000 users with flair, then it won't actually clear all flair

That's just a bug :). But that's a good argument in favor of having an actual reddit api clear all flair command.

Damgaard commented 11 years ago

I think flatten_comments is a more descriptive name, but other than that it sounds good.

bboe commented 11 years ago

Actually it's going to be flatten_tree since it could technically apply to other trees not just comment trees.

bboe commented 11 years ago

Okay resolved the _flat issue. I'll see what to do about the all_comments properties now.

bboe commented 11 years ago

And resolved the `all_comments issue. @Damgaard Please look over these changes. I consider 2.0 complete, but if there is anything you think needs to be changed please let me know as soon as you can.

Damgaard commented 11 years ago

I've looked it over and it looks fine. There is no breaking changes that I can think of at the moment, I'll have the update to changelog and wiki in general done in under an hour.

Damgaard commented 11 years ago

Actually. I have a few things now.

get_mod_log returns something like {u'kind': u'modaction', u'data': {u'description': None, u'mod_id36': u'6j357', u'details': u'remove', u'action': u'removelink', u'sr_id36': u'2tbeo', u'target_fullname': u't3_16l33b'}}

We should get it to return only the data part.

I think comments in a submission should be changed into a method and possibly renamed to get_comments, that way we could call it with comment_sort and comment_limit

Thoughts?

bboe commented 11 years ago

Agree with modlog stuff. Please make that change. But comments should remain an attribute. All other get prefixed functions semi indicate a request is made. On Jan 14, 2013 8:13 PM, "Andreas Damgaard Pedersen" < notifications@github.com> wrote:

Actually. I have a few things now.

get_mod_log returns something like {u'kind': u'modaction', u'data': {u'description': None, u'mod_id36': u'6j357', u'details': u'remove', u'action': u'removelink', u'sr_id36': u'2tbeo', u'target_fullname': u't3_16l33b'}}

We should get it to return only the data part.

I think comments in a submission should be changed into a method and possibly renamed to get_comments, that way we could call it with comment_sort and comment_limit

Thoughts?

— Reply to this email directly or view it on GitHubhttps://github.com/praw-dev/praw/issues/162#issuecomment-12252784.

bboe commented 11 years ago

Well I'm going to tackle this real quick then.

bboe commented 11 years ago

Added ModAction object: a0e903baa7b4d029924b67f1de13ef235b82c83f. I am closing this issue, touching up the changelog and going to make the preliminary announcement on /r/redditdev.