maiera / gde-app

Apache License 2.0

Review gplus update tasks #140

Closed Scarygami closed 9 years ago

Scarygami commented 9 years ago

Current situation

Currently the automatic AR (Activity Record) creation/update works like this:

  1. Get the URL from the Google+ post (either the attachment URL or the post URL).
  2. Find an AR with this URL, or create one if it doesn't exist.
  3. Update the impact on this AR.
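
The three steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the app's actual code: the `ActivityRecord` stand-in, the post dictionary keys (`id`, `url`, `attachment_url`, `plus_ones`) and both function names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ActivityRecord:
    """Hypothetical in-memory stand-in for an AR (not the app's real model)."""
    url: str
    # Impact is kept per post so that re-running an update does not double count.
    plus_ones_by_post: dict = field(default_factory=dict)

    @property
    def plus_ones(self):
        return sum(self.plus_ones_by_post.values())


def find_or_create_by_url(records, url):
    """Step 2: return the AR with this URL, creating one if none exists."""
    matches = [ar for ar in records if ar.url == url]
    if len(matches) > 1:
        # A URL-only lookup has no way to choose between several matches.
        raise LookupError("more than one AR for %s" % url)
    if matches:
        return matches[0]
    ar = ActivityRecord(url=url)
    records.append(ar)
    return ar


def update_from_post(records, post):
    """Run steps 1 to 3 for a single post (a plain dict in this sketch)."""
    # Step 1: prefer the attachment URL, fall back to the post's own URL.
    url = post.get("attachment_url") or post["url"]
    ar = find_or_create_by_url(records, url)
    # Step 3: record this post's impact on the AR.
    ar.plus_ones_by_post[post["id"]] = post["plus_ones"]
    return ar
```

Because the URL is the only key, any two posts that share a URL end up on the same record, whoever wrote them and whenever they were published.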

While reviewing this process with regard to enabling logical deletion of ARs, I discovered that this approach has some problems.

Scenario 1: A GDE creates a Google+ post with a URL. The GDE later manually creates an AR with the same URL. The update task finds two ARs, which currently causes the script to fail with an exception.

Scenario 2: Recurring contributions to an open-source project. Since the URL is the same, all those contributions are counted against the original AR, and the activity won't show up when querying activities in a certain time frame.

Scenario 3: Two GDEs work on the same project. Both GDEs post about it but only one will get the credit since the script merges both posts with the same URL into one AR.

Proposal

Since GDEs now have the possibility to merge activities themselves really easily, I would remove the automatic merging of AR by URL from the script. The only case where I would still merge two posts into one AR is if one is a reshare of the other.

For the rest I would leave it up to the GDE whether they want to merge activities or if they feel the new addition has enough impact to stand on its own.

patt0 commented 9 years ago

I would like to talk more about the final solution, as I am rewriting the code to update and create activities.

I must admit that there are many avenues for improvement.

Best

patt0 commented 9 years ago

Gerwin,

I am particularly lazy when it comes to updating status reports and the fact that it merges my posts when I am actually talking about the same activity is a big plus for me. Would the following make sense?

Scenario 1: We can modify the script so that it does not bomb if a link is referenced in multiple ARs (based on my research this has only happened for Bruce). What do we want to do in this case? Update both ARs with the values? Inform the GDE that they should merge further?

Scenario 2: Could we think about moving the contribution forward, so that every time you contribute to an OSS project the effective date of the post moves to the last update or post about it? I think this one is the most troubling to me.

Scenario 3: This one does not really apply as far as I can see, because the posts and records are based on the feed of an individual GDE and so are not merged globally.

I wonder if we would be better off looking at updating from an AR point of view and drilling down into the associated posts rather than the other way round.

I should be posting a PR today with the new task-based code to get and update APs.

Patrick Martinent

patt0 commented 9 years ago

Based on my tests, I think you are right: we should stop the automatic merging. We could introduce an advanced #merge tag to tell the machine to proceed with a merge.

Patrick Martinent

Scarygami commented 9 years ago

Scenario 1: I don't think updating both AR would be a good idea, because that would produce duplicate counts of +1/comments/reshares. We could pick the AR that already has the post attached to it. That would be my suggestion for finding an AR anyway: First look for one that already has the post (via the post id or the post id of the original post for a reshare). If there isn't one then maybe look via the "activity url", and then create a new one.

Scenario 2: Yes, updating the activity date would be a good solution for most cases, but it could also "destroy" a manually entered date, e.g. for event dates that don't move.

Scenario 3: Actually the script currently only looks for an AR based on the URL disregarding the GDE. So if there is one with a matching URL it would attach any post, no matter if from the same GDE or not (which should also be fixed).
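
The lookup order suggested under Scenario 1, combined with the per-GDE restriction that Scenario 3 calls for, might look like the sketch below. It is only an illustration under assumptions: the `ActivityRecord` fields and the `lookup_activity_record` name and signature are hypothetical and not taken from the app.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActivityRecord:
    """Hypothetical stand-in for an AR; field names are assumptions."""
    gde_id: str
    url: str
    post_ids: List[str] = field(default_factory=list)


def lookup_activity_record(records, gde_id, post_id, url,
                           original_post_id: Optional[str] = None):
    """Find the AR for a post, looking only at this GDE's records."""
    own = [ar for ar in records if ar.gde_id == gde_id]

    # 1. Prefer an AR that already has the post attached, or the original
    #    post when the current one is a reshare.
    wanted = {post_id}
    if original_post_id is not None:
        wanted.add(original_post_id)
    for ar in own:
        if wanted & set(ar.post_ids):
            return ar

    # 2. Otherwise fall back to the activity URL.
    for ar in own:
        if ar.url == url:
            return ar

    # 3. Nothing matched: create a new AR for this GDE.
    ar = ActivityRecord(gde_id=gde_id, url=url)
    records.append(ar)
    return ar
```

With this order, two ARs sharing a URL no longer cause a failure as long as one of them already holds the post, and a post from another GDE never lands on someone else's record.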

A #merge tag sounds like an interesting solution, but maybe to make it sound nicer in the post we could use something like #repost.

I think we should discuss this further at the summit, and get some more opinions about the topic :)

patt0 commented 9 years ago

So I am going to push a PR later today that basically does the following. I am opening it for comment as I complete it, so I can include any suggestions or comments before I commit.

  1. Expresses an architecture to gather new posts and update them using App Engine tasks, which are scheduled using cron jobs. This same approach will be used to write data-gathering plugins so we can collect impact metrics automatically. I intend to get one ready that pulls view data from Blogger in time for the presentation at the GDE summit, so we can invite many GDEs to write data-gathering plugins (I use the word plugin loosely).
  2. /tasks/new_gplus is run daily and, like the previous cron job, gets new activities, this time for one GDE at a time.

The implementation is different in that we only merge APs if they are actual reshares of another post; if they are not, we create a new AR and leave it to the GDE to decide how and why they merge ARs and their associated APs.

In order to do this, I process the new activities from oldest to newest.
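
One plausible reason for the oldest-to-newest order is that an original post then already has its AR by the time a reshare of it is processed. The sketch below illustrates that idea only; the post dictionary keys (`id`, `published`, `reshare_of`) and the function name are assumptions, and an AR is reduced to a list of post ids.

```python
from datetime import datetime


def group_posts_into_records(posts):
    """Group posts into ARs, merging a post only when it is a reshare.

    Each AR is represented as a list of post ids (a simplification).
    """
    record_index_for_post = {}  # post id -> index of the AR holding it
    records = []
    # Oldest first, so an original is seen before any reshare of it.
    for post in sorted(posts, key=lambda p: p["published"]):
        original_id = post.get("reshare_of")
        if original_id in record_index_for_post:
            # Reshare of a post we already track: attach to the same AR.
            index = record_index_for_post[original_id]
            records[index].append(post["id"])
        else:
            # Anything else gets its own AR, even if it shares a URL.
            records.append([post["id"]])
            index = len(records) - 1
        record_index_for_post[post["id"]] = index
    return records


posts = [
    {"id": "p2", "published": datetime(2014, 10, 3), "reshare_of": "p1"},
    {"id": "p3", "published": datetime(2014, 10, 2)},
    {"id": "p1", "published": datetime(2014, 10, 1)},
]
grouped = group_posts_into_records(posts)  # [["p1", "p2"], ["p3"]]
```

A reshare of a post that is not in the feed (for example, someone else's post) simply starts a new AR in this sketch.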

We may revise this as we talk about it at the summit
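
The cron-plus-tasks arrangement described in this comment can be pictured as a scheduled entry point that enqueues one small task per GDE instead of looping over everyone inline. The sketch below imitates that with a plain `deque`; on App Engine the queue would be the task queue service. All names here (`enqueue_per_gde`, `drain`, the `/tasks/example` path) are hypothetical.

```python
from collections import deque


def enqueue_per_gde(queue, task_path, gde_ids):
    """What the cron handler would do: add one task per GDE."""
    for gde_id in gde_ids:
        queue.append((task_path, gde_id))


def drain(queue, handlers):
    """Stand-in for the task runner: execute tasks one GDE at a time."""
    results = []
    while queue:
        task_path, gde_id = queue.popleft()
        try:
            results.append((gde_id, handlers[task_path](gde_id)))
        except Exception as error:
            # A failure for one GDE is recorded and does not stop the rest.
            results.append((gde_id, error))
    return results


def example_handler(gde_id):
    """Hypothetical plugin body; a real one would call an external API."""
    if gde_id == "bad":
        raise RuntimeError("feed unavailable")
    return "fetched " + gde_id


queue = deque()
enqueue_per_gde(queue, "/tasks/example", ["a", "bad", "b"])
outcome = drain(queue, {"/tasks/example": example_handler})
```

Keeping each task scoped to a single GDE is also what makes it natural to add further data-gathering plugins as extra handlers.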

  3. /tasks/upd_gplus is run weekly and updates existing activities, again for one GDE at a time.

The implementation is pretty much the same as before.

In particular, we continue to ignore the case where a #gde tag is removed from a gplus post; for now we consider that the activity has to be deleted in the front end. To be discussed at the summit.

  4. ActivityRecord -> find_or_create, as a function, is really the responsibility of the business logic (i.e. the task plugin architecture), so I am moving it out of the ActivityPost Endpoints class and attaching it to the update_gplus module under the name find_or_create_ar.
  5. The Scenario 3 issue, arising when the same URL is referenced by two GDEs, disappears with the new implementation of item 2 above.

In addition, I would like to discuss the merits of the AP insert endpoint. Do we use it from the front end? Is it useful in any way? I suggest we make it private, as the insertion of APs is automatic. Do we need an update endpoint? No, because the automatic updates would overwrite any manual change.

I hope I am clear enough; let me know if I am not.

Best

Patrick Martinent

SmokyBob commented 9 years ago

+1 for the idea of having a structure so that we can write "plugins" that fetch data from other sources.

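As for the AP endpoints:

  • The frontend doesn't use them; it only works on ARs.
  • I can't think of a use for the AP write endpoints from outside GAE; even historical data from the previous form would be added as ARs.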

Best,

brucemcpherson commented 9 years ago

Yes, +1 indeed. I have something that already collects +1 and page view impact for Google Sites, filtered by topic, and I am just about to do the same for Blogger and then SlideShare, all of which I store in a database. I have to transfer all that manually to the GDE app, but it would be easy to update it daily via an API. So a simple API with basic CRUD facilities for activity records would be great.

Here's the write up on the sites collector... http://ramblings.mcpher.com/Home/excelquirks/analyticsandsites

Scarygami commented 9 years ago

@patt0 sounds great to me, and I agree that we don't need AP write endpoints in the API.

@brucemcpherson the API is there already and can be used for such cases. We will have a talk/session at the GDE Summit to talk about this in detail.

Scarygami commented 9 years ago

Closed via https://github.com/maiera/gde-app/pull/147 Thanks @patt0 for the great work! :)