spree / spree_wombat

Connect your Spree Commerce storefront to Wombat
BSD 3-Clause "New" or "Revised" License
31 stars 53 forks source link

New models never get sent to wombat #57

Closed jordan-brough closed 9 years ago

jordan-brough commented 9 years ago

See this comment.

I think we need to remove the unless condition on this line. I believe this introduces a bug that causes new models to never get sync'd. I noticed this happening in our staging environment.

If you have a model that's never been pushed to wombat then when you enter #push_batches the last_pushed_timestamp will be nil. That means last_push_time will be set to Time.now and it's unlikely that any objects will be found. So object_count will be 0 and when you hit the aforementioned line it will be skipped and the last_pushed_timestamp will remain nil and the same process will repeat the next time this is called.

cc @athal7 @philbirt

JDutil commented 9 years ago

Should be resolved with b3b53307ef1ef8c2f2b367c19960f769a4282437

jordan-brough commented 9 years ago

@JDutil it seems strange that we only record the last_pushed_timestamp if we find something. That means the next time we run we ignore the fact that we already searched the previous timespan and found nothing. Why skip recording that bit of information and require the next iteration to query all the same records again?

jordan-brough commented 9 years ago

Btw it seems like it would be nice to see some specs for this stuff. Is that on the to-do list?

JDutil commented 9 years ago

There has been instances in the past where the last push timestamp and updated_at timestamp were the same and caused missed records so it's safer not to record the timestamp. It also gives you a better idea when debugging when the last time a record was actually pushed.

jordan-brough commented 9 years ago

There has been instances in the past where the last push timestamp and updated_at timestamp were the same and caused missed records so it's safer not to record the timestamp

Yikes. Are you guys working on a more direct solution for that problem? This is really indirect way of fixing a small subset of that problem. This only fixes the case where nothing was pushed in the previous push and there was some sort of clock skew problem. Yet the problem happens whether or not anything was pushed in the previous run and would be more likely to happen for stores that are pushing a lot of data all the time.

It also gives you a better idea when debugging when the last time a record was actually pushed.

True but it erases the data point of when the last actual run happened, which seems like a more important data point to me. I think we're merging two data points into one (last_pushed_at vs last_pushed_with_data_at) and are paying the price with stuff like this bug. I would expect each push run to query over its own time period. Having it do something else seems unexpected and error prone to me.

JDutil commented 9 years ago

Well we've changed it to use an offset that seems to have fixed occurrences of this: https://github.com/spree/spree_wombat/commit/1da6d6f066ac03e9a371716a5667764df2ccf975

I still think setting the time back to start by default is the way to go, but I may just be being overly cautious about updating the timestamp. I'm not all that opposed to changing it, but seemed easier to debug to me. I see your point on wanting to know the last time it attempted to run though.

/cc @peterberkenbosch @huoxito what do you think?

huoxito commented 9 years ago

Not so familiar with the set up here but I can see why one would like to know when was the last time the task was actually triggered, e.g. we use statsd to log and graph those events on our stores

+1 for the offset as well