openedx-unsupported / edx-analytics-pipeline


user location hotfix #787

Closed HassanJaveed84 closed 4 years ago

HassanJaveed84 commented 4 years ago

Events with a non-integer user_id (e.g. anonede71904b92643beb3fe) caused job failures. Not sure if there is anything else we can do.

The Jenkins job is currently running off this branch.
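
For context, a minimal sketch of the kind of guard being discussed, assuming the user_id is read from the event's context dictionary; the helper name, field access, and logging are illustrative, not the actual location_per_course.py change:

```python
# Illustrative sketch only: skip events whose user_id is not an integer
# instead of letting them crash the job.
import logging

log = logging.getLogger(__name__)


def get_user_id_if_valid(event):
    """Return the event's user_id as an int, or None if it is missing or non-numeric."""
    raw_user_id = event.get('context', {}).get('user_id')
    if raw_user_id is None:
        return None
    try:
        return int(raw_user_id)
    except (TypeError, ValueError):
        # Values like 'anonede71904b92643beb3fe' land here instead of raising.
        log.error('Encountered non-integer user_id: %r', raw_user_id)
        return None
```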

codecov[bot] commented 4 years ago

Codecov Report

Merging #787 into master will decrease coverage by 0.06%. The diff coverage is 40%.


@@            Coverage Diff             @@
##           master     #787      +/-   ##
==========================================
- Coverage   74.59%   74.53%   -0.07%     
==========================================
  Files         211      211              
  Lines       24147    24090      -57     
==========================================
- Hits        18013    17955      -58     
- Misses       6134     6135       +1
| Impacted Files | Coverage Δ |
|---|---|
| ...dx/analytics/tasks/insights/location_per_course.py | 85.38% <40%> (-1.07%) ↓ |
| ...ytics/tasks/programs/tests/test_program_reports.py | 100% <0%> (ø) ↑ |
| edx/analytics/tasks/programs/program_reports.py | 93.88% <0%> (+0.04%) ↑ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 971b025...e12f754.

estute commented 4 years ago

How often do we see failures like this? Is this the first time this kind of error has occurred? If we always expect user_id to be an int, is it concerning that this one wasn't? Perhaps we don't want to mask this type of failure.

HassanJaveed84 commented 4 years ago

I don't remember seeing an error like this before, at least not in this workflow. We should always be getting an int. Maybe we can log the number of non-int ids and skip them. I'm open to suggestions.

estute commented 4 years ago

I suppose I was mostly worried that if this behavior is new, there might be an upstream change that caused it, and I'd be concerned about masking a new behavior. Logging sounds like a good start.

doctoryes commented 4 years ago

I agree with @estute that we should have an idea of the frequency of this error before discarding these events. It might be caused by an upstream change, or it might happen frequently enough that we'd be discarding lots of data. But, of course, this implies that we'll need to follow up and check on how often it's logged.

HassanJaveed84 commented 4 years ago

The job failed again; I've added a counter.
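
A minimal sketch of the counter-and-skip approach, assuming the task subclasses luigi.contrib.hadoop.JobTask (which provides incr_counter); the counter group/name strings, class name, and event parsing are illustrative, not the actual diff:

```python
# Illustrative sketch only: count non-integer user_ids with a Hadoop counter
# and skip those events rather than failing the whole job.
import json

import luigi.contrib.hadoop


class LocationMapperSketch(luigi.contrib.hadoop.JobTask):

    def mapper(self, line):
        event = json.loads(line)
        raw_user_id = event.get('context', {}).get('user_id')
        try:
            user_id = int(raw_user_id)
        except (TypeError, ValueError):
            # Values like 'anonede71904b92643beb3fe' are counted and skipped,
            # so they show up in the job's counter stats instead of silently
            # disappearing or crashing the mapper.
            self.incr_counter('Location', 'Non-integer user_id', 1)
            return
        yield user_id, line
```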