radanalyticsio / jiminy-predictor

a predictor service for a spark based recommendation app
Apache License 2.0

error in user rejection code #29

Closed elmiko closed 6 years ago

elmiko commented 6 years ago

while testing the new user id rejection code, i ran into a bug that affects core functionality.

i tested by attaching the new version of the predictor to my previously available models and an html-server. when i request rank predictions, every call is rejected. this is the log from the predictor:

172.17.0.11 - - [19/Feb/2018 20:48:23] "POST /predictions/ranks HTTP/1.1" 201 -
18/02/19 20:48:23 INFO SparkContext: Starting job: count at /opt/app-root/src/model.py:42
18/02/19 20:48:23 INFO DAGScheduler: Got job 6 (count at /opt/app-root/src/model.py:42) with 2 output partitions
18/02/19 20:48:23 INFO DAGScheduler: Final stage: ResultStage 6 (count at /opt/app-root/src/model.py:42)
18/02/19 20:48:23 INFO DAGScheduler: Parents of final stage: List()
18/02/19 20:48:23 INFO DAGScheduler: Missing parents: List()
18/02/19 20:48:23 INFO DAGScheduler: Submitting ResultStage 6 (PythonRDD[18] at count at /opt/app-root/src/model.py:42), which has no missing parents
18/02/19 20:48:23 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 8.7 KB, free 366.3 MB)
18/02/19 20:48:23 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 4.8 KB, free 366.3 MB)
18/02/19 20:48:23 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 172.17.0.13:42337 (size: 4.8 KB, free: 366.3 MB)
18/02/19 20:48:23 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1006
18/02/19 20:48:23 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 6 (PythonRDD[18] at count at /opt/app-root/src/model.py:42) (first 15 tasks are for partitions Vector(0, 1))
18/02/19 20:48:23 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
18/02/19 20:48:23 WARN TaskSetManager: Stage 6 contains a task of very large size (3481 KB). The maximum recommended task size is 100 KB.
18/02/19 20:48:23 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 8, 172.17.0.8, executor 0, partition 0, PROCESS_LOCAL, 3565266 bytes)
18/02/19 20:48:23 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 9, 172.17.0.8, executor 0, partition 1, PROCESS_LOCAL, 3540613 bytes)
18/02/19 20:48:23 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 172.17.0.8:39263 (size: 4.8 KB, free: 366.0 MB)
18/02/19 20:48:23 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 9) in 70 ms on 172.17.0.8 (executor 0) (1/2)
18/02/19 20:48:23 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 8) in 82 ms on 172.17.0.8 (executor 0) (2/2)
18/02/19 20:48:23 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 
18/02/19 20:48:23 INFO DAGScheduler: ResultStage 6 (count at /opt/app-root/src/model.py:42) finished in 0.083 s
18/02/19 20:48:23 INFO DAGScheduler: Job 6 finished: count at /opt/app-root/src/model.py:42, took 0.090408 s
2018-02-19 20:48:23,590 - jiminy-predictor - ERROR - Requesting rankings for invalid user id=1

there must be some issue with how the user ids are populated, because i am certain that this user exists in the database.
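to illustrate the kind of problem that could reject every request, here is a minimal sketch of one possible failure mode: the id arrives as a string (for example from a json payload) while the set of known ids holds integers. this is only an assumption for illustration, not a confirmed diagnosis, and the names `build_valid_user_ids`, `is_valid_user`, and `is_valid_user_normalized` are hypothetical and not taken from the predictor code.

```python
# hypothetical sketch: none of these names come from jiminy-predictor.
# assumption: the model exposes (user_id, features) pairs with integer ids.

def build_valid_user_ids(user_features):
    """Collect the user ids present in the model's (id, features) pairs."""
    return {user_id for user_id, _ in user_features}


def is_valid_user(user_id, valid_ids):
    """Strict membership check: "1" (str) does not match 1 (int)."""
    return user_id in valid_ids


def is_valid_user_normalized(user_id, valid_ids):
    """Coerce the incoming id to int before checking membership."""
    try:
        return int(user_id) in valid_ids
    except (TypeError, ValueError):
        # ids that cannot be parsed as integers are treated as invalid
        return False


# stand-in for the model's user factors; the values are made up
user_features = [(1, [0.1, 0.2]), (2, [0.3, 0.4])]
valid_ids = build_valid_user_ids(user_features)

strict_result = is_valid_user("1", valid_ids)
normalized_result = is_valid_user_normalized("1", valid_ids)
```

with the strict check, a string id is rejected even though the user exists, while the normalized check accepts it. if the real cause is something else (for example the id set being built from the wrong column), the same kind of small reproduction can help narrow it down.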

elmiko commented 6 years ago

@ruivieira removed you from assignees, i think we have this under control. i have a solution and @sophwats is investigating for deeper understanding =)