Closed aciborowska closed 4 years ago
cost:nan is caused by pq_a_cost:nan, since cost = pq_a_cost + pqa_cost. If we solve pq_a_cost:nan, then we also solve cost:nan.
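A minimal sketch of why one NaN component is enough to make the total NaN, and of a check that names the offending component. The numeric values and the helper `nonfinite_components` are hypothetical, made up for illustration; only the names cost, pq_a_cost and pqa_cost come from the training log.

```python
import math


def nonfinite_components(losses):
    """Return the names of loss components that are NaN or infinite."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


# Illustrative values only, not taken from the actual training log.
losses = {"pq_a_cost": math.nan, "pqa_cost": 0.73}

# Any arithmetic involving NaN yields NaN, so the sum is NaN as well.
cost = losses["pq_a_cost"] + losses["pqa_cost"]

print(math.isnan(cost))
print(nonfinite_components(losses))
```

Running a check like this on every logged step would point at pq_a_cost directly instead of at the summed cost.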
pq_a_cost is the loss for the answer model. The NaN is caused by NaNs occurring in the pq_a_squared_errors matrix. It seems that this is caused by empty answers and/or questions.
Will confirm tomorrow.
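One plausible mechanism, sketched under assumptions: if the squared errors are averaged over the non-padding tokens of each answer, an empty answer has zero tokens and the average becomes 0/0. This is not confirmed to be what the model does. The functions `masked_mean_squared_error` and `rows_with_nan` are hypothetical helpers, not code from the repository.

```python
import math


def masked_mean_squared_error(errors, mask):
    """Average the squared errors over positions where mask is 1.

    With an empty or fully masked sequence the token count is zero.
    NaN is returned in that case to mimic the 0/0 result of array libraries.
    """
    total = sum(e * e * m for e, m in zip(errors, mask))
    count = sum(mask)
    return total / count if count else math.nan


def rows_with_nan(matrix):
    """Return the indices of rows that contain at least one NaN."""
    return [i for i, row in enumerate(matrix) if any(math.isnan(v) for v in row)]


# An empty answer (no tokens) produces NaN; a normal answer does not.
empty_answer_loss = masked_mean_squared_error([], [])
normal_answer_loss = masked_mean_squared_error([1.0, 2.0], [1, 1])

# Locating NaN rows shows which examples in a batch to inspect.
bad_rows = rows_with_nan([[0.1, 0.2], [math.nan, 0.3], [0.0, 0.0]])
```

Mapping the NaN row indices back to the input examples would confirm or rule out the empty-input hypothesis.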
Empty answers are filtered out; however, the problem remains. I also noticed that posts are sometimes empty. We need to remove them. That should solve the problem.
Empty posts were coming from Lucene. I am not sure why; maybe because the bug reports are too long? Anyway, I updated future/src/data_generation/data_generator.py to filter out empty posts.
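A rough sketch of what such a filter could look like. This is not the actual code in data_generator.py: the record layout (a dict with post, question and answer keys) and the names `is_blank` and `filter_empty_records` are assumptions made for illustration.

```python
def is_blank(text):
    """True if the text is missing or contains only whitespace."""
    return text is None or not text.strip()


def filter_empty_records(records, fields=("post", "question", "answer")):
    """Drop records in which any of the given fields is blank.

    Returns the kept records and the number dropped, so the drop count
    can be logged and unexpectedly large losses noticed.
    """
    kept = [r for r in records if not any(is_blank(r.get(f)) for f in fields)]
    return kept, len(records) - len(kept)


# Hypothetical sample data.
sample = [
    {"post": "App crashes on start", "question": "Which version?", "answer": "2.1"},
    {"post": "", "question": "Which OS?", "answer": "Linux"},
    {"post": "Slow search", "question": "How many docs?", "answer": "   "},
]
kept, dropped = filter_empty_records(sample)
```

Treating whitespace-only strings as empty matters here, since a post that tokenizes to nothing would likely cause the same problem as a missing one.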
The ranking method produces cost:nan, pq_a_cost:nan while training (see the log on Google Drive). Investigate why this is happening.