imranraad07 / BugReportQA

nan in log #42

Closed aciborowska closed 4 years ago

aciborowska commented 4 years ago

The ranking method produces cost:nan and pq_a_cost:nan during training (see the log on Google Drive). Investigate why this is happening.

aciborowska commented 4 years ago

cost:nan is caused by pq_a_cost:nan, since cost = pq_a_cost + pqa_cost. If we fix pq_a_cost:nan, we also fix cost:nan.
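A minimal sketch of why one NaN term is enough to poison the total. The numeric values are made up for illustration; only the relation cost = pq_a_cost + pqa_cost comes from the training code.

```python
import math

# Hypothetical per-step values; in the real run these come from the model.
pq_a_cost = float("nan")  # answer-model loss that has gone NaN
pqa_cost = 0.37           # a healthy, finite loss term

# NaN propagates through addition: any sum with a NaN term is NaN.
cost = pq_a_cost + pqa_cost

print(math.isnan(cost))      # the total is NaN
print(math.isnan(pqa_cost))  # the other term is still fine
```

So checking each term separately with math.isnan is enough to tell which loss went bad first.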

pq_a_cost is the loss for the answer model. The NaN is caused by NaNs occurring in the pq_a_squared_errors matrix. It seems that it's caused by empty answers and/or questions.
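One way to test this hypothesis is to map the NaN rows of the error matrix back to the examples that produced them and check which text fields are empty. This is only a sketch: the helper names, the field names, and the sample batch are hypothetical, and the matrix is a plain list of lists rather than the real pq_a_squared_errors tensor.

```python
import math

def nan_rows(matrix):
    """Return indices of rows that contain at least one NaN."""
    return [i for i, row in enumerate(matrix)
            if any(math.isnan(v) for v in row)]

def empty_fields(example):
    """Return names of fields whose text is None, empty, or whitespace-only."""
    return [name for name, text in example.items()
            if not (text or "").strip()]

# Hypothetical stand-ins for one batch and its squared-error matrix.
squared_errors = [
    [0.1, 0.2],
    [float("nan"), float("nan")],
    [0.0, 0.4],
]
batch = [
    {"post": "App crashes on start", "question": "Which version?", "answer": "2.1"},
    {"post": "Login fails", "question": "Any stack trace?", "answer": "   "},
    {"post": "Slow search", "question": "How large is the index?", "answer": "10 GB"},
]

# Report, for each NaN row, which fields of the matching example are empty.
for i in nan_rows(squared_errors):
    print(i, empty_fields(batch[i]))
```

If every NaN row lines up with an example that has an empty field, that supports the empty-input explanation; if some NaN rows have no empty fields, something else is going on.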

Will confirm tomorrow.

aciborowska commented 4 years ago

Empty answers are filtered out, but the problem remains. I also noticed that posts are sometimes empty. We need to remove them as well; that should solve the problem.

aciborowska commented 4 years ago

The empty posts were coming from Lucene. I am not sure why; maybe the bug reports are too long? Either way, I updated future/src/data_generation/data_generator.py to filter out empty posts.
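For reference, the kind of filter described above could look roughly like this. It is a sketch, not the actual code in data_generator.py: the function names, the record layout (dicts with post, question, and answer keys), and the sample records are all assumptions.

```python
def is_blank(text):
    """True if the text is missing, empty, or whitespace-only."""
    return text is None or not text.strip()

def filter_empty(records, fields=("post", "question", "answer")):
    """Drop records where any required field is blank.

    Returns the kept records and the number of records dropped.
    """
    kept = [r for r in records
            if not any(is_blank(r.get(f)) for f in fields)]
    return kept, len(records) - len(kept)

# Hypothetical records, e.g. as they might come back from a search index.
records = [
    {"post": "Crash when saving", "question": "Which OS?", "answer": "Linux"},
    {"post": "", "question": "Which OS?", "answer": "Windows"},
    {"post": "Freeze on load", "question": "Any logs?", "answer": None},
]

kept, dropped = filter_empty(records)
print(len(kept), dropped)
```

Returning the dropped count makes it easy to log how many examples were removed, which helps confirm that the empty posts are rare rather than a sign of a larger retrieval problem.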