jma127 / pyltr

Python learning to rank (LTR) toolkit
BSD 3-Clause "New" or "Revised" License
463 stars, 107 forks

OverflowError: math range error #23

Closed shivaraj1994 closed 3 years ago

shivaraj1994 commented 3 years ago

I am trying to train on the MQ2007-list dataset.

    import pyltr

    with open('/home/shivaraj/Downloads/MQ2007-list/Fold1/train.txt') as trainfile, \
            open('/home/shivaraj/Downloads/MQ2007-list/Fold1/vali.txt') as valifile, \
            open('/home/shivaraj/Downloads/MQ2007-list/Fold1/test.txt') as evalfile:
        TX, Ty, Tqids, T_ = pyltr.data.letor.read_dataset(trainfile)
        VX, Vy, Vqids, V_ = pyltr.data.letor.read_dataset(valifile)
        EX, Ey, Eqids, E_ = pyltr.data.letor.read_dataset(evalfile)

    metric = pyltr.metrics.NDCG(k=10)

    # Only needed if you want to perform validation (early stopping & trimming)
    monitor = pyltr.models.monitors.ValidationMonitor(
        VX, Vy, Vqids, metric=metric, stop_after=250)

    model = pyltr.models.LambdaMART(
        metric=metric,
        n_estimators=1000,
        learning_rate=0.02,
        max_features=0.5,
        query_subsample=0.5,
        max_leaf_nodes=10,
        min_samples_leaf=64,
        verbose=1,
    )

    model.fit(TX, Ty, Tqids, monitor=monitor)

This is the error log:

    OverflowError                             Traceback (most recent call last)
    in
         16 )
         17
    ---> 18 model.fit(TX, Ty, Tqids, monitor=monitor)

    ~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in fit(self, X, y, qids, monitor)
        199
        200         n_stages = self._fit_stages(X, y, qids, y_pred,
    --> 201                                     random_state, begin_at_stage, monitor)
        202
        203         if n_stages < self.estimators_.shape[0]:

    ~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stages(self, X, y, qids, y_pred, random_state, begin_at_stage, monitor)
        406             y_pred = self._fit_stage(i, X, y, qids, y_pred, sample_weight,
        407                                      sample_mask, query_groups_to_use,
    --> 408                                      random_state)
        409
        410             train_total_score, oob_total_score = 0.0, 0.0

    ~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stage(self, i, X, y, qids, y_pred, sample_weight, sample_mask, query_groups, random_state)
        332         for qid, a, b, _ in query_groups:
        333             lambdas, deltas = self._calc_lambdas_deltas(qid, y[a:b],
    --> 334                                                         y_pred[a:b])
        335             all_lambdas[a:b] = lambdas
        336             all_deltas[a:b] = deltas

    ~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _calc_lambdas_deltas(self, qid, y, y_pred)
        267         actual = y[positions]
        268
    --> 269         swap_deltas = self.metric.calc_swap_deltas(qid, actual)
        270         max_k = self.metric.max_k()
        271         if max_k is None or ns < max_k:

    ~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/dcg.py in calc_swap_deltas(self, qid, targets, coeff)
         33             for j in range(i + 1, n_targets):
         34                 deltas[i, j] = coeff * \
    ---> 35                     (self._gain_fn(targets[i]) - self._gain_fn(targets[j])) * \
         36                     (self._get_discount(j) - self._get_discount(i))
         37

    ~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/gains.py in _exp2_gain(x)
         16
         17 def _exp2_gain(x):
    ---> 18     return math.exp(x * _LOG2) - 1.0
         19
         20

    OverflowError: math range error

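The last frame of the traceback can be reproduced outside pyltr. The sketch below copies the expression from `_exp2_gain` and applies it to a few label values. It assumes `_LOG2` is `math.log(2)` (the traceback shows only the name), and the label `2000` is a made-up stand-in for a large relevance value; whether the MQ2007-list files actually contain labels that large has not been checked here.

```python
import math

_LOG2 = math.log(2.0)  # assumption: the traceback shows the name, not the value


def exp2_gain(x):
    # Same expression as the line flagged in the traceback: 2**x - 1.
    return math.exp(x * _LOG2) - 1.0


def try_gain(label):
    """Return the gain, or the error message if math.exp overflows."""
    try:
        return exp2_gain(label)
    except OverflowError as err:
        return "OverflowError: {}".format(err)


# Small graded labels are fine; a hypothetical large label is not.
for label in (0, 2, 4, 2000):
    print(label, try_gain(label))
```

If this is what is happening, the gain only stays representable while the labels are small graded values, so inspecting `Ty.max()` before calling `fit` would be a quick way to confirm or rule it out.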
tr8dr commented 3 years ago

Did you find a workaround for this? I am having the same problem as well ...

jma127 commented 3 years ago

Hello,

I don't have the full picture on what scores are included in the dataset; however, I'm guessing one of two possibilities:

tr8dr commented 3 years ago

One could restructure the problem to be in log space, avoiding the exponential. This is how many numerical algorithms deal with exponentials (and potential overflow).
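As a rough illustration of the log-space idea, here is a minimal sketch that evaluates NDCG with the exponential gain without ever forming `2**x`. It is not pyltr's code: the helper names (`log_exp2_gain`, `log_sum_exp`, `log_dcg`, `ndcg_log_space`) are hypothetical, and it covers only metric evaluation, not the swap deltas that LambdaMART needs during training.

```python
import math

_LOG2 = math.log(2.0)


def log_exp2_gain(x):
    """Return log(2**x - 1) without computing 2**x."""
    if x <= 0:
        return -math.inf  # the gain is 0, so its log is -inf
    t = x * _LOG2
    if t < 30.0:
        # Small labels: expm1 is accurate and cannot overflow here.
        return math.log(math.expm1(t))
    # Large labels: log(2**x - 1) = x*log(2) + log(1 - 2**-x).
    return t + math.log1p(-math.exp(-t))


def log_sum_exp(values):
    """Return log(sum(exp(v))) with the maximum factored out."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_dcg(labels):
    # Discount is 1 / log2(rank + 1) with ranks starting at 1.
    terms = [log_exp2_gain(y) - math.log(math.log2(i + 2))
             for i, y in enumerate(labels)]
    return log_sum_exp(terms)


def ndcg_log_space(labels):
    """NDCG of `labels` in their given order, computed via logs."""
    log_ideal = log_dcg(sorted(labels, reverse=True))
    if log_ideal == -math.inf:
        return 0.0  # no relevant documents at all
    return math.exp(log_dcg(labels) - log_ideal)


print(ndcg_log_space([1, 3, 2]))
print(ndcg_log_space([0, 2000]))  # the direct 2**2000 would overflow
```

Only the ratio of DCG to ideal DCG is ever exponentiated, and that ratio lies in [0, 1], which is why the large label no longer causes a range error.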

jma127 commented 3 years ago

That's indeed what the above bit of code does :)