yandex-research / rtdl

Research on Tabular Deep Learning: Papers & Packages
Apache License 2.0
888 stars 98 forks source link

# bug, located in rtdl.data.piecewise_linear_encoding line #618 #54

Closed grayicon closed 1 year ago

grayicon commented 1 year ago

"is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges)))is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges)))" should be "is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges)))is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges))).unsqueeze(-1))", to use tensor broadcast func

Yura52 commented 1 year ago

Thank you for the report. Currently, the main branch contains significant amount of unfinished work not released to PyPI, so bugs are expected, and use it at your own risk.

P.S. The formatted version of the above bug report:

instead of this:

is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges)))

it should be this:

is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges))).unsqueeze(-1))
Yura52 commented 1 year ago

@grayicon is it actually a bug?

Yura52 commented 1 year ago

Feel free to reopen the issue if needed.

grayicon commented 1 year ago

@grayicon is it actually a bug?

  • bin_indices + 1 has the shape (n_objects, n_features)
  • as_tensor(list(map(len, bin_edges))) has the shape (n_features,)
  • According to broadcasting rules, the result is_last_bin has the shape (n_objects, n_features)

The above desc is right in theory, but in fact i found the real bug lie in the line #590 of rtdl.data.py "bin_edges = torch.as_tensor(bin_ratios)", which cause the state "as_tensor(list(map(len, bin_edges)))" has the shape (n_objects,)

Yura52 commented 1 year ago

So it seems to be the issue reported here, right?

https://github.com/Yura52/rtdl/issues/47

Yura52 commented 1 year ago

So it should be now resolved by https://github.com/Yura52/rtdl/commit/7929497c6f37f4cd738ffcd5584642cb1d016ae3

Please, let me know if the issue still there

grayicon commented 1 year ago

So it should be now resolved by 7929497

Please, let me know if the issue still there

The bin_egdes is a list of torch tensor, So it doesn't support the opera of 'torch.as_tensor'. Consequently, I just comment this line "bin_edges = torch.as_tensor(bin_ratios)", and it will works when call of "map(len, bin_edges) below".

Yura52 commented 1 year ago

@grayicon oh, I think now I understand. I removed this line in this commit. Does it look good now?

grayicon commented 1 year ago

@grayicon oh, I think now I understand. I removed this line in this commit. Does it look good now?

Yes,It can work now, Thanks for your effort.