microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License
3.32k stars · 274 forks

Fixing XGBoostRegressor issue with base_predictions #764

Closed giriprasad51 closed 3 months ago

giriprasad51 commented 4 months ago

FAILED tests/test_xgboost_converter.py::TestXGBoostConverter::test_float64_xgb_ranker_converter - AssertionError
FAILED tests/test_xgboost_converter.py::TestXGBoostConverter::test_float64_xgb_regressor_converter - AssertionError
FAILED tests/test_xgboost_converter.py::TestXGBoostConverter::test_run_xgb_pandas - AssertionError

The above test cases now pass with this change.

giriprasad51 commented 4 months ago

Hi @ksaur, I made some changes in the code; please go through it. Thank you!

ksaur commented 4 months ago

Hello! Was this failing somewhere? Is this related to an open issue? Can you please show me where the assertion errors you posted were coming from (what version of xgb, etc.)? I'm only aware of #732, but that was with xgb==2.0. Thanks!

giriprasad51 commented 4 months ago

Hi @ksaur, xgb == 2.0.3. Running the test cases with `!pytest /content/hummingbird/tests`, the results went from

=============== 18 failed, 443 passed, 203 skipped, 65 warnings in 164.71s (0:02:44) ===============

to

=============== 15 failed, 446 passed, 203 skipped, 65 warnings in 167.01s (0:02:47) ===============

You can refer to this Colab notebook: https://colab.research.google.com/drive/1hUqyl2LafAMnN01JGju5Huzv8JRQeQhI

giriprasad51 commented 4 months ago

https://github.com/microsoft/hummingbird/blob/3eb83937d3a3776b4b3eb82c1011243833a4bcd3/hummingbird/ml/operator_converters/xgb.py#L138

Even though the XGBoost regressor contains a base score, here it comes back as None.
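To illustrate the kind of guard the converter needs, here is a minimal stdlib-only sketch. The function name and default value are purely illustrative (not hummingbird's actual code): when the sklearn-style `base_score` attribute comes back as `None`, the converter must substitute a usable value instead of propagating `None` into the tensor computation.

```python
# Hypothetical helper (illustrative names, not hummingbird's actual code):
# substitute a default when the sklearn-style attribute is unset (None).
def resolve_base_score(attr_value, default=0.5):
    """Return a usable base score, falling back to `default` when the
    estimator attribute is None."""
    return default if attr_value is None else float(attr_value)

print(resolve_base_score(None))    # 0.5
print(resolve_base_score("0.3"))   # 0.3
```

A hard-coded default only papers over the problem, though; the real fix is recovering the trained value from the booster, as discussed below in the thread.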

ksaur commented 4 months ago

cc: @interesaaat to check it out (since he's worked with this section of code quite a bit)

codecov-commenter commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.95%. Comparing base (3eb8393) to head (18cf433).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #764   +/-   ##
=======================================
  Coverage   89.94%   89.95%
=======================================
  Files          80       80
  Lines        4696     4697    +1
  Branches      864      864
=======================================
+ Hits         4224     4225    +1
  Misses        269      269
  Partials      203      203
```

| Flag | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/microsoft/hummingbird/pull/764/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `89.95% <100.00%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft#carryforward-flags-in-the-pull-request-comment) to find out more.


giriprasad51 commented 4 months ago

Hi @interesaaat, as per your suggestion I modified the code; please go through it.

interesaaat commented 4 months ago

> Hi @interesaaat, as per your suggestion I modified the code; please go through it.

That is not what I meant because you are still saving the model. I think there is a way to access the base_score directly without having to save the model.

mshr-h commented 4 months ago

I think the only way to access the model parameter is by using the save_config() method. I grepped the XGBoost source code and found this utility function, intended just for testing: https://github.com/dmlc/xgboost/blob/v2.0.3/python-package/xgboost/testing/updater.py#L12

It's quite similar to what @giriprasad51 is doing. We can use get_basescore() via `from xgboost.testing.updater import get_basescore`, but it's only available from 2.0.0.
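The `save_config()` route can be sketched without xgboost installed: `save_config()` returns a JSON string (it does not write the model to disk), and `base_score` sits under `learner.learner_model_param`. The config below is a hand-made sample mirroring that layout with made-up values, not output from a real booster.

```python
import json

# Hand-made sample mimicking the layout of booster.save_config() in
# xgboost 2.x (values are made up for illustration).
sample_config = json.dumps({
    "learner": {
        "learner_model_param": {"base_score": "4.9E-1", "num_class": "0"}
    }
})

def get_base_score_from_config(config_json: str) -> float:
    """Extract base_score from a save_config()-style JSON string,
    mirroring what xgboost's testing-only get_basescore() helper does."""
    return float(
        json.loads(config_json)["learner"]["learner_model_param"]["base_score"]
    )

print(get_base_score_from_config(sample_config))  # 0.49
```

Since `get_basescore()` lives under `xgboost.testing`, a guarded import falling back to a small parser like the one above would avoid depending on a testing-only module.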

ksaur commented 3 months ago

> I think the only way to access the model parameter is by using the save_config() method. I grepped the XGBoost source code and found this utility function just for testing. https://github.com/dmlc/xgboost/blob/v2.0.3/python-package/xgboost/testing/updater.py#L12
>
> It's quite similar to what @giriprasad51 is doing. We can use get_basescore() via `from xgboost.testing.updater import get_basescore`, but it's only available from 2.0.0.

Thanks @mshr-h! Hmm, OK, what do you think @interesaaat? Should we try to do this just here, or wait and have a higher-level solution that's more of a broad xgb==2.0 fix? I am open to either!

interesaaat commented 3 months ago

> I think the only way to access the model parameter is by using the save_config() method. I grepped the XGBoost source code and found this utility function just for testing. https://github.com/dmlc/xgboost/blob/v2.0.3/python-package/xgboost/testing/updater.py#L12 It's quite similar to what @giriprasad51 is doing. We can use get_basescore() via `from xgboost.testing.updater import get_basescore`, but it's only available from 2.0.0.
>
> Thanks @mshr-h! Should we try to do this just here, or wait and have a higher-level solution that's more of a broad xgb==2.0 fix? I am open to either!

Interesting. I find it weird that you need to save the model to access a parameter; the parameter should be stored somewhere, right? But I haven't looked into the details, so maybe this is the only way. Since it happens at conversion time, this is probably fine, even if not ideal.

ksaur commented 3 months ago

Merging this for now. Maybe it will be steps toward a broader solution for xgb==2.0 in the future!