microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License
3.34k stars, 278 forks

hummingbird/ml/convert.py does not support csr_matrix #638

Closed: hguan6 closed this issue 1 year ago

hguan6 commented 2 years ago

In hummingbird/ml/convert.py, line 305, in _convert_common, the condition and (is_spark_dataframe(test_input) or len(test_input) > 0) raises a TypeError ("sparse matrix length is ambiguous; use getnnz()", see https://github.com/scipy/scipy/blob/942426af487a7e9b51fbc059c36ac5f69186d032/scipy/sparse/_base.py#L345) when the input is a SciPy csr_matrix, because len() is not defined for SciPy sparse matrices.
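The failure can be reproduced without Hummingbird. The sketch below shows that len() raises a TypeError on a csr_matrix and how an emptiness check could use the row count (shape[0]) instead. The helper has_rows is hypothetical and only illustrates the idea; it is not the fix that was applied to Hummingbird.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def has_rows(test_input):
    """Hypothetical helper: emptiness check that works for dense and sparse inputs."""
    if issparse(test_input):
        # len() is undefined for SciPy sparse matrices, so use the row count instead
        return test_input.shape[0] > 0
    return len(test_input) > 0


dense = np.array([[0.0, 1.0], [2.0, 0.0]])
sparse = csr_matrix(dense)

print(has_rows(dense))   # True
print(has_rows(sparse))  # True

# This mirrors the failing condition in _convert_common
try:
    len(sparse)
except TypeError as err:
    print("len() failed:", err)
```

The exact wording of SciPy's error message differs between SciPy versions, but the exception type is a TypeError in both cases.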

interesaaat commented 2 years ago

Thanks for reporting this. Can you please provide us with a complete example?

hguan6 commented 2 years ago

A complete example is here: https://github.com/asu-cactus/netsdb/blob/27c2e75e2015e4d1f59d9c8678b744bddc4bbf4e/model-inference/decisionTree/experiments/test_model.py#L159
It works fine when features is a NumPy array, but when features is a csr_matrix, the error above is triggered.

interesaaat commented 2 years ago

Can you please check whether it works with this branch?

hguan6 commented 1 year ago

Thank you for your reply. I ran it with this new "convert.py" file, but I still got an error.

Traceback (most recent call last):
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py", line 334, in <module>
    test(args, features, label, sklearnmodel, config, time_consume)
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py", line 122, in test
    test_postprocess(*test_cpu(*argv))
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/test_model.py", line 167, in test_cpu
    model = convert_to_hummingbird_model(sklearnmodel, "tvm", features, args.batch_size, "cpu")
  File "/home/ubuntu/netsdb/model-inference/decisionTree/experiments/model_helper.py", line 85, in convert_to_hummingbird_model
    model = convert(model, backend, batch_data, device=device, extra_config=extra_config)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py", line 444, in convert
    return _convert_common(model, backend, test_input, device, extra_config)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py", line 405, in _convert_common
    return _convert_sklearn(model, backend_formatted, test_input, device, extra_config)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py", line 111, in _convert_sklearn
    hb_model = topology_converter(topology, backend, test_input, device, extra_config=extra_config)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_topology.py", line 352, in convert
    batch_trace_input, remainder_trace_input = _get_trace_input_from_test_input(test_input, remainder_size, extra_config)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_topology.py", line 134, in _get_trace_input_from_test_input
    trace_input = torch.from_numpy(input)
TypeError: expected np.ndarray (got csr_matrix)

It seems that torch does not support converting a csr_matrix to a torch tensor: torch.from_numpy only accepts an np.ndarray.
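The last traceback frame can be reproduced in isolation. The sketch below shows torch.from_numpy rejecting a csr_matrix, then two ways a csr_matrix could be turned into a torch tensor: densifying it, or copying its CSR buffers into a torch sparse CSR tensor. This is an illustration only: _get_trace_input_from_test_input calls torch.from_numpy directly, and it is an assumption (not verified here) whether Hummingbird's converted operators would accept a sparse tensor at all. Sparse CSR tensors also require a reasonably recent torch release.

```python
import numpy as np
import torch
from scipy.sparse import csr_matrix

X = csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0]]))

# torch.from_numpy only accepts np.ndarray, which reproduces the error above
try:
    torch.from_numpy(X)
except TypeError as err:
    print("from_numpy failed:", err)

# Option 1: densify first (simple, but memory grows with rows * cols)
dense_t = torch.from_numpy(X.toarray())

# Option 2: keep it sparse by copying the CSR buffers into a torch CSR tensor;
# both index tensors are cast to int64 so their dtypes match
sparse_t = torch.sparse_csr_tensor(
    torch.from_numpy(X.indptr.astype(np.int64)),   # row pointers
    torch.from_numpy(X.indices.astype(np.int64)),  # column indices
    torch.from_numpy(X.data),                      # non-zero values
    size=X.shape,
)

print(torch.equal(dense_t, sparse_t.to_dense()))  # True
```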

interesaaat commented 1 year ago

Of course. Can you try making the sparse matrix dense and passing it to Hummingbird? If that works, I think it would be the best option. Otherwise, you can try something like this, but I am not sure whether it will break something else later on.
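Densifying a large sparse matrix all at once can use a lot of memory. A minimal sketch of the suggested workaround, assuming the model only ever needs dense input, is to densify one slice of rows at a time. The helper predict_in_dense_batches is hypothetical, and predict_fn stands in for any callable that takes a dense 2-D NumPy array.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def predict_in_dense_batches(predict_fn, X, batch_size=1024, dtype=np.float32):
    """Apply predict_fn to X, densifying at most batch_size rows at a time.

    X may be a NumPy array or a SciPy sparse matrix with at least one row.
    predict_fn is a hypothetical stand-in for a converted model's predict method.
    """
    outputs = []
    for start in range(0, X.shape[0], batch_size):
        batch = X[start:start + batch_size]
        if issparse(batch):
            # Materialize only this slice of rows as a dense array
            batch = batch.toarray()
        outputs.append(predict_fn(np.asarray(batch, dtype=dtype)))
    return np.concatenate(outputs)


X = csr_matrix(np.eye(5))
row_sums = predict_in_dense_batches(lambda b: b.sum(axis=1), X, batch_size=2)
print(row_sums)  # [1. 1. 1. 1. 1.]
```

With Hummingbird, one would presumably convert using a dense sample, e.g. hb_model = convert(skl_model, "torch", X[:1000].toarray()) with convert imported from hummingbird.ml, and then pass hb_model.predict as predict_fn. This pairing is an untested assumption, and backends such as TVM may be sensitive to the batch size used at conversion time.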

hguan6 commented 1 year ago

Okay. Thank you.