mindsdb / lightwood

Lightwood is Legos for Machine Learning.
GNU General Public License v3.0

[Bug] ICP softmax type error #1096

Closed paxcema closed 1 year ago

paxcema commented 1 year ago

Describe your issue

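The temperature softmax used by the ICP analysis fails with a TypeError. The array passed to torch.Tensor has an underlying numpy.str_ dtype, so the conversion fails before torch.softmax is even applied (which is quite weird):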

  File "/MindsDB/lightwood/lightwood/helpers/log.py", line 30, in wrap
    result = f(predictor, *args, **kw)
  File "/var/folders/_0/b5mtgvvs71gdbp7ftlyjdz340000gp/T/ff0526104935eccd8528c8d87b0b9045ab325dc013a3b2c7167468801567233.py", line 443, in learn
    self.analyze_ensemble(enc_train_test)
  File "/MindsDB/lightwood/lightwood/helpers/log.py", line 30, in wrap
    result = f(predictor, *args, **kw)
  File "/var/folders/_0/b5mtgvvs71gdbp7ftlyjdz340000gp/T/ff0526104935eccd8528c8d87b0b9045ab325dc013a3b2c7167468801567233.py", line 393, in analyze_ensemble
    self.model_analysis, self.runtime_analyzer = model_analyzer(
  File "/MindsDB/lightwood/lightwood/analysis/analyze.py", line 88, in model_analyzer
    runtime_analyzer = block.analyze(runtime_analyzer, **kwargs)
  File "/MindsDB/lightwood/lightwood/analysis/nc/calibrate.py", line 210, in analyze
    icps[tuple(group)].calibrate(icp_df.values, y)
  File "/MindsDB/lightwood/lightwood/analysis/nc/icp.py", line 102, in calibrate
    cal_scores = self.nc_function.score(self.cal_x, self.cal_y)
  File "/MindsDB/lightwood/lightwood/analysis/nc/nc.py", line 407, in score
    prediction = self.model.predict(x)
  File "/MindsDB/lightwood/lightwood/analysis/nc/base.py", line 165, in predict
    return t_softmax(self.prediction_cache, t=0.5)
  File "/MindsDB/lightwood/lightwood/analysis/nc/util.py", line 13, in t_softmax
    return softmax(torch.Tensor(x) / t, dim=axis).numpy()
TypeError: can't convert np.ndarray of type numpy.str_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
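
To illustrate the failure mode in the traceback above, here is a minimal NumPy-only sketch of a temperature softmax that checks the input dtype before doing any arithmetic. It is not lightwood's code or its eventual fix: `safe_t_softmax`, its `axis` default, and the sample arrays are hypothetical names and values chosen for illustration. The sketch assumes that string arrays holding numeric text can safely be cast to floats, and that anything else should fail with a clearer message.

```python
import numpy as np


def safe_t_softmax(x, t=1.0, axis=1):
    """Temperature softmax with a dtype guard (hypothetical helper, not lightwood's API)."""
    arr = np.asarray(x)
    if arr.dtype.kind in ("U", "S", "O"):
        # String/object arrays are the kind of input rejected in the traceback.
        # Assumption: numeric text such as "0.2" may be cast; real labels may not.
        try:
            arr = arr.astype(np.float64)
        except (TypeError, ValueError) as exc:
            raise TypeError(
                f"expected numeric scores, got array of dtype {arr.dtype}"
            ) from exc
    arr = arr.astype(np.float64) / t
    # Subtract the per-row maximum for numerical stability before exponentiating.
    arr = arr - arr.max(axis=axis, keepdims=True)
    exp = np.exp(arr)
    return exp / exp.sum(axis=axis, keepdims=True)


# Numeric values stored as text: cast succeeds, rows are normalized to sum to 1.
numeric_as_text = np.array([["0.2", "0.8"], ["0.5", "0.5"]])
probs = safe_t_softmax(numeric_as_text, t=0.5)

# Genuine string labels: the guard raises a TypeError naming the offending dtype.
labels = np.array([["buy", "sell"]])
```

A guard like this only makes the error easier to read. Why string values reach the softmax when GROUP BY is used would still need to be traced upstream of `t_softmax`.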

How can we replicate it?

CREATE MODEL  
    mindsdb.TS_R1S1_SELL
FROM files  
    (SELECT SEQ, S1, CLOSE, SELL FROM files.TS_R1S1_TRAIN_DATA)
PREDICT SELL
ORDER BY SEQ 
GROUP BY S1
WINDOW 4
HORIZON 2

sunnysktsang commented 1 year ago

As a workaround, I can confirm that the error is bypassed by removing the GROUP BY clause and setting use_default_analysis=False in the query (thanks to @paxcema for the advice):

CREATE MODEL  
    mindsdb.TS_R1S1_SELL
FROM files  
    (SELECT SEQ, S1, CLOSE, SELL FROM files.TS_R1S1_TRAIN_DATA)
PREDICT SELL
ORDER BY SEQ 
-- GROUP BY S1
WINDOW 4
HORIZON 2 
USING
use_default_analysis=False;

I'm attaching datasets that use date/time values instead of numbers in the SEQ column used for ordering: TS_R1S1_TRAIN_DATA_DT.csv and TS_R1S1_TEST_DATA_DT.csv.