mindsdb / lightwood

Lightwood is Legos for Machine Learning.
GNU General Public License v3.0

Better error messages when target column dtype is not supported #1095

Open paxcema opened 1 year ago

paxcema commented 1 year ago

Motivated by an example where a short_text target column produces the following error:

Please specify a custom accuracy function for output type short_text.

Sumanth077 commented 1 year ago

Hi @paxcema I can help with this. Kindly let me know if you are not working on this issue currently.

paxcema commented 1 year ago

Hey @Sumanth077 — I don't think there's anyone working on this at the moment, so your help would definitely be appreciated here. Let's discuss first, though. Do you have a rough idea on how to tackle this?

I think we may need to store a map of mixers and their respective supported target data types, to then check whether the final JsonAI is capable of tackling the target dtype.

Sumanth077 commented 1 year ago

Yeah, that would be a great idea @paxcema: making a list of the mixers we have and their supported target data types.

Can we start by doing that for the 11 mixers currently available in the Mixers category?

paxcema commented 1 year ago

Yes, I think you could start by adding the supported target data types for these as an attribute, maybe one defined in BaseMixer and overridden in the specific __init__ of each mixer.
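A minimal sketch of the attribute idea, using a stand-in class rather than lightwood's actual BaseMixer. The attribute name supported_target_dtypes, the supports helper, and the example subclass are assumptions made for illustration; the dtype strings are examples only.

```python
from typing import List


class BaseMixer:
    """Stand-in for a mixer base class, reduced to the attribute under discussion."""

    def __init__(self) -> None:
        # Hypothetical attribute name. An empty list means the mixer
        # has not declared any supported target dtypes.
        self.supported_target_dtypes: List[str] = []

    def supports(self, target_dtype: str) -> bool:
        # Hypothetical helper: membership test against the declared list.
        return target_dtype in self.supported_target_dtypes


class ExampleRegressionMixer(BaseMixer):
    """Illustrative subclass, not one of the real mixers."""

    def __init__(self) -> None:
        super().__init__()
        # Overridden in the subclass __init__, as proposed above.
        self.supported_target_dtypes = ["integer", "float"]


mixer = ExampleRegressionMixer()
print(mixer.supports("float"))       # True
print(mixer.supports("short_text"))  # False
```

Keeping the default empty in the base class means a mixer that forgets to declare its dtypes fails the check instead of silently passing it.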

Then, when building either the code or the predictor itself out of a JsonAI object, we can check whether the target data type in the dtype_dict is contained in all mixers' lists of supported data types, as well as the ensemble that will use them. And if it's not contained, then we can raise an informative error. To be precise, this would happen in api.high_level, for the methods code_from_json_ai, code_from_problem and predictor_from_problem.

This way, we will raise an error at "model compilation" time, so to speak, which saves valuable time.
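The compile-time check could look roughly like the sketch below. It does not use lightwood's real classes: Component is a hypothetical stand-in for any mixer or ensemble that declares its supported target dtypes, and both function names are made up for this example.

```python
from typing import Dict, List, Sequence


class Component:
    """Hypothetical stand-in for a mixer or ensemble with declared target dtypes."""

    def __init__(self, name: str, supported_target_dtypes: Sequence[str]) -> None:
        self.name = name
        self.supported_target_dtypes = list(supported_target_dtypes)


def unsupported_components(
    target_dtype: str, mixers: Sequence[Component], ensemble: Component
) -> List[str]:
    """Return the names of all mixers (and the ensemble) that cannot handle the dtype."""
    return [
        c.name
        for c in [*mixers, ensemble]
        if target_dtype not in c.supported_target_dtypes
    ]


def check_target_dtype(
    target: str,
    dtype_dict: Dict[str, str],
    mixers: Sequence[Component],
    ensemble: Component,
) -> None:
    """Raise an informative error before any code generation or training happens."""
    target_dtype = dtype_dict[target]
    offenders = unsupported_components(target_dtype, mixers, ensemble)
    if offenders:
        raise ValueError(
            f"Target column '{target}' has dtype '{target_dtype}', "
            f"which is not supported by: {', '.join(offenders)}."
        )


mixers = [
    Component("ExampleMixerA", ["integer", "float"]),
    Component("ExampleMixerB", ["float", "categorical"]),
]
ensemble = Component("ExampleEnsemble", ["integer", "float", "categorical"])

# A float target is supported by every component, so this passes silently.
check_target_dtype("price", {"price": "float", "title": "short_text"}, mixers, ensemble)
```

Collecting every offending component before raising lets the message list all of them at once, rather than surfacing them one failure at a time.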

Does this sound good?

Sumanth077 commented 1 year ago

Sure @paxcema, that sounds good. I will look into it and let you know if I need any further clarification.

Sumanth077 commented 1 year ago

Hi @paxcema, I am commenting here since this conversation will give you a clear idea of my query.

As you suggested, I have made the changes to the base mixer ✅ It would be great to know how to approach raising an informative error.

You have mentioned we should raise an error when building either the code or the predictor itself out of a JsonAI object.

But I guess the above error, "Please specify a custom accuracy function for output type short_text.", is raised earlier, when the JsonAI object itself is created from the problem definition in the generate_json_ai function.

So it would be great to know:

  1. When should we raise the error?
  2. How should we approach raising it?

Thank you.

paxcema commented 1 year ago

I think the error should be raised somewhere within the JsonAI creation process. I recommend creating a new method and calling it from api.json_ai.validate_json_ai. This method would take the JsonAI object, check the target dtype, then sweep across all mixers that have been added. If any of them does not support this dtype, log an error and raise an Exception.
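A rough sketch of such a validation method follows. It models the JsonAI object as a plain dictionary with hypothetical keys ("target", "dtype_dict", "mixers") and uses a made-up map of mixer names to supported dtypes, so none of these names should be read as lightwood's actual structure or API.

```python
import logging
from typing import Any, Dict, List

log = logging.getLogger(__name__)

# Hypothetical map of mixer name -> supported target dtypes (illustrative entries).
SUPPORTED_TARGET_DTYPES: Dict[str, List[str]] = {
    "ExampleNumericMixer": ["integer", "float"],
    "ExampleCategoricalMixer": ["binary", "categorical"],
}


class UnsupportedTargetDtypeError(Exception):
    """Hypothetical exception type for an unsupported target dtype."""


def validate_target_dtype(json_ai: Dict[str, Any]) -> None:
    """Check the target dtype against every mixer listed in the JsonAI-like dict."""
    target = json_ai["target"]
    target_dtype = json_ai["dtype_dict"][target]

    for mixer in json_ai["mixers"]:
        name = mixer["module"]
        # Unknown mixers are treated as supporting nothing.
        supported = SUPPORTED_TARGET_DTYPES.get(name, [])
        if target_dtype not in supported:
            msg = (
                f"Mixer '{name}' does not support target column '{target}' "
                f"of dtype '{target_dtype}'. Supported dtypes: {supported}."
            )
            log.error(msg)
            raise UnsupportedTargetDtypeError(msg)


valid_json_ai = {
    "target": "price",
    "dtype_dict": {"price": "float", "title": "short_text"},
    "mixers": [{"module": "ExampleNumericMixer"}],
}
validate_target_dtype(valid_json_ai)  # no error: float is supported
```

Calling this from the validation step would surface the problem while the JsonAI object is still being checked, with a message that names both the mixer and the offending dtype.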