clausnizer-ondics closed this issue 3 years ago
If I understand correctly, you must first use some heuristic to categorise your data into sick / not sick. Maybe add a column called sick, where 0 = not sick and 1 = sick? As a simplified example, your heuristic could be: if age > x and blood pressure > y, then sick.
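A minimal sketch of such a heuristic, using only the Python standard library. The column names age and blood_pressure and both threshold values are assumptions made up for illustration; they are not taken from any particular version of the dataset, so adjust them to match your own CSV.

```python
import csv
import io

# Hypothetical thresholds and column names; adjust them to your own dataset.
AGE_THRESHOLD = 50
BP_THRESHOLD = 90


def label_sick(row, age_threshold=AGE_THRESHOLD, bp_threshold=BP_THRESHOLD):
    """Return 1 (sick) if both values exceed their thresholds, else 0."""
    return int(float(row["age"]) > age_threshold
               and float(row["blood_pressure"]) > bp_threshold)


def add_sick_column(csv_text):
    """Read CSV text, append a 'sick' column, and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["sick"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["sick"] = label_sick(row)
        writer.writerow(row)
    return out.getvalue()


# Small in-memory sample instead of a real file, to keep the sketch self-contained.
sample = "age,blood_pressure\n62,95\n30,70\n55,80\n"
print(add_sick_column(sample))
```

To apply this to a real file, read its contents into a string, pass it to add_sick_column, and write the result back out. Note that a column created this way only encodes the heuristic itself; if the dataset already has a label column, using that as the target is usually the better choice.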
Thanks for your help. If this will give me a working example it would be fine! :-)
I know how to add a column sick to the diabetes.csv, but I have no idea where to put the heuristic stuff. ;-)
Can you provide a step by step guide on how to do this?
Actually, I think the column you are looking for is 'outcome', not 'sick'. Use that as your target instead.
@Anenizer Hi, the data that I used in the docs can be found in this repo. Please go to the examples folder and then check the datasets under the data folder. Or simply click here.
Now coming to your issue. Notice that there are multiple versions of the famous Indian diabetes dataset. If you visit Kaggle, you will find many versions of it, each with different attribute/feature names. The one I'm using here has an attribute called sick, which indicates whether a patient is sick or not (0 means not sick and 1 means sick). The trick is that if you are using a dataset with other attribute names, you have to provide what you want to predict in the target field inside the .yaml file. Simply put, if the name of the attribute in your dataset is, let's say, "patient-status" instead of sick, then you have to provide:
target:
- patient-status
in your .yaml file. This way igel will recognize that you want to predict the patient-status from your dataset. Hope this was helpful ;)
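Since the problem here came down to a target name that did not match any column in the CSV, a quick check before training can help. The sketch below is not part of igel; missing_targets is a hypothetical helper, and the sample columns are invented for illustration. It compares the names you intend to list under target with the CSV header.

```python
import csv
import io


def missing_targets(csv_text, targets):
    """Return the target names that are not column headers in the CSV."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = {name.strip() for name in header}
    return [t for t in targets if t not in columns]


# Invented sample data with a "patient-status" column and no "sick" column.
sample = "age,glucose,patient-status\n50,148,1\n31,85,0\n"
print(missing_targets(sample, ["patient-status"]))  # target exists in the header
print(missing_targets(sample, ["sick"]))            # target is absent from the header
```

An empty list means every target name appears in the header; any name that is returned needs to be corrected in the .yaml file or added to the dataset.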
@Anenizer does this answer your question? If not, feel free to re-open the issue, or create a new one if you have other questions.
Yes, I now have a working example, this was my goal. Thank you very much!
Description
I am very new to ML and don't know what to do with igel, or how. I followed the Quick-Start Demo to get an idea.
Resulted in this igel.yaml:
What I Did
... with having a big question mark above my head:
If I understand right, igel wants to have a column named sick in dataset.csv. So there is a missing link and I have no idea how to close it. Can you provide test data, maybe as part of this repo, to get something to work? Or help me find the missing part?
Please help