sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
1.99k stars, 203 forks

Ability to run distribution analysis with respect to a target feature #835

Open borisRa opened 2 years ago

borisRa commented 2 years ago

Hi,

Is it possible to run a distribution analysis with respect to a target feature? For example, in the Titanic data, to show how "Survived" is affected by each variable.

For example, here we can see how "Survived" is affected by "Age", shown separately for the train and test sets:

image

Thanks, Boris

jinglinpeng commented 2 years ago

Hi @borisRa, thanks for the suggestion! Yes, we do plan to add more support for the scenario you mentioned. May I ask what the line in the figure means? Is it the survival rate?

borisRa commented 2 years ago

Yes, this is the survival rate.
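For reference, a line like that can be approximated outside dataprep with plain pandas. The following is only a minimal sketch: the helper name `target_rate_by_bin`, the bin edges, and the tiny sample frame are all assumptions made up for illustration, not part of dataprep's API and not real Titanic records.

```python
import pandas as pd

# Tiny made-up sample with Titanic-style column names (not real data).
df = pd.DataFrame({
    "Age": [5, 8, 22, 25, 30, 41, 47, 62],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0],
})


def target_rate_by_bin(frame, feature, target, bins):
    """Return the row count and mean of `target` for each bin of `feature`.

    Hypothetical helper: for a 0/1 target, the mean is the survival rate.
    """
    binned = pd.cut(frame[feature], bins=bins)
    return frame.groupby(binned, observed=False)[target].agg(
        count="count", rate="mean"
    )


# Bin edges are arbitrary and chosen only for this example.
age_rates = target_rate_by_bin(df, "Age", "Survived", bins=[0, 18, 40, 80])
print(age_rates)
```

The `count` column would drive the bars of such a plot and the `rate` column the line. Running the same helper on the train and test frames separately would give the per-split view.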

datatalking commented 2 years ago

@jinglinpeng @borisRa I'd like to help with this feature. Should this be an ensemble choice at the start of the mathematical process, or should it just start with a single target feature?

Given passenger ID and/or Survived, should we find P(Y | X1), where Y = Survived, X1 = Pclass, X2 = Name, and so on through Xn (Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked)?
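To make the single-target option concrete, here is a rough sketch in plain pandas that estimates P(Y = 1 | Xi) for each feature independently. The function name `conditional_rates` and the sample rows are hypothetical, used only for illustration. This is not dataprep's API, and it only makes sense as-is for categorical or low-cardinality features.

```python
import pandas as pd

# Made-up rows using Titanic-style column names (not real data).
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 0, 0, 1],
})


def conditional_rates(frame, target, features):
    """Estimate P(target = 1 | feature = value), one feature at a time.

    Hypothetical helper: each feature is treated independently, so
    interactions between features are ignored.
    """
    return {
        feature: frame.groupby(feature)[target].mean()
        for feature in features
    }


rates = conditional_rates(df, "Survived", ["Pclass", "Sex"])
for name, series in rates.items():
    print(f"P(Survived=1 | {name}):")
    print(series)
```

Continuous features such as Age or Fare would need binning first, and high-cardinality ones such as Name or Ticket would probably need to be excluded or grouped.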