Open ekerazha opened 9 years ago
I stumbled upon this issue while working on the same Kaggle competition (but using C#, so the issue is likely present there as well).
After some tinkering, I found out that it's caused by columns subject to normalization that contain solely integer values. When I tweaked the CSV accordingly (I added .0 to the integer values, though I suspect changing a single value in each offending column would be enough), it worked like a charm again.
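For anyone who wants to script that workaround rather than edit the file by hand, here is a minimal sketch in Python. It is a preprocessing step only and does not touch Encog itself; the helper names `force_decimal` and `rewrite_csv` are made up for illustration, and it assumes the CSV has a header row and that you already know which columns are affected.

```python
import csv
import io
import re

# Matches plain integers such as "5", "-3" or "+12" (no decimal point).
INT_RE = re.compile(r"^[+-]?\d+$")


def force_decimal(rows, columns):
    """Yield copies of rows where integer-looking cells in `columns` get a '.0' suffix."""
    for row in rows:
        fixed = dict(row)
        for name in columns:
            value = (fixed.get(name) or "").strip()
            if INT_RE.match(value):
                fixed[name] = value + ".0"
        yield fixed


def rewrite_csv(text, columns):
    """Return CSV text with the given columns rewritten; blank cells are left alone."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(force_decimal(reader, columns))
    return out.getvalue()
```

Calling `rewrite_csv(open("train.csv").read(), ["parch"])` would give the rewritten text for one column; whether a single changed value per column is really enough is still only my guess.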
I'm a bit busy at the moment (no time to trawl through the code and find the cause of this) and quite new to this Git stuff (I registered just to report this), so I probably won't be able to fix this, though.
I'm looking at this Kaggle competition: http://www.kaggle.com/c/titanic-gettingStarted/data
This is the training file: http://www.kaggle.com/c/titanic-gettingStarted/download/train.csv (CSV)
I tried to use this simple code (mostly taken from the NormalizeFile example):
Output is:
Look at the "parch" column:
If you look at this line of the training file:
we have that "parch" is definitely 5.
If I change the normalization method for the "parch" column from Equilateral to Normalize
it still fails to detect the max value.
I also tried
because I thought missing values might be preventing it from finding the max value, but it still fails to detect it: I always get "min=0.0,max=0.0".
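To rule out the data itself, the range of a column can be checked independently of Encog by scanning it directly and skipping blank cells. A minimal Python sketch, assuming a header row; the helper name `column_range` is hypothetical and the sample rows are made up for illustration, not copied from train.csv:

```python
import csv
import io


def column_range(text, column):
    """Return (min, max) of the numeric values in `column`, ignoring blank cells."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        cell = (row.get(column) or "").strip()
        if cell:  # skip missing values instead of treating them as 0
            values.append(float(cell))
    if not values:
        raise ValueError(f"no numeric values in column {column!r}")
    return min(values), max(values)


# Illustrative data only: three rows, one of them with a missing "age".
sample = "parch,age\n0,22\n5,39\n1,\n"
print(column_range(sample, "parch"))  # (0.0, 5.0)
```

If a check like this reports a non-zero max for "parch" on the real file while the analyst still reports "min=0.0,max=0.0", the problem is in the analysis step and not in the CSV.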
P.S. It also wants to normalize "survived" and "sex" as "OneOf", but each has only 2 values (0/1, male/female), so I think "SingleField" normalization could also be a good choice. I can change this myself; however, it then uses 0/1 instead of the full -1/1 range for the SingleField values, and I don't know if it works this way by design.
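For reference, this is what I mean by the full -1/1 range: a generic linear range mapping, sketched in Python. It is the textbook formula, not necessarily what Encog does internally, and the function name `range_normalize` is hypothetical.

```python
def range_normalize(x, data_min, data_max, low=-1.0, high=1.0):
    """Linearly map x from [data_min, data_max] onto [low, high]."""
    if data_max == data_min:
        # A detected range of min=0.0, max=0.0 would end up here:
        # the mapping is undefined when the column has zero width.
        raise ValueError("data_min and data_max must differ")
    return (x - data_min) / (data_max - data_min) * (high - low) + low


# A two-valued field stored as 0/1 maps onto the ends of the target range.
print(range_normalize(0, 0, 1))  # -1.0
print(range_normalize(1, 0, 1))  # 1.0
```

The guard also shows why a wrongly detected max matters: with min and max both 0.0, no sensible mapping of the column is possible.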