gyrdym / ml_algo

Machine learning algorithms in Dart programming language
https://gyrdym.github.io/ml_algo/
BSD 2-Clause "Simplified" License

Blank invalid exception while creating classifier #176

Closed gaetschwartz closed 3 years ago

gaetschwartz commented 3 years ago

Here is my data:

(src, day, time, dest)
(0, 0, 450, 4)
(1, 0, 110, 5)
(0, 1, 450, 4)
(1, 1, 110, 5)
(0, 2, 450, 4)
(1, 2, 110, 5)
(0, 3, 450, 4)
(1, 3, 110, 5)
(0, 4, 450, 4)
(1, 4, 110, 5)
(0, 5, 450, 4)
(1, 5, 110, 5)
(2, 6, 660, 6)
(3, 6, 1170, 7)
(0, 0, 450, 4)
(1, 0, 110, 5)
(0, 1, 450, 4)
(1, 1, 110, 5)
(0, 2, 450, 4)
(1, 2, 110, 5)
(0, 3, 450, 4)
(1, 3, 110, 5)
(0, 4, 450, 4)
(1, 4, 110, 5)
(0, 5, 450, 4)
(1, 5, 110, 5)
(2, 6, 660, 6)
(3, 6, 1170, 8)

It then throws this exception while trying to create the classifier:

Unhandled exception:
Invalid argument(s)
#0      _TypedList._setFloat32 (dart:typed_data-patch/typed_data_patch.dart:2126:36)
#1      _Float32ArrayView.[]= (dart:typed_data-patch/typed_data_patch.dart:4461:16)
#2      new Float32MatrixDataManager.fromList
package:ml_linalg/…/data_manager/float32_matrix_data_manager.dart:37

#3      MatrixFactoryImpl.fromList
package:ml_linalg/…/matrix/matrix_factory_impl.dart:21
#4      new Matrix.fromList
package:ml_linalg/matrix.dart:42
#5      DataFrameImpl.toMatrix
package:ml_dataframe/…/data_frame/data_frame_impl.dart:143
#6      createLogLikelihoodOptimizer
package:ml_algo/…/_helpers/create_log_likelihood_optimizer.dart:46

#7      LogisticRegressorFactoryImpl.create
package:ml_algo/…/logistic_regressor/logistic_regressor_factory_impl.dart:58
#8      new LogisticRegressor
package:ml_algo/…/logistic_regressor/logistic_regressor.dart:153
#9      main.<anonymous closure>
bin\knn.dart:41
#10     main
bin\knn.dart:53
<asynchronous suspension>

The classifier is constructed this way:

final createClassifier = (DataFrame samples) => LogisticRegressor(
      samples,
      targetColumnName,
      optimizerType: LinearOptimizerType.gradient,
      iterationsLimit: 90,
      learningRateType: LearningRateType.decreasingAdaptive,
      batchSize: samples.rows.length,
      probabilityThreshold: 0.7,
    );

gyrdym commented 3 years ago

@gaetschwartz thank you for creating the issue, I'll take a look at this

jose-almir commented 3 years ago

Hello everyone, this happens to me too. Here is my sample data:

Q1_1,Q1_2,Q1_3,Q1_4,Q2_1,Q3_1,Q3_2,Q3_3,Q3_4,Q3_5,Q3_6,Q3_7,Q3_8,Q3_9,Q3_10,Q3_11,Q3_12,Q3_13,Q3_14,Q4_1,Q4_2,Q4_3,Q4_4,Q4_5,Q4_6,Q4_7,Q4_8,Q4_9,Q4_10,Q4_11,Q4_12,Q4_13,MORT
2,4,4,4,3,3,1,2,2,3,5,6,2,3,5,5,1,2,2,2,2,2,3,5,1,1,3,3,4,1,1,1,I
2,1,1,1,3,3,2,4,4,5,5,6,2,3,5,7,1,2,2,2,3,4,4,7,1,1,3,3,2,3,1,1,I
2,3,4,1,3,3,1,2,4,2,3,6,2,3,5,8,1,2,2,2,3,4,4,7,1,1,3,4,1,2,1,1,O
2,1,1,1,3,1,3,4,2,3,1,6,1,3,2,2,1,2,2,5,3,4,4,7,1,1,3,5,4,2,1,1,M
2,3,4,1,3,2,1,2,4,4,3,6,2,3,5,5,1,2,2,3,3,4,4,7,1,1,3,2,3,1,4,1,O
2,1,1,1,3,2,3,2,3,4,1,6,2,3,5,4,1,2,2,3,3,4,4,7,1,1,2,2,3,1,1,1,M
2,2,3,1,3,3,3,4,3,5,5,6,2,3,5,8,1,2,2,2,3,4,4,7,2,1,3,3,2,1,1,1,O
1,3,3,1,1,2,2,1,4,3,4,6,2,2,2,5,1,2,2,3,3,4,4,7,1,1,3,3,4,1,2,1,O
2,1,1,1,3,2,3,4,4,3,5,6,2,2,5,5,1,2,2,3,3,4,4,7,1,1,3,3,4,1,2,1,O
2,1,1,1,3,2,1,2,4,3,3,6,2,2,5,7,1,2,1,3,3,4,3,7,2,1,3,3,3,2,1,1,I
2,2,4,1,3,4,2,1,3,2,1,6,2,3,2,8,1,2,2,2,3,4,4,7,2,1,3,3,4,1,1,1,O
2,1,1,1,4,3,1,1,4,6,4,6,2,2,5,5,2,2,2,3,3,4,4,7,2,1,3,3,4,1,1,1,I
1,1,1,1,1,2,2,4,4,5,5,2,1,3,2,6,1,2,2,2,3,4,4,7,2,1,3,4,2,1,1,1,I
2,3,4,1,3,2,3,2,3,3,4,6,2,3,2,7,1,2,2,3,3,4,4,7,2,1,3,3,3,1,1,1,M
2,1,1,1,3,3,2,2,2,5,5,6,1,3,5,9,1,2,2,4,3,4,4,7,2,1,3,2,2,1,1,1,O
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,3,1,1,O
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,3,1,1,I
2,3,4,3,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,4,2,3,1,2,I
2,3,1,1,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,4,2,3,1,2,I
2,3,4,4,3,3,2,3,4,1,1,1,2,2,2,6,1,1,2,3,3,2,2,6,1,1,2,3,3,3,1,2,I
2,2,4,1,3,2,1,2,3,5,1,6,2,3,2,7,1,2,1,3,3,4,3,6,2,1,2,3,4,1,1,1,I
2,1,1,1,3,2,3,4,3,4,1,6,2,3,5,5,1,2,2,4,3,4,4,6,1,1,2,3,3,1,1,1,M
2,2,1,4,1,2,1,1,4,4,5,4,2,3,3,7,1,2,2,2,3,4,3,7,1,1,3,5,4,2,1,1,M
2,1,1,1,1,3,1,2,4,4,5,5,2,3,3,7,1,2,2,2,3,4,3,7,1,1,3,5,5,1,1,1,O
3,3,1,4,3,3,1,2,3,5,4,5,2,3,3,9,1,2,2,2,3,4,3,6,1,1,1,1,3,1,1,1,I
3,1,1,1,3,4,2,2,3,5,4,6,2,2,1,9,1,2,2,2,3,4,4,7,1,1,3,4,5,1,1,1,M
4,1,1,1,3,4,2,2,3,4,2,6,2,3,3,9,1,2,2,2,3,4,4,7,1,1,3,2,3,1,1,1,M
3,2,1,1,3,4,1,1,2,4,4,2,2,2,2,3,1,1,2,2,3,4,3,7,1,1,3,4,4,2,1,1,I
2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,2,2,1,1,1,O
2,3,4,3,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,4,4,1,1,2,I
2,3,3,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,5,2,1,2,I
2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,3,1,1,2,M
2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,5,1,1,2,M
2,3,4,3,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,4,2,1,2,I
2,2,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,4,1,1,2,I
2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,5,2,1,2,I
2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,3,2,1,2,I
2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,3,2,1,2,I
2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,4,2,1,2,I
2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,3,2,1,2,I
2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,3,1,2,O
2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,3,1,2,M
2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,2,1,2,I
2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,2,1,2,O
2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,2,1,2,O
2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,O
2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,I
2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,2,1,2,I
2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,O
2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,O
2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,5,1,1,1,O
2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,4,1,1,1,O
2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,3,1,1,1,O
2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,4,1,1,1,M
2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,5,1,1,1,M
2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,5,1,1,1,M
2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,4,5,1,1,1,O
2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,2,2,1,1,1,O
2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,4,5,1,1,1,O
2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,4,5,1,1,1,O
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,2,1,1,O
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,1,1,1,O
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M
2,1,1,1,3,2,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,1,1,1,O
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,4,4,3,1,1,M
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,1,1,1,O
2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,2,2,1,1,M
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,3,1,1,I
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,1,1,1,O
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,2,1,1,M
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,2,1,1,M
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,2,1,1,I
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,3,1,1,I
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,2,1,1,M
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,3,1,1,I
4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,3,1,1,I

This is my code:

import 'package:ml_algo/ml_algo.dart';
import 'package:ml_dataframe/ml_dataframe.dart';

Future<void> main(List<String> arguments) async {
  final samples = await fromCsv('./bin/questionario.csv', headerExists: true);
  final targetColumnName = 'MORT';
  final splits = splitData(samples, [0.6]);
  final validationData = splits[0];
  final testData = splits[1];
  final validator = CrossValidator.kFold(validationData, numberOfFolds: 5);
  final createClassifier = (DataFrame samples) => LogisticRegressor(
        samples,
        targetColumnName,
        optimizerType: LinearOptimizerType.gradient,
        iterationsLimit: 90,
        learningRateType: LearningRateType.decreasingAdaptive,
        batchSize: samples.rows.length,
        probabilityThreshold: 0.7,
      );
  final scores =
      await validator.evaluate(createClassifier, MetricType.accuracy);
  final accuracy = scores.mean();

  print('accuracy on k fold validation: ${accuracy.toStringAsFixed(2)}');

  final testSplits = splitData(testData, [0.8]);
  final classifier = createClassifier(testSplits[0]);
  final finalScore = classifier.assess(testSplits[1], MetricType.accuracy);

  print(finalScore.toStringAsFixed(2));

  await classifier.saveAsJson('diabetes_classifier.json');
}

This is the error message:

Unhandled exception:
Invalid argument(s)
#0      _TypedList._setFloat32 (dart:typed_data-patch/typed_data_patch.dart:2106:36)
#1      _Float32ArrayView.[]= (dart:typed_data-patch/typed_data_patch.dart:4296:16)
#2      new Float32MatrixDataManager.fromList
package:ml_linalg/…/data_manager/float32_matrix_data_manager.dart:37
#3      MatrixFactoryImpl.fromList
package:ml_linalg/…/matrix/matrix_factory_impl.dart:21
#4      new Matrix.fromList
package:ml_linalg/matrix.dart:42
#5      DataFrameImpl.toMatrix
package:ml_dataframe/…/data_frame/data_frame_impl.dart:151
#6      CrossValidatorImpl.evaluate
package:ml_algo/…/cross_validator/cross_validator_impl.dart:31
#7      main
bin/reg_log.dart:21
#8      _startIsolate. (dart:isolate-patch/isolate_patch.dart:299:32)
#9      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:168:12)

gyrdym commented 3 years ago

@jrcodev ok, thank you very much for the feedback, I'll fix that soon

gyrdym commented 3 years ago

@gaetschwartz you're trying to use LogisticRegressor for a multiclass classification problem, but LogisticRegressor can only be used for binary classification (your dest column contains more than two distinct values). Please use SoftmaxRegressor instead.
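
A quick way to check which case applies is to count the distinct values in the target column before picking a classifier. The sketch below is plain Dart and does not call ml_algo; `countClasses` is a hypothetical helper, and the `dest` list is an abbreviated copy of the dest column from the data above.

```dart
// Hypothetical helper (not part of ml_algo): counts the distinct values
// in a target column, i.e. the number of classes.
int countClasses(Iterable<num> targetColumn) => targetColumn.toSet().length;

void main() {
  // Abbreviated sample of the `dest` column from the issue's data.
  final dest = <num>[4, 5, 4, 5, 6, 7, 4, 5, 6, 8];

  final classes = countClasses(dest);

  // Two classes: binary problem. More than two: multiclass problem.
  final suggestion = classes > 2 ? 'SoftmaxRegressor' : 'LogisticRegressor';

  print('$classes classes -> $suggestion');
}
```

With five distinct destinations in the sample, this points at SoftmaxRegressor, which matches the advice above.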

gyrdym commented 3 years ago

@jrcodev you need to preprocess your data first, since you have strings in your dataset. To do so, please refer to the https://github.com/gyrdym/ml_preprocessing library. You also have the same problem as @gaetschwartz: please use SoftmaxRegressor to classify your records, since you have more than two classes.

jose-almir commented 3 years ago

Thanks, I will try this.

gyrdym commented 3 years ago

@jrcodev you're welcome, don't hesitate to ask me if you face any troubles connected to data preprocessing

jose-almir commented 3 years ago

> @jrcodev you're welcome, don't hesitate to ask me if you face any troubles connected to data preprocessing

I am totally new to this subject; I was given a project that involves building classifiers. My data actually comes from Firebase, and I am unsure about the JSON format that DataFrame.fromJson accepts. I tried to understand how DataFrame.fromJson works, but because of the code generation I couldn't.

gyrdym commented 3 years ago

@jrcodev DataFrame.fromJson restores a previously created dataframe. That's my fault, I should've named it more clearly, e.g. DataFrame.restoreFromJson. I suggest converting your data into a list of rows, and don't forget the header of your dataset: either add the header as the first row of the list or specify the header parameter. I definitely need to add some docs to the https://github.com/gyrdym/ml_dataframe lib.
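
As a rough illustration of the "list of rows with the header first" shape, here is a plain-Dart sketch that uses only dart:convert. It assumes the Firebase export is a JSON array of flat objects that all share the same keys, which may not match your real data; `jsonToRows` is a hypothetical helper, not an ml_dataframe function.

```dart
import 'dart:convert';

/// Hypothetical helper: turns a JSON array of flat objects into a list of
/// rows whose first row is the header (taken from the first record's keys).
List<List<dynamic>> jsonToRows(String json) {
  final records = (jsonDecode(json) as List).cast<Map<String, dynamic>>();
  if (records.isEmpty) return [];

  final header = records.first.keys.toList();

  return [
    header,
    // One row per record, with values in header order.
    for (final record in records) [for (final key in header) record[key]],
  ];
}

void main() {
  // Assumed input shape: two records with a couple of the survey columns.
  const json = '[{"Q1_1": 2, "Q1_2": 4, "MORT": "I"},'
      ' {"Q1_1": 2, "Q1_2": 1, "MORT": "O"}]';

  for (final row in jsonToRows(json)) {
    print(row);
  }
}
```

The resulting list is the kind of header-first row data described above; the string labels in MORT still need to be encoded before training.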

gaetschwartz commented 3 years ago

@gyrdym Alright, correct. I was probably tired. Another thing: when creating a SoftmaxRegressor, the second argument is targetNames. Why can't it take just one targetName?

gyrdym commented 3 years ago

@gaetschwartz the thing is that you need to encode your target column first, since it may contain raw data (e.g. string labels of classes), but SoftmaxRegressor can only deal with numeric data. Usually, the target class column turns into several columns after encoding (e.g. after one-hot encoding), and the exact number of columns is equal to the number of classes. Please refer to https://github.com/gyrdym/ml_preprocessing#one-hot-encoding for more information. Thank you very much for writing this, I'll definitely add some documentation on this to the ml_algo lib since it looks a bit vague.
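
To make the "one column becomes several" point concrete, here is a hand-rolled one-hot encoding in plain Dart. It is only a sketch: `oneHot` and the `MORT_I`-style column names are made up for illustration, and the real encoder lives in ml_preprocessing (see the link above).

```dart
/// Hypothetical helper: one-hot encodes a column of string labels.
/// Returns a map from each new column name to that column's 0/1 values.
Map<String, List<int>> oneHot(String columnName, List<String> labels) {
  // Sort the distinct labels so the column order is deterministic.
  final classes = labels.toSet().toList()..sort();

  return {
    for (final cls in classes)
      '${columnName}_$cls': [for (final label in labels) label == cls ? 1 : 0],
  };
}

void main() {
  // A few MORT labels as they appear in the sample CSV.
  final encoded = oneHot('MORT', ['I', 'I', 'O', 'M']);

  // Three classes produce three target columns, hence several target names.
  print(encoded.keys.toList());

  encoded.forEach((name, column) => print('$name: $column'));
}
```

Each class gets its own 0/1 column, so a classifier trained on the encoded data needs the names of all of those columns rather than a single target name.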

gyrdym commented 3 years ago

@gaetschwartz @jrcodev hi everyone, is there anything I can help you with? Are the problems discussed above still relevant?

jose-almir commented 3 years ago

The problem was solved