Closed gaetschwartz closed 3 years ago
@gaetschwartz thank you for creating the issue, I'll take a look at this
Hello everyone, this happens to me too. Here is my sample data:
Q1_1,Q1_2,Q1_3,Q1_4,Q2_1,Q3_1,Q3_2,Q3_3,Q3_4,Q3_5,Q3_6,Q3_7,Q3_8,Q3_9,Q3_10,Q3_11,Q3_12,Q3_13,Q3_14,Q4_1,Q4_2,Q4_3,Q4_4,Q4_5,Q4_6,Q4_7,Q4_8,Q4_9,Q4_10,Q4_11,Q4_12,Q4_13,MORT 2,4,4,4,3,3,1,2,2,3,5,6,2,3,5,5,1,2,2,2,2,2,3,5,1,1,3,3,4,1,1,1,I 2,1,1,1,3,3,2,4,4,5,5,6,2,3,5,7,1,2,2,2,3,4,4,7,1,1,3,3,2,3,1,1,I 2,3,4,1,3,3,1,2,4,2,3,6,2,3,5,8,1,2,2,2,3,4,4,7,1,1,3,4,1,2,1,1,O 2,1,1,1,3,1,3,4,2,3,1,6,1,3,2,2,1,2,2,5,3,4,4,7,1,1,3,5,4,2,1,1,M 2,3,4,1,3,2,1,2,4,4,3,6,2,3,5,5,1,2,2,3,3,4,4,7,1,1,3,2,3,1,4,1,O 2,1,1,1,3,2,3,2,3,4,1,6,2,3,5,4,1,2,2,3,3,4,4,7,1,1,2,2,3,1,1,1,M 2,2,3,1,3,3,3,4,3,5,5,6,2,3,5,8,1,2,2,2,3,4,4,7,2,1,3,3,2,1,1,1,O 1,3,3,1,1,2,2,1,4,3,4,6,2,2,2,5,1,2,2,3,3,4,4,7,1,1,3,3,4,1,2,1,O 2,1,1,1,3,2,3,4,4,3,5,6,2,2,5,5,1,2,2,3,3,4,4,7,1,1,3,3,4,1,2,1,O 2,1,1,1,3,2,1,2,4,3,3,6,2,2,5,7,1,2,1,3,3,4,3,7,2,1,3,3,3,2,1,1,I 2,2,4,1,3,4,2,1,3,2,1,6,2,3,2,8,1,2,2,2,3,4,4,7,2,1,3,3,4,1,1,1,O 2,1,1,1,4,3,1,1,4,6,4,6,2,2,5,5,2,2,2,3,3,4,4,7,2,1,3,3,4,1,1,1,I 1,1,1,1,1,2,2,4,4,5,5,2,1,3,2,6,1,2,2,2,3,4,4,7,2,1,3,4,2,1,1,1,I 2,3,4,1,3,2,3,2,3,3,4,6,2,3,2,7,1,2,2,3,3,4,4,7,2,1,3,3,3,1,1,1,M 2,1,1,1,3,3,2,2,2,5,5,6,1,3,5,9,1,2,2,4,3,4,4,7,2,1,3,2,2,1,1,1,O 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,3,1,1,O 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,3,1,1,I 2,3,4,3,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,4,2,3,1,2,I 2,3,1,1,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,4,2,3,1,2,I 2,3,4,4,3,3,2,3,4,1,1,1,2,2,2,6,1,1,2,3,3,2,2,6,1,1,2,3,3,3,1,2,I 2,2,4,1,3,2,1,2,3,5,1,6,2,3,2,7,1,2,1,3,3,4,3,6,2,1,2,3,4,1,1,1,I 2,1,1,1,3,2,3,4,3,4,1,6,2,3,5,5,1,2,2,4,3,4,4,6,1,1,2,3,3,1,1,1,M 2,2,1,4,1,2,1,1,4,4,5,4,2,3,3,7,1,2,2,2,3,4,3,7,1,1,3,5,4,2,1,1,M 2,1,1,1,1,3,1,2,4,4,5,5,2,3,3,7,1,2,2,2,3,4,3,7,1,1,3,5,5,1,1,1,O 3,3,1,4,3,3,1,2,3,5,4,5,2,3,3,9,1,2,2,2,3,4,3,6,1,1,1,1,3,1,1,1,I 3,1,1,1,3,4,2,2,3,5,4,6,2,2,1,9,1,2,2,2,3,4,4,7,1,1,3,4,5,1,1,1,M 4,1,1,1,3,4,2,2,3,4,2,6,2,3,3,9,1,2,2,2,3,4,4,7,1,1,3,2,3,1,1,1,M 
3,2,1,1,3,4,1,1,2,4,4,2,2,2,2,3,1,1,2,2,3,4,3,7,1,1,3,4,4,2,1,1,I 2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,2,2,1,1,1,O 2,3,4,3,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,4,4,1,1,2,I 2,3,3,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,5,2,1,2,I 2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,3,1,1,2,M 2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,5,1,1,2,M 2,3,4,3,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,4,2,1,2,I 2,2,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,4,1,1,2,I 2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,5,2,1,2,I 2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,3,2,1,2,I 2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,5,3,2,1,2,I 2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,4,2,1,2,I 2,3,4,4,3,4,1,2,3,2,2,4,2,3,5,6,1,1,2,3,3,4,4,7,2,1,2,6,3,2,1,2,I 2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,3,1,2,O 2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,3,1,2,M 2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,2,1,2,I 2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,2,1,2,O 2,3,4,4,3,3,2,1,2,1,4,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,2,2,1,2,O 2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,O 2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,I 2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,2,1,2,I 2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,O 2,4,4,4,3,3,2,1,2,1,1,3,2,3,5,4,1,2,2,2,3,4,4,7,2,1,2,5,4,1,1,2,O 2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,5,1,1,1,O 2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,4,1,1,1,O 2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,3,1,1,1,O 2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,4,1,1,1,M 2,1,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,5,1,1,1,M 2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,3,4,3,6,1,1,3,3,5,1,1,1,M 2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,4,5,1,1,1,O 
2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,2,2,1,1,1,O 2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,4,5,1,1,1,O 2,2,1,1,3,5,2,1,3,1,5,6,2,3,2,9,1,2,2,2,2,3,3,6,1,1,3,4,5,1,1,1,O 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,2,1,1,O 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,1,1,1,O 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M 2,1,1,1,3,2,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,4,1,1,1,O 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,4,4,3,1,1,M 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,1,1,1,O 2,1,1,1,3,4,2,2,3,4,3,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,3,3,2,1,1,M 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,2,2,1,1,M 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,3,1,1,I 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,1,1,1,O 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,2,1,1,M 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,3,2,1,1,M 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,2,1,1,I 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,3,1,1,I 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,2,1,1,M 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,3,1,1,I 4,2,2,1,3,4,3,2,3,4,4,6,2,2,1,9,1,2,2,2,2,2,3,4,2,1,3,5,4,3,1,1,I
This is my code:
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_dataframe/ml_dataframe.dart';

Future<void> main(List<String> arguments) async {
  final samples = await fromCsv('./bin/questionario.csv', headerExists: true);
  final targetColumnName = 'MORT';
  final splits = splitData(samples, [0.6]);
  final validationData = splits[0];
  final testData = splits[1];
  final validator = CrossValidator.kFold(validationData, numberOfFolds: 5);
  final createClassifier = (DataFrame samples) => LogisticRegressor(
        samples,
        targetColumnName,
        optimizerType: LinearOptimizerType.gradient,
        iterationsLimit: 90,
        learningRateType: LearningRateType.decreasingAdaptive,
        batchSize: samples.rows.length,
        probabilityThreshold: 0.7,
      );
  final scores =
      await validator.evaluate(createClassifier, MetricType.accuracy);
  final accuracy = scores.mean();
  print('accuracy on k fold validation: ${accuracy.toStringAsFixed(2)}');
  final testSplits = splitData(testData, [0.8]);
  final classifier = createClassifier(testSplits[0]);
  final finalScore = classifier.assess(testSplits[1], MetricType.accuracy);
  print(finalScore.toStringAsFixed(2));
  await classifier.saveAsJson('diabetes_classifier.json');
}
This is the error message:
Unhandled exception: Invalid argument(s)
package:ml_linalg/…/data_manager/float32_matrix_data_manager.dart:37
package:ml_linalg/…/matrix/matrix_factory_impl.dart:21
package:ml_linalg/matrix.dart:42
package:ml_dataframe/…/data_frame/data_frame_impl.dart:151
package:ml_algo/…/cross_validator/cross_validator_impl.dart:31
bin/reg_log.dart:21
@jrcodev ok, thank you very much for the feedback, I'll fix that soon
@gaetschwartz you're trying to use LogisticRegressor for a multiclass classification problem, but LogisticRegressor can only be used for binary classification. Please use SoftmaxRegressor instead.
@jrcodev you need to preprocess your data first, since you have strings in your dataset. To do so, please refer to the https://github.com/gyrdym/ml_preprocessing library. You also have the same problem as @gaetschwartz: please use SoftmaxRegressor to classify your records, since you have more than two classes.
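A quick way to check whether a dataset is a binary or a multiclass problem is to count the distinct values in the target column. The following is a minimal plain-Dart sketch that does not use ml_algo or ml_dataframe; `distinctLabels` is a hypothetical helper, and the rows are a three-column excerpt (Q1_1, Q1_2, MORT) of the sample data above.

```dart
/// Returns the distinct values found in [targetColumn] of CSV-like [rows].
/// The first row is expected to be the header.
Set<String> distinctLabels(List<List<String>> rows, String targetColumn) {
  final columnIndex = rows.first.indexOf(targetColumn);
  if (columnIndex == -1) {
    throw ArgumentError('Column "$targetColumn" not found in the header');
  }
  return rows.skip(1).map((row) => row[columnIndex]).toSet();
}

void main() {
  // Excerpt of the sample data: only Q1_1, Q1_2 and MORT are kept.
  final rows = [
    ['Q1_1', 'Q1_2', 'MORT'],
    ['2', '4', 'I'],
    ['2', '3', 'O'],
    ['2', '1', 'M'],
    ['1', '3', 'O'],
  ];

  final labels = distinctLabels(rows, 'MORT');
  // More than two distinct labels means a binary classifier is not enough.
  final kind = labels.length > 2 ? 'multiclass' : 'binary';
  print('${labels.length} classes $labels: $kind problem');
}
```

With the labels I, O and M present, the target has three classes, which is why a binary classifier does not fit this data.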
Thanks, I will try this.
@jrcodev you're welcome, don't hesitate to ask me if you face any troubles connected to data preprocessing
I am totally new to this subject; I received a project to create classifications. In reality my data comes from Firebase, and I am unsure about the JSON format that DataFrame.fromJson accepts. I tried to understand how the DataFrame.fromJson function works, but because of codegen I couldn't.
@jrcodev DataFrame.fromJson restores a previously created dataframe - that's my fault, I should've named it more clearly, e.g. DataFrame.restoreFromJson. I suggest you convert your data into a list of rows, and don't forget about the header of your dataset - either add the header as the first row of the list of rows or specify the header parameter. I definitely need to add some docs to the https://github.com/gyrdym/ml_dataframe lib.
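The conversion into a list of rows suggested above could look roughly like this. It is a plain-Dart sketch under assumptions: `toRows` is a hypothetical helper, and the JSON payload is made up for illustration, so the real Firebase document shape may differ.

```dart
import 'dart:convert';

/// Converts a list of JSON objects into a list of rows whose first row is
/// the header. Keys missing from a record are filled with null.
List<List<dynamic>> toRows(List<Map<String, dynamic>> records) {
  // Collect the column names in first-seen order.
  final header = <String>{};
  for (final record in records) {
    header.addAll(record.keys);
  }
  return [
    header.toList(),
    for (final record in records)
      [for (final column in header) record[column]],
  ];
}

void main() {
  // Hypothetical payload: a JSON array with one object per questionnaire.
  const payload = '[{"Q1_1": 2, "Q1_2": 4, "MORT": "I"},'
      ' {"Q1_1": 2, "Q1_2": 3, "MORT": "O"}]';
  final records = (jsonDecode(payload) as List).cast<Map<String, dynamic>>();

  final rows = toRows(records);
  for (final row in rows) {
    print(row);
  }
  // The rows, header first, could then be passed to the DataFrame
  // constructor; check the ml_dataframe docs for the exact signature.
}
```

Putting the header in the first row corresponds to the first of the two options mentioned above; the other option is to keep the header separate and pass it through the header parameter.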
@gyrdym Alright, correct. I was probably tired. Another thing: when creating a SoftmaxRegressor, the second argument is targetNames. But why can't it take only one targetName?
@gaetschwartz the thing is that you need to encode your target column first, since it may contain raw data (e.g. string labels of classes), but SoftmaxRegressor can only deal with numeric data. Usually, the target class column turns into several columns after encoding (e.g., after one-hot encoding) - the exact number of columns is equal to the number of classes. Please refer to https://github.com/gyrdym/ml_preprocessing#one-hot-encoding for more information. Thank you very much for writing this, I'll definitely add some documentation on this to the ml_algo lib since it looks a bit vague.
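To illustrate why a single target column becomes several target names, here is a hand-rolled one-hot encoding in plain Dart. It is only a sketch of the idea and not the ml_preprocessing API: `oneHotEncode` and the `MORT_<label>` column naming are assumptions made for this example.

```dart
/// One-hot encodes [labels]: each distinct label becomes a 0/1 column.
/// Returns a map from the generated column name to that column's values.
Map<String, List<int>> oneHotEncode(List<String> labels, String columnName) {
  final classes = labels.toSet().toList()..sort();
  return {
    for (final cls in classes)
      '${columnName}_$cls': [for (final label in labels) label == cls ? 1 : 0],
  };
}

void main() {
  // MORT values of the first four rows of the sample data above.
  final encoded = oneHotEncode(['I', 'I', 'O', 'M'], 'MORT');
  encoded.forEach((name, values) => print('$name: $values'));

  // One raw target column has turned into one column per class, so the
  // classifier needs a list of target names rather than a single name.
  final targetNames = encoded.keys.toList();
  print(targetNames);
}
```

Each row has exactly one 1 across the generated columns, and the number of columns matches the number of classes, as described above.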
@gaetschwartz @jrcodev hi everyone, is there anything I can help you with? Are the problems discussed above still relevant?
The problem was solved
Here is my data:
It then throws this exception while trying to create the classifier:
The classifier is constructed this way: