ddovod opened this issue 7 years ago
We are currently working on better documentation. For now, I've added a quick draft that should help you.
Feel free to ask any questions.
We will keep this issue open until we have better documentation.
Yes, this is what I'm looking for! I'm a complete beginner in these things, and any info is useful. And thank you guys for this project, it seems very clean and useful!
Another question, about performance: is it okay for supervised learning to take a long time on the simple iris dataset with a Core i5 6600 CPU? I didn't wait for it to finish, but it ran for about 15+ minutes.
Which Dataset?
Can you send us your categorization code?
It definitely shouldn't take that long.
@ddovod If you're using Visual Studio, did you compile in Debug or Release mode? Running Hippocrates in Debug mode will reduce its performance by a big margin. Just re-compile in Release mode and try it again.
I'm using cmake, and after set(CMAKE_BUILD_TYPE Release) it runs faster on simple tasks, but it still takes a long time on the iris dataset. I'm using the classic dataset from here https://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/datasets/iris.csv, just replacing the last column with numeric class labels. My code is:
// Standard headers used by this snippet; the Hippocrates headers and the
// IrisResult enum are assumed to be included/defined elsewhere in main.cpp.
#include &lt;algorithm&gt;
#include &lt;cmath&gt;
#include &lt;fstream&gt;
#include &lt;iostream&gt;
#include &lt;random&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

void loadIris(Training::Data&lt;IrisResult&gt;&amp; trData, Training::Data&lt;IrisResult&gt;&amp; testData)
{
    std::vector&lt;std::vector&lt;float&gt;&gt; data;
    std::ifstream file("iris.csv");
    std::string buf;
    while (std::getline(file, buf)) {
        data.push_back({});
        std::stringstream ss(buf);
        float val;
        while (ss &gt;&gt; val) {
            data.back().push_back(val);
            if (ss.peek() == ',')
                ss.ignore();
        }
    }
    // std::random_shuffle is deprecated in C++14 and removed in C++17,
    // so use std::shuffle with an explicit engine instead.
    std::shuffle(data.begin(), data.end(), std::mt19937{std::random_device{}()});
    // 80/20 train/test split
    const auto trainCount = static_cast&lt;std::size_t&gt;(data.size() * 0.8);
    for (std::size_t i = 0; i &lt; trainCount; i++) {
        Training::Data&lt;IrisResult&gt;::Set set;
        set.input = std::vector&lt;float&gt;(data[i].begin(), data[i].end() - 1);
        set.classification = static_cast&lt;IrisResult&gt;(std::round(data[i].back()));
        trData.AddSet(set);
    }
    for (std::size_t i = trainCount; i &lt; data.size(); i++) {
        Training::Data&lt;IrisResult&gt;::Set set;
        set.input = std::vector&lt;float&gt;(data[i].begin(), data[i].end() - 1);
        set.classification = static_cast&lt;IrisResult&gt;(std::round(data[i].back()));
        testData.AddSet(set);
    }
}

int main()
{
    Training::Data&lt;IrisResult&gt; trData;
    Training::Data&lt;IrisResult&gt; testData;
    loadIris(trData, testData);
    Training::NeuralNetworkTrainer trainer;
    auto champ = trainer.TrainSupervised(trData, 150);
    std::cout &lt;&lt; "Finished training in " &lt;&lt; trainer.GetGenerationsPassed() &lt;&lt; " generations\n";
    std::cout &lt;&lt; "Result: " &lt;&lt; Tests::TestingUtilities::TestNetwork(champ, testData) &lt;&lt; std::endl;
    return 0;
}
It has been running for 20 minutes or so and hasn't finished yet.
I've also tried this code on a reduced version of the dataset (20 random objects for training, 10 for testing), and this is my output:
ddovod@/build: time ./neat
Finished training in 4693 generations
Result: 1
real 1m2.466s
user 0m42.032s
sys 0m3.176s
Maybe I'm doing something wrong? Thanks a lot!
I'm going to look at this in more detail today, but at first glance it seems that your inputs are not between -1.0 and 1.0, which is assumed by our library. The intended usage would be to divide by the theoretically highest value, like in here.
But upon thinking about this, I decided that this is not a clean solution and your code should be able to work as-is. I'm going to change the lib accordingly in the next few hours (#77). I would appreciate it if you could wait a moment and not change your code, so you can beta test the new feature.
Yes, of course, I can check it this evening.
I divided all the values by 10.0 except the class labels, and it still takes a lot of time. For the reduced dataset (20 training / 10 testing) I get the following output:
ddovod@/build: time ./neat
Finished training in 3417 generations
Result: 1
real 0m48.722s
user 0m48.696s
sys 0m0.012s
Is it normal to need 3417 generations for this dataset? Maybe you have some reference numbers for classic problems, e.g. "for binary classification with 100 training objects, 500 generations should be enough"? That would be very helpful.
Thank you for your feedback, know that it means a lot to us!
I have just updated the development branch so your original code without division should work.
Would you mind sharing your new results with us? Let's hope they're a bit faster this time :)
If there are no visible improvements I'm gonna implement #81 and then compare the results.
Ok, it works much faster now! Reduced dataset (faster and without errors):
ddovod@/build: time ./neat
Finished training in 285 generations
Result: 0
real 0m0.587s
user 0m0.584s
sys 0m0.000s
But with the full dataset (120/30) there are a lot of wrong answers (maybe it's overfitting; afaik ANNs are quite prone to overfitting on linear classification tasks), though it runs significantly faster ("Result" here is the number of bad predictions on the test data):
ddovod@/build: time ./neat
Finished training in 1183 generations
Result: 28
real 0m20.655s
user 0m20.652s
sys 0m0.000s
There's one more issue, with the NeuralNetwork class. My compiler is g++-6.2, and I get a compilation error:
[ 68%] Building CXX object CMakeFiles/neat.dir/src/main.cpp.o
In file included from /home/ddovod/_private/_ml/practice/neat/src/main.cpp:3:
In file included from /home/ddovod/_private/_ml/practice/neat/Hippocrates/Tests/TestingUtilities/Sources/Headers/testing_utilities.hpp:7:
In file included from /home/ddovod/_private/_ml/practice/neat/Hippocrates/Core/Sources/Headers/training/neural_network_trainer.hpp:6:
/home/ddovod/_private/_ml/practice/neat/Hippocrates/Core/Sources/Headers/trained/classifier.hpp:11:17: error: call to implicitly-deleted default constructor of 'Hippocrates::Trained::NeuralNetwork'
Classifier() : NeuralNetwork(){ };
^
/home/ddovod/_private/_ml/practice/neat/Hippocrates/Core/Sources/Headers/trained/neural_network.hpp:6:23: note: default constructor of 'NeuralNetwork' is implicitly deleted because base class 'Phenotype::NeuralNetwork' has no default constructor
class NeuralNetwork : public Phenotype::NeuralNetwork {
^
1 error generated.
It can be fixed by adding a default constructor, and then everything works fine.
Sorry, my bad, here is the correct full-dataset result:
ddovod@/build: time ./neat
Finished training in 439 generations
Result: 0
real 0m0.961s
user 0m0.936s
sys 0m0.000s
Looks like it works fine! Thanks a lot! I will experiment with it further and maybe ask some dumb questions here, is that okay? :)
Oh wow, these results really make me proud :)
I will look into adding your dataset as an integration test.
Is it fine if I use some of the code you provided in your snippet?
We are all more than happy if you experiment around and ask silly questions. We haven't invited beta testers yet, so we need a lot of beginner feedback.
If you have any questions on the usability or find parts of the library to be confusing, please ask.
Thank you for helping us.
This test looks ideal for the project. I would like to use your code in our tests, if you are ok with that.
You could also open a pull request if you want to add the test yourself.
Yes, sure, thanks a lot :) I'm seeing some strange things and will be investigating them, so I'll come back with results later this week. Your project is very interesting to me; it's almost the only maintained NEAT-related project on GitHub, so I'll be glad if I can be useful to it. I can open a pull request with the iris dataset and a related classification test tomorrow.
Ok guys, another question: why did you restrict this library to C++1z? There aren't many places where it's really needed; maybe C++14 would be enough, and it has good support in gcc and clang (and libc++).
Here are just a few things I cannot use when working with Hippocrates: the experimental headers, for one, which is a bit sad.
And maybe it was my mistake, but it still spends a lot of time on my initial problem. I don't know why, but today I cleaned and compiled the test again, and it consumes a lot of time and memory: https://travis-ci.org/ddovod/Hippocrates/jobs/179871731. I have no idea about the reason.
As you can see here, C++1z is just as well supported.
libc++ does indeed support experimental, you just have to build it yourself and provide a certain build parameter, which is a pain in the ass. Perhaps we will switch logging and JSON parsing to well-tested libraries that don't use the experimental TS, which would eliminate those dependencies.
CLion's syntax highlighting has been outdated for three years, and supporting it is not planned, as that would heavily constrain our coding style.
Hi! I have some problems understanding the unsupervised learning API (the IBody class). Could you please provide some information about it? A tutorial section or documentation for this class would be nice! Thanks a lot!