nitlev / deepforest

A very simple implementation of the deepforest method
MIT License

Problem with the test dataset #1

Open luochuyao opened 7 years ago

luochuyao commented 7 years ago

First, I'd like to thank the author for implementing deepforest. However, I ran into some problems with the test dataset. I'm new to Python. I followed the suggestions in readme.md, but after installing and running the tests I didn't find the result I was looking for (the accuracy). When I checked your source code, I found that the variables x_train, x_test, y_train and y_test are generated randomly, so I replaced them with the MNIST dataset. However, I then got some errors, and I don't know what the correct approach is. I hope you can give me some tips. To summarize my questions:

1. Where should I put the test dataset?
2. Do I need to modify your code in the way I described?
3. If you have time, would you mind pointing me to some material that would help me understand how the program runs?

I'm a novice at these things, so I would be very grateful for your help.

Looking forward to your reply

nitlev commented 7 years ago

First of all, thanks for your feedback!

The project is still under development, so it might not meet all your requirements yet. For starters, I still haven't implemented a way to handle image datasets (which will be done soon), so you might not be able to use this code to make predictions on your MNIST dataset.

I will upload an update soon, so hopefully you will be able to take advantage of this project in the next few days or weeks. In the meantime, I welcome any feedback or suggestions.

Thanks again!
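
For readers who want to replace the randomly generated x_train, x_test, y_train and y_test with real image data in the meantime, here is a minimal sketch. It assumes scikit-learn is installed, uses its bundled 8x8 digits dataset as a small stand-in for MNIST, and uses a plain RandomForestClassifier as a placeholder model. It does not use this repository's classes, whose interface is not shown in this thread. The key step is flattening each image into one row of features so that a model expecting tabular input can consume it.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small 8x8 digit images bundled with scikit-learn (a stand-in for MNIST).
digits = load_digits()
images, labels = digits.images, digits.target  # images: (n_samples, 8, 8)

# Flatten each image into a single feature row: (n_samples, 64).
X = images.reshape(len(images), -1)

# Hold out a quarter of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Placeholder model (assumption): NOT this repository's estimator.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```

The test data does not need to live in a special location: it is simply the held-out part of the arrays, passed to predict after the model has been fitted on the training part.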

luochuyao commented 7 years ago

I am very glad that you replied so quickly. After reading the unit tests, I now understand how to test your program. Meanwhile, I noticed that your program seems to handle only binary classification: in your test data the labels in y_train and y_test are either 0 or 1, so I guess it is designed for two-class problems. If I am right, could I test it on the IMDB test set?

After reading through your program carefully, I am still confused about a few points. I noticed that you call the function fit(), but I don't know which class provides it. I guess it may be related to roc_auc_score, but I am not sure. I might understand your code more easily if you added some comments.

I hope I am not disturbing you and that these questions are not too much trouble. I would appreciate your help.
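
On the fit() and roc_auc_score question above, the following sketch may help, assuming the project follows scikit-learn conventions (an assumption, since the source code is not quoted in this thread). In scikit-learn, fit() is a method of the estimator object itself, while roc_auc_score is a standalone metric function that only compares true labels with predicted scores. The model and the synthetic data below are illustrative stand-ins.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem: labels are 0 or 1, as in the unit tests described above.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# fit() belongs to the estimator: it trains the model on the training data.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(label == 1).
scores = forest.predict_proba(X_test)[:, 1]

# roc_auc_score is a plain function: it takes true labels and scores,
# and knows nothing about how the model was trained.
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC: {auc:.3f}")
```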

dingtiandu commented 7 years ago

Hi, I want to train on my own data, which has 500000 training samples, but I get an OOB score error during the Random Forest training. You say: "A potential solution consists in using cross validation instead of OOB score although it slows down the training. Anyway, simply increasing the number of trees and re-running the training (and crossing fingers) is often enough." I have set gcf = gcForest(shape_1X=16, window=8, tolerance=0.0, min_samples_mgs=80, min_samples_cascade=80) and still get the OOB score error. Do I still need to increase the number of trees? Hoping for a reply.
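
The two options mentioned in the quoted advice (more trees with OOB estimates, or cross validation instead of OOB) can be sketched with scikit-learn directly. This is only an illustration under stated assumptions: it uses RandomForestClassifier and synthetic data rather than the gcForest class from the comment above, and the tree counts and fold count are arbitrary. OOB estimates are missing for any sample that happens to be included in every tree's bootstrap sample, which becomes very unlikely as the number of trees grows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# Option 1: out-of-bag estimates. Each sample is scored only by the trees
# that did not see it during training, so enough trees are needed for
# every sample to be left out at least once.
oob_forest = RandomForestClassifier(
    n_estimators=200, oob_score=True, random_state=0
)
oob_forest.fit(X, y)
oob_proba = oob_forest.oob_decision_function_  # shape: (n_samples, n_classes)

# Option 2: cross-validated estimates. Every sample is scored by a model
# trained on the other folds, at the cost of fitting the forest cv times.
cv_forest = RandomForestClassifier(n_estimators=200, random_state=0)
cv_proba = cross_val_predict(cv_forest, X, y, cv=3, method="predict_proba")

print("OOB accuracy:", oob_forest.oob_score_)
print("CV accuracy:", np.mean(cv_proba.argmax(axis=1) == y))
```

Cross validation always produces an estimate for every sample, which is why it avoids the OOB problem, but it multiplies the training time by the number of folds.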