yinlou / mltk

Machine Learning Tool Kit
BSD 3-Clause "New" or "Revised" License

Need for programmatic setup of datasets #14

Closed lukehutch closed 8 years ago

lukehutch commented 8 years ago

Is there any way to build datasets in memory, using the API, or do I need to write out a file to disk, and read it back in?

I tried creating a dataset using the API, but the methods and constructors of Attribute are not visible, so I can't create a List<Attribute>, so I can't create an Instances object, so I can't create cross-validation folds.

yinlou commented 8 years ago

Yes. Note that Attribute is an abstract class, so you can't instantiate it directly, but you can create a NumericalAttribute or a NominalAttribute. Remember to set the class attribute. For cross-validation, you can use InstancesSplitter under mltk.core.processor.
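For readers landing on this thread, here is a rough, self-contained sketch of the pattern described above: an abstract attribute type with concrete numerical and nominal subclasses, a dataset that records which attribute is the class attribute, and a simple split into cross-validation folds. Every class and method name below (`SketchAttribute`, `SketchDataset`, `crossValidationFolds`, and so on) is a hypothetical stand-in written for illustration. It is not the mltk API, and the real constructors and `InstancesSplitter` methods may look different, so check the mltk sources before adapting it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-ins, NOT mltk classes: they only mirror the shape
// described in the thread (abstract base type, two concrete subclasses).
abstract class SketchAttribute {
    final String name;
    SketchAttribute(String name) { this.name = name; }
}

final class SketchNumericalAttribute extends SketchAttribute {
    SketchNumericalAttribute(String name) { super(name); }
}

final class SketchNominalAttribute extends SketchAttribute {
    final String[] states;
    SketchNominalAttribute(String name, String[] states) {
        super(name);
        this.states = states.clone();
    }
}

// In-memory dataset: feature attributes, one class attribute, and rows.
final class SketchDataset {
    final List<SketchAttribute> attributes;
    final SketchAttribute classAttribute;
    private final List<double[]> features = new ArrayList<>();
    private final List<Double> targets = new ArrayList<>();

    SketchDataset(List<SketchAttribute> attributes, SketchAttribute classAttribute) {
        this.attributes = attributes;
        this.classAttribute = classAttribute;
    }

    void add(double[] row, double target) {
        if (row.length != attributes.size()) {
            throw new IllegalArgumentException("row length must match attribute count");
        }
        features.add(row.clone());
        targets.add(target);
    }

    int size() { return features.size(); }
}

final class InMemoryDatasetSketch {
    // Builds a tiny dataset entirely in memory, with no file on disk.
    static SketchDataset buildToyDataset() {
        List<SketchAttribute> attrs = new ArrayList<>();
        attrs.add(new SketchNumericalAttribute("height"));
        attrs.add(new SketchNominalAttribute("color", new String[] {"red", "green"}));
        SketchAttribute label = new SketchNominalAttribute("label", new String[] {"no", "yes"});
        SketchDataset data = new SketchDataset(attrs, label);
        // Nominal values are stored as the index of their state.
        data.add(new double[] {1.5, 0}, 0);
        data.add(new double[] {1.7, 1}, 1);
        data.add(new double[] {1.6, 0}, 0);
        data.add(new double[] {1.8, 1}, 1);
        data.add(new double[] {1.4, 0}, 0);
        data.add(new double[] {1.9, 1}, 1);
        return data;
    }

    // Shuffles row indices with a fixed seed and deals them round-robin
    // into k folds; each fold holds the held-out indices for one round.
    static List<List<Integer>> crossValidationFolds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(indices.get(i));
        return folds;
    }

    public static void main(String[] args) {
        SketchDataset data = buildToyDataset();
        List<List<Integer>> folds = crossValidationFolds(data.size(), 3, 42L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + " held-out rows: " + folds.get(f));
        }
    }
}
```

With mltk itself, the equivalent steps would presumably be to build a list of NumericalAttribute and NominalAttribute objects, pass it to Instances along with the class attribute, and hand the result to InstancesSplitter, but the exact signatures are not given in this thread.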

lukehutch commented 8 years ago

OK, thanks for the explanation, I didn't see those concrete classes. Would be good to have documentation about this use case.