lewisacidic / scikit-chem

A high level cheminformatics package for the Scientific Python stack, built on RDKit.
http://scikit-chem.readthedocs.io/en/latest/index.html

skchem.data.converter module refactor #34

Open lewisacidic opened 8 years ago

lewisacidic commented 8 years ago

The converter module should be refactored to offer more flexibility.

It would be useful to be able to add features after the data package has been generated.

This could be done by allowing a Converter to set itself up from an existing HDF5 file, rather than building it from scratch.

For example, with a preexisting data.h5, extra features and splits could be added as:

conv = Converter(..., output_path='data.h5')
conv.features += skchem.descriptors.MorganFingerprinter()
conv.splits += pd.Series([True, False, ...])

This would probably be easiest once we have unique string representations for featurizers.
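
To illustrate why unique string representations would help, here is a rough, self-contained sketch of how `+=` on `conv.features` might behave. Everything in it is hypothetical and not part of skchem: `FeatureSet`, this simplified `Converter` signature, and the toy `UpperCounter` featurizer are made up for illustration, and a plain dict stands in for the groups of the HDF5 file. The point is only that the featurizer's `repr` can serve as the storage key, so adding the same featurizer twice does not recompute or duplicate anything.

```python
class FeatureSet:
    """Hypothetical container; a dict stands in for the groups of an HDF5 file."""

    def __init__(self, store, molecules):
        self.store = store          # maps featurizer key -> computed features
        self.molecules = molecules  # inputs already present in the package

    def __iadd__(self, featurizer):
        key = repr(featurizer)      # unique string representation used as the key
        if key not in self.store:   # only compute features that are not stored yet
            self.store[key] = [featurizer(m) for m in self.molecules]
        return self                 # `+=` rebinds the attribute to this object


class Converter:
    """Hypothetical converter that attaches to existing data instead of rebuilding."""

    def __init__(self, store, molecules):
        self.features = FeatureSet(store, molecules)


class UpperCounter:
    """Toy featurizer: counts upper-case characters in a SMILES string."""

    def __init__(self, scale=1):
        self.scale = scale

    def __call__(self, smiles):
        return self.scale * sum(ch.isupper() for ch in smiles)

    def __repr__(self):
        return f"UpperCounter(scale={self.scale})"


store = {}  # stand-in for the contents of data.h5
conv = Converter(store, ["CCO", "c1ccccc1", "CC(=O)O"])
conv.features += UpperCounter()
conv.features += UpperCounter()         # same repr, so nothing is recomputed
conv.features += UpperCounter(scale=2)  # different parameters, new entry
```

After these three additions `store` holds two entries, one per distinct `repr`. A real implementation would write to the HDF5 file rather than a dict, and `conv.splits` could follow the same pattern.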

lewisacidic commented 8 years ago

The string representation of features in #35 would be helpful for this.