Closed sara-eb closed 4 years ago
@sara-eb Hello,
Thanks for reporting this issue. I believe the problem is that you are putting the RF classifier inside of a list:
pool_classifiers = [model_rf]
So when DESClustering receives the models as input, it sees only one model (since the list has a single element) instead of all the individual models inside the RF. For that reason, it cannot properly set up the variables N and J, which correspond to the number of classifiers in the pool selected based on accuracy and diversity, respectively (both are computed as a fraction of the total pool size).
Try changing that line to just:
pool_classifier = model_rf
to see if it works.
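For illustration, here is a minimal sketch of why the list wrapping breaks the size calculation. The fraction values and the `int(...)` truncation are assumptions for the sketch, not DESLib's exact internals:

```python
# Sketch (assumption: DESClustering keeps pct_accuracy / pct_diversity
# fractions of the pool; the real deslib implementation may differ).
def selection_sizes(pool_size, pct_accuracy=0.5, pct_diversity=0.33):
    n_accuracy = int(pool_size * pct_accuracy)    # classifiers kept by accuracy
    j_diversity = int(pool_size * pct_diversity)  # classifiers kept by diversity
    return n_accuracy, j_diversity

# Wrapping the whole forest in a list looks like a pool of size 1:
print(selection_sizes(1))    # (0, 0) -> nothing can be selected
# Passing the forest itself exposes all its trees (e.g. 100 estimators):
print(selection_sizes(100))  # (50, 33)
```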
Dear @Menelau, thanks a lot for your prompt response. I had already tried this before posting the issue here.
Once I changed it to pool_classifier = model_rf, DESClustering takes a very long time to fit on X_dsel; even after 24 hours it had not finished, which suggests there is a problem here. I had to stop the code because it was not fitting.
Moreover, since I am using the Faiss kNN, I am using this code for saving the model. Once I remove the brackets, the problem arises for the other DS models as well. For example, in the case of OLA:
Fitting OLA on X_DSEL dataset
OLA fitting time for 5 patients in DSEL = 455.3
Saving the OLA dynamic selection model
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-19-8568481c02e9> in <module>
12 print("Saving the OLA dynamic selection model")
13 ola_model_dir = ds_model_outdir+'ola.pkl'
---> 14 save_ds(model_ola, ola_model_dir)
15
16
<ipython-input-16-0f2979a7cded> in save_ds(dsalgo, path)
28 dsalgo.roc_algorithm_.index_ = serialize_index(dsalgo.roc_algorithm_.index_)
29 with open(path, 'wb') as f:
---> 30 dill.dump(dsalgo, f)
31
32
~/deslib-env/lib/python3.6/site-packages/dill/_dill.py in dump(obj, file, protocol, byref, fmode, recurse, **kwds)
257 _kwds = kwds.copy()
258 _kwds.update(dict(byref=byref, fmode=fmode, recurse=recurse))
--> 259 Pickler(file, protocol, **_kwds).dump(obj)
260 return
261
~/deslib-env/lib/python3.6/site-packages/dill/_dill.py in dump(self, obj)
443 raise PicklingError(msg)
444 else:
--> 445 StockPickler.dump(self, obj)
446 stack.clear() # clear record of 'recursion-sensitive' pickled objects
447 return
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in dump(self, obj)
407 if self.proto >= 4:
408 self.framer.start_framing()
--> 409 self.save(obj)
410 self.write(STOP)
411 self.framer.end_framing()
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
519
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
523 def persistent_id(self, obj):
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
632
633 if state is not None:
--> 634 save(state)
635 write(BUILD)
636
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
474 f = self.dispatch.get(t)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
478
~/deslib-env/lib/python3.6/site-packages/dill/_dill.py in save_module_dict(pickler, obj)
910 # we only care about session the first pass thru
911 pickler._session = False
--> 912 StockPickler.save_dict(pickler, obj)
913 log.info("# D2")
914 return
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save_dict(self, obj)
819
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
823 dispatch[dict] = save_dict
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in _batch_setitems(self, items)
845 for k, v in tmp:
846 save(k)
--> 847 save(v)
848 write(SETITEMS)
849 elif n:
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
519
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
523 def persistent_id(self, obj):
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
632
633 if state is not None:
--> 634 save(state)
635 write(BUILD)
636
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
474 f = self.dispatch.get(t)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
478
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save_tuple(self, obj)
749 write(MARK)
750 for element in obj:
--> 751 save(element)
752
753 if id(obj) in memo:
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
474 f = self.dispatch.get(t)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
478
/usr/local/python/3.6.2-static/lib/python3.6/pickle.py in save_bytes(self, obj)
699 self.write(BINBYTES8 + pack("<Q", n) + obj)
700 else:
--> 701 self.write(BINBYTES + pack("<I", n) + obj)
702 self.memoize(obj)
703 dispatch[bytes] = save_bytes
error: 'I' format requires 0 <= number <= 4294967295
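A note on this error (my reading of the traceback, not something confirmed by the library authors): the failure is in stock pickle's BINBYTES opcode, whose length field is a 4-byte unsigned int (the `'I'` in the message), overflowing on a serialized component larger than 4 GiB. Pickle protocol 4 has an 8-byte variant (BINBYTES8, the branch one line above in the traceback), so passing `protocol=4` to the dump call may avoid the overflow:

```python
import pickle
import struct

# The 'I' in the error message is the 4-byte unsigned length field of
# pickle's BINBYTES opcode; it cannot encode a length above 2**32 - 1.
try:
    struct.pack("<I", 2**32)
except struct.error as exc:
    print(exc)  # 'I' format requires 0 <= number <= 4294967295

# Protocol 4 (Python 3.4+) writes BINBYTES8 with an 8-byte length for
# large byte strings, so they round-trip. With dill the call would be
# dill.dump(dsalgo, f, protocol=4) -- an assumption, not tested here.
blob = b"x" * 1024
assert pickle.loads(pickle.dumps(blob, protocol=4)) == blob
```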
@Menelau Hi again, any update on this?
It seems the model is too big and cannot be saved by dill.dump(dsalgo, f).
I tried to find a solution for saving it. In this link, the author suggests saving as an HDF5 file.
I am not sure whether I have written the command correctly, but I was trying to save the model `ola` as HDF5:
from klepto.archives import *
file_archive('model_la.pkl',ola,serialized=True)
It raises an error:
```
~/my-env/lib/python3.6/site-packages/klepto/archives.py in new(file_archive, name, dict, cached, kwds)
    118     archive = _file_archive(name, kwds)
    119     if cached: archive = cache(archive=archive)
--> 120     archive.update(dict)
    121     return archive
    122

TypeError: 'OLA' object is not iterable
```
Do you have any idea how I can save these big models on big datasets?
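Not an authoritative fix, but the klepto traceback suggests that file_archive's second argument must be a mapping (it is passed straight to archive.update), so handing it the OLA object directly triggers the iteration error. Wrapping the model in a dict, e.g. `file_archive('model_ola.pkl', {'ola': ola}, serialized=True)` (hypothetical call, untested), should sidestep it. The failure mode itself can be reproduced with a plain dict:

```python
# Stand-in for the fitted deslib OLA model (hypothetical, for
# illustration only -- any non-mapping object behaves the same).
class OLA:
    pass

archive = {}  # the klepto archive is updated like a dict here

# Passing the model itself is what file_archive('...', ola, ...) does:
try:
    archive.update(OLA())
except TypeError as exc:
    print(exc)  # 'OLA' object is not iterable

# Wrapping the model in a mapping is what update() expects:
archive.update({"ola": OLA()})
assert "ola" in archive
```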
Hello,
If the model is too big, I believe HDF5 is a better option, since it is made for large and complex data. However, I'm not familiar with klepto or the whole HDF5 saving process. I think your best bet would be to check with the h5py repository: https://github.com/h5py/h5py
I have trained the random forest on X_train in advance and load the model to create the pool of classifiers. It seems that the other DS methods (i.e., OLA, MLA, DESP, etc.) successfully fit X_dsel, however DESClustering raises the value error, which is raised on this line.
Why is this happening when a Random Forest is given as the pool to DESClustering? Is it because I have a pre-trained RF model and load it as the pool? Is there any difference between loading a pre-trained classifier and training the pool with an RF model (as can be seen here)?
Your expert opinion is really appreciated. Thanks