linkedin / dagli

Framework for defining machine learning models, including feature generation and transformations, as directed acyclic graphs (DAGs).
BSD 2-Clause "Simplified" License

Add some xgboost internals access #3

Closed cyberbeat closed 3 years ago

cyberbeat commented 3 years ago

I'd like to get more information from the trained XGBoost model (Booster), such as:

Map<String, Integer> getFeatureScore(String[] featureNames) -- Get importance of each feature with specified feature names.

Could you add an API to access the booster directly, or delegate some more of its methods?

jeffpasternack commented 3 years ago

Hi--thanks for your suggestion!

This is actually already possible using something called "transformer views", which allow you to look at the prepared/trained transformer (in this case, the trained XGBoost model, expressed as an XGBoostClassification.Prepared object, which has a public getBooster() method for this purpose). However, we haven't written documentation for views yet (other than Javadoc), and this is certainly non-obvious without an example. I should be able to add one to the project and provide it to you by the end of the day--I'll add another comment here when it's ready.

cyberbeat commented 3 years ago

Oh, thanks. I think I got it; it really was not so easy:

PreparedTransformerView<com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?>> preparedTransformerView = new PreparedTransformerView<>(xgboost);
DAG1x1<P, com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?>> dag1 = DAG.withPlaceholder(p).withOutput(preparedTransformerView);
Result<P, com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?>> res1 = dag1.prepareAndApply(testData);
Iterator<com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?>> c1 = res1.iterator();
..

right?

By the way, I also tested a DL4J model, but stumbled upon some JVM crashes there.

jeffpasternack commented 3 years ago

Yes, that's indeed correct (you'll just need to pull the transformer from the iterator and call getBooster()). We often try to make this a bit easier by adding convenience methods (e.g. an asBooster() method) that allow clients to avoid having to create the view themselves; I'll go ahead and add such a method (again, please expect it by EOD).

FYI, the result of a view is always a constant; this means you can get the prepared transformer in a slightly cleaner way by calling dag1.prepare(...).getConstantResult().
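For example, the whole flow might look roughly like this (a sketch that reuses the names from your snippet and assumes a String[] featureNames; the exact types aren't verified against a particular Dagli version):

```java
// Illustrative sketch (names taken from the snippet above; featureNames is an
// assumed String[]). The view's result is a constant, so it can be read directly
// from the prepared DAG instead of being pulled from the per-example iterator.
com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?> prepared =
    dag1.prepare(testData).getConstantResult();
ml.dmlc.xgboost4j.java.Booster booster = prepared.getBooster();
// Feature importances, via the Booster method quoted at the top of this issue.
java.util.Map<String, Integer> importance = booster.getFeatureScore(featureNames);
```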

Could you please share more regarding the JVM crashes with DL4J?

cyberbeat commented 3 years ago

Both of these: https://github.com/eclipse/deeplearning4j/issues/9148 and https://github.com/eclipse/deeplearning4j/issues/8977. So neither backend (CPU nor GPU) was usable for me. The model was similar to your CharLstm example, with some extra features added to the dense layer.

jeffpasternack commented 3 years ago

FYI, the example code has been added as SimpleTransformerViewExample. I've also added a convenience method, XGBoostClassification::asBooster(), that creates the view for you, although this obviously won't make it into the repository JARs until we push a new version (since you've already written the "manual" solution, I'm guessing this isn't a priority for you :) ).
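For reference, the intended usage should look roughly like the following (a sketch; it assumes asBooster() returns a producer whose output is the trained ml.dmlc.xgboost4j.java.Booster, and it reuses the names from the earlier snippet):

```java
// Hypothetical usage of the new asBooster() convenience method (not yet in a
// published JAR); assumes it yields a producer whose output is the trained Booster.
DAG1x1<P, ml.dmlc.xgboost4j.java.Booster> boosterDag =
    DAG.withPlaceholder(p).withOutput(xgboost.asBooster());
// As noted above, a view's output is a constant, so read it from the prepared DAG.
ml.dmlc.xgboost4j.java.Booster booster =
    boosterDag.prepare(testData).getConstantResult();
```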

With respect to the first DL4J error, I'll try to reproduce with the CharLSTM model in a similar Linux/CUDA context (this might take a while). DeepLearning4J is beta and we (Dagli) are using their data reader API in an "advanced" way; the exception could be due to a bug in DL4J (which we could possibly work around, as we have for other bugs), or (less likely, given that we've not seen this problem before) it could be due to some subtle flaw in our DL4J integration.

With respect to the second error, this looks like it's probably a bug in the ND4J C++ internals which could be difficult to work around. It's possible that it's exacerbated by Dagli's creation of multidimensional arrays in multiple threads or the generally high number of threads in the process. Would you mind trying the (non-multithreaded) SimpleDAGExecutor? Usage would be like dag.withExecutor(new SimpleDAGExecutor()).prepare(...).
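In full, that would be something like the following sketch (dag stands for your DL4J-model DAG and trainingData for its training data; both names are assumed):

```java
// Sketch of the suggested workaround: SimpleDAGExecutor prepares the DAG on a
// single thread, which tests whether multithreaded preparation triggers the crash.
var preparedDag = dag.withExecutor(new SimpleDAGExecutor()).prepare(trainingData);
```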

Thanks for both reports!

cyberbeat commented 3 years ago

I tried SimpleDAGExecutor, and it did not crash after training this time.

Thanks for your work on this great framework; it really improves the experience in this area for Java. I hope the Java community will pick it up, and that more "state of the art" special layers will be added (to DL4J).

jeffpasternack commented 3 years ago

Thanks for checking! This suggests that the problem is indeed likely within ND4J. If it were strictly due to multidimensional arrays being created in multiple threads we could work around it, but from other people's reports it sounds like the mere presence of other threads in the process is the more likely culprit. If we're able to reproduce it we can investigate further, but hopefully the DL4J team will have this fixed in their next release :)

If there are any layer types you're interested in that are in DL4J but not a part of Dagli's abstraction, please let us know--it would be great to get community feedback to help guide us here.

cyberbeat commented 3 years ago

Thanks for investigating. And thanks for adding the getBooster() method.

Just for understanding: would it also be possible to get the XGBoostClassification.Prepared from the DAG via the "producers(java.lang.Class producerClass)" method (like dag.producers(XGBoostClassification.Prepared.class).findAny().get().peek())?

jeffpasternack commented 3 years ago

This would work, but is not recommended except for debugging. In this case it's probably safe because it's unlikely an XGBoostClassification.Prepared instance is going to be optimized away in any real-world problem, but in general it's not a good idea to depend on assumptions about the specific nodes comprising a DAG unless you disable graph optimizations (also not recommended :) ). The reason for this is that Dagli might "optimize away" the node you're looking for or otherwise rewrite the graph; for example, if your XGBoostClassification model had exclusively constant-value inputs in the DAG, it could (and would) be replaced by a constant itself and your code that looks for an XGBoostClassification.Prepared instance in the prepared DAG would break.

Instead, the right way to do this is with views: e.g. use a PreparedTransformerView with your XGBoostClassification as its input, and then make that view an output of your DAG. The output value will then be the prepared transformer you're looking for (your XGBoostClassification.Prepared instance).
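Concretely, that's the same pattern as the snippet earlier in this thread; condensed, with the same assumed names (xgboost, p, testData):

```java
// Condensed restatement of the view pattern from earlier in this thread.
// Because the view is an explicit output of the DAG, the result doesn't depend
// on which nodes survive graph optimization.
PreparedTransformerView<com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?>> view =
    new PreparedTransformerView<>(xgboost);
DAG1x1<P, com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?>> dag =
    DAG.withPlaceholder(p).withOutput(view);
com.linkedin.dagli.xgboost.XGBoostClassification.Prepared<?> prepared =
    dag.prepare(testData).getConstantResult();
```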

cyberbeat commented 3 years ago

> With respect to the first DL4J error, I'll try to reproduce with the CharLSTM model in a similar Linux/CUDA context (this might take a while). DeepLearning4J is beta and we (Dagli) are using their data reader API in an "advanced" way; the exception could be due to a bug in DL4J (which we could possibly work around, as we have for other bugs), or (less likely, given that we've not seen this problem before) it could be due to some subtle flaw in our DL4J integration.

Were you able to reproduce it in the meantime? I think this post hints at the right direction: https://community.konduit.ai/t/bertiterator-produces-npe-while-training-on-gpu/580

Training on the CPU takes very long (several days), so I would be happy to get it running on the GPU.

jeffpasternack commented 3 years ago

The tricky part is reproducing it in a way that it's easy to debug. My original plan for reproducing this was inside a Linux VM with CUDA CPU emulation, but apparently this has ceased to be an option. Since I have local access to a CUDA-capable Windows machine I'll try using this--if the issue is in DL4J's Java layer the bug will likely be cross-platform.

In the meantime, if you're not already using it, you may want to try the AVX variant of the DL4J/ND4J CPU library, which takes advantage of SIMD. It still won't be nearly as fast as a GPU, unfortunately :(

jeffpasternack commented 3 years ago

Quick update: we've been able to reproduce the issue; it is indeed the same DL4J bug from the post you've linked to. We also have a workaround that we're going to be testing later today. I'll update the ticket again once we've published the new version to Maven Central (assuming the tests succeed, I expect this will happen no later than tomorrow).

jeffpasternack commented 3 years ago

The workaround has shipped in Dagli 15.0.0-beta6, which should be available from Maven Central in a few hours. As both the original issue and this "bonus" issue :) have now been addressed, I'll close the issue--thanks again for reporting these!