Closed: chenglin closed this issue 2 years ago
TLDR: It's possible that there's a bug that causes a segfault, though it's unlikely that this is happening in the parts of the code you're pointing to.
For diagnosing the segfault: Could you run a minimally reproducing example with gdb
to see which instruction triggers the segfault? There used to be an issue with overflows for very large datasets, but I fixed that a few months ago. If there's any way you can have a self-contained, minimally reproducible sample and send it to me (email is fine), I'd love to help you out.
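Alternatively, as a lighter-weight first step than gdb, Python's built-in faulthandler module can print a Python-level traceback when the interpreter segfaults. A minimal sketch (the repro steps in the comments are hypothetical placeholders, not actual lleaves usage):

```python
import faulthandler

# Enable fault handling so a SIGSEGV dumps a traceback to stderr
# before the process dies, showing which Python call triggered it.
faulthandler.enable()

# Hypothetical reproduction steps would go here, e.g.:
# import lleaves
# model = lleaves.Model(model_file="model.txt")  # placeholder filename
# model.compile()
# model.predict(data)

print(faulthandler.is_enabled())  # → True
```

Running the reproducer with `python -X faulthandler repro.py` achieves the same without code changes.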
Regarding the categorical data: The relevant function is actually this one: https://github.com/siboehm/lleaves/blob/9784625d8503c02e2679fafefb41c469b345566d/lleaves/compiler/codegen/codegen.py#L42
This is the function in the binary that lleaves calls from Python (passing two double pointers). The categorical features are then cast to ints in the core loop here: https://github.com/siboehm/lleaves/blob/9784625d8503c02e2679fafefb41c469b345566d/lleaves/compiler/codegen/codegen.py#L205
Most of the processing of the Pandas dataframes follows LightGBM very closely. This double-to-int casting is a bit strange, but I wanted to follow LightGBM as closely as possible. It works because LightGBM doesn't allow categoricals > 2^31 - 1 (the maximum int32), while a double can represent any integer up to 2^53 without loss of precision.
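To make the 2^53 point concrete, here is the round trip a categorical value takes, sketched in plain Python (the function name is made up for illustration):

```python
def categorical_roundtrip(cat: int) -> int:
    as_double = float(cat)  # features are handed to the binary as doubles
    return int(as_double)   # the generated code casts categoricals back to int

# Exact for every integer a double can represent, i.e. up to 2**53:
assert categorical_roundtrip(2 ** 53) == 2 ** 53
assert int(float(2 ** 53 + 1)) != 2 ** 53 + 1  # first value where precision is lost

# LightGBM caps categoricals at 2**31 - 1, far below that limit,
# so the double round trip is always lossless for valid categoricals.
assert categorical_roundtrip(2 ** 31 - 1) == 2 ** 31 - 1
```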
I find that if the categorical features are numerical values, we can drop the line df[categorical_feature] = df[categorical_feature].astype('category') when preparing the training data, and instead call the LightGBM train function with the parameter categorical_feature=categorical_feature. In a model file trained like this, pandas_categorical is null. Could this issue be related to that?
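For illustration, here is roughly what astype('category') does to an integer column (a pandas-only sketch; the column name is made up, and my understanding is that the category values recorded this way are what end up in pandas_categorical when training from a DataFrame):

```python
import pandas as pd

df = pd.DataFrame({"shop_id": [3, 1, 3, 2]})  # numeric categorical column
df["shop_id"] = df["shop_id"].astype("category")

# The distinct category values pandas tracks for the column.
print(df["shop_id"].cat.categories.tolist())  # [1, 2, 3]

# The integer codes that replace the raw values internally.
print(df["shop_id"].cat.codes.tolist())       # [2, 0, 2, 1]
```

With categorical_feature=... and a plain integer column, no such mapping exists, which is consistent with pandas_categorical being null in the model file.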
When I retrained a model whose pandas_categorical is not null, the core dump disappeared.
PR: return empty list if pandas_categorical is null in model file
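The guard described in the PR could be sketched like this (hypothetical function and parsing, assuming the pandas_categorical line in model.txt holds a JSON value):

```python
import json

def parse_pandas_categorical(line: str):
    # model.txt stores e.g. 'pandas_categorical:null' or
    # 'pandas_categorical:[["a","b"]]'; return [] for the null case
    # instead of None so downstream code can iterate safely.
    value = json.loads(line.split(":", 1)[1])
    return [] if value is None else value

assert parse_pandas_categorical("pandas_categorical:null") == []
assert parse_pandas_categorical('pandas_categorical:[["a","b"]]') == [["a", "b"]]
```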
BTW, I think we should keep pandas_categorical = None when the model file contains pandas_categorical: null.
I'm having trouble understanding this issue. Could you write up a minimally reproducible example of the core dump / send me the model.txt
that causes it?
Recently, I found that one of my models causes a core dump when I use lleaves for prediction. I am confused about the two functions below. In codegen.py, a function parameter's type can be int* if the parameter is categorical. But in data_processing.py, which is used by predict, all feature parameters are converted to double*. Isn't that a type mismatch? Could this cause the crash in lleaves?