google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

Core dump when built with CXXFLAGS `-Wp,-D_GLIBCXX_ASSERTIONS` #966

Closed. samchugit closed this 8 months ago

samchugit commented 9 months ago

Building and testing sentencepiece 0.1.99 with `-Wp,-D_GLIBCXX_ASSERTIONS` in CXXFLAGS ends in a core dump.

This is where it crashed: [image]

This is the gdb backtrace: [image]

I wonder whether this line accesses the vector out of bounds: https://github.com/google/sentencepiece/blob/3863f7648e5d8edb571ac592f3ac4f5f0695275a/src/unigram_model_trainer.cc#L236
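For context on why the flag matters: when `_GLIBCXX_ASSERTIONS` is defined, libstdc++ checks preconditions such as the index passed to `std::vector::operator[]` and aborts the process when one is violated, which is what produces the core dump. Without the flag, the same access is undefined behavior that often goes unnoticed. The sketch below is not the sentencepiece code; `ValueOrDefault`, `FirstElementOrNull`, and `RunSelfCheck` are hypothetical helpers that only illustrate two common patterns that trip the assertion (indexing past the end, and taking `&v[0]` of an empty vector) and one way to guard against each.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from sentencepiece): returns values[index],
// or `fallback` when index is outside [0, values.size()).
// The bounds check runs before operator[], so the libstdc++ assertion
// enabled by -D_GLIBCXX_ASSERTIONS is never violated.
float ValueOrDefault(const std::vector<float>& values, std::size_t index,
                     float fallback) {
  if (index >= values.size()) return fallback;
  return values[index];
}

// Hypothetical helper: pointer to the first element, or nullptr if empty.
// Writing &values[0] on an empty vector violates the operator[]
// precondition; checking empty() and using data() avoids that.
const float* FirstElementOrNull(const std::vector<float>& values) {
  return values.empty() ? nullptr : values.data();
}

// Small self-check exercising both helpers.
void RunSelfCheck() {
  const std::vector<float> values = {1.0f, 2.0f};
  const std::vector<float> empty;
  assert(ValueOrDefault(values, 1, -1.0f) == 2.0f);   // in range
  assert(ValueOrDefault(values, 2, -1.0f) == -1.0f);  // one past the end
  assert(FirstElementOrNull(empty) == nullptr);
  assert(FirstElementOrNull(values) == values.data());
}
```

Compiling this with `-D_GLIBCXX_ASSERTIONS` and calling `RunSelfCheck()` should run cleanly, whereas replacing either guard with a bare `values[index]` or `&values[0]` on the out-of-range inputs would abort.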

taku910 commented 9 months ago

Thank you for the report. Will fix it in the next release.

taku910 commented 8 months ago

Fixed in v0.2.0

Henry-ZHR commented 8 months ago

I still get a core dump with v0.2.0.

It seems to be in the same place?

Henry-ZHR commented 8 months ago

@taku910