About "tcmalloc: large alloc" message and training aborted

jd-coderepos commented 3 years ago

Greetings,

Thank you for this great tool!

I am trying to train a new model on my data. My data file is 14.8 MB, has 981,163 lines and 6 unique tags, and uses the IOBES tagging scheme. When I attempt to train, the run aborts prematurely. I share here the screenshot with the message shown before it aborts.

Also, shared below is a screenshot of the end of my training file as my data sample.

[Screenshot: train-data]

Would making batches of the data help? And how would one go about it if so?

Happy to know a resolution to this.

Many thanks in advance!

jiesutd commented 3 years ago

You can refer to this: https://github.com/huggingface/transformers/issues/4668

jd-coderepos commented 3 years ago

Thank you for the reply. I managed to work out the problem. Sharing below my experience.

I tried running it on my local machine with 40 GB of RAM and the issue persisted. The program tried to allocate a very large amount of memory, which seemed unreasonable for a data file of this size.

I then ran the code on the sample data and looked at the DATA SUMMARY STAT section, which showed Label alphabet size: 18. In the run on my own data (screenshot below), however, the Label alphabet size was almost the same as the Word alphabet size.
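The same symptom can be spotted before training by counting the distinct values in the label column of a data file. The following is only a minimal sketch: it assumes one token per line, whitespace-separated columns with the label last, and blank lines between sentences. The function name `label_counts` is hypothetical and not part of NCRFpp.

```python
from collections import Counter


def label_counts(lines):
    """Count the values found in the last whitespace-separated column.

    Assumption: one token per line, label in the last column,
    blank lines separate sentences.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank sentence separators
            counts[parts[-1]] += 1
    return counts


# Unlabeled lines contribute the word itself as a "label".
sample = ["Quantitative", "Regular", "", "Logics O", "and O"]
counts = label_counts(sample)
print(len(counts), "distinct values in the label column")
```

If the number of distinct values is close to the vocabulary size rather than to the expected number of tags, some lines are probably missing their label column.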

The problem was in my test data. I had not attached default O labels to the words, so the program was presumably treating the words themselves as labels. For example, the original test file contained data like this:

Quantitative
Regular
Expressions
for
Arrhythmia
Detection
Algorithms

Logics
and
Games
for
True
Concurrency

And I fixed this with:

Quantitative O
Regular O
Expressions O
for O
Arrhythmia O
Detection O
Algorithms O

Logics O
and O
Games O
for O
True O
Concurrency O

This then showed an accurate Label alphabet size: 30 and no memory issues. :)
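For anyone hitting the same problem, the fix above can be scripted. This is a rough sketch under the same format assumptions (one token per line, whitespace-separated, blank lines between sentences). The function name `add_default_label` is hypothetical and not part of NCRFpp.

```python
def add_default_label(lines, default="O"):
    """Append a default label to every line that holds only a token.

    Lines that already have two or more columns, and blank sentence
    separators, are kept as they are (minus the trailing newline).
    """
    fixed = []
    for line in lines:
        parts = line.split()
        if len(parts) == 1:
            fixed.append(f"{parts[0]} {default}")
        else:
            fixed.append(line.rstrip("\n"))
    return fixed


unlabeled = ["Logics\n", "and\n", "Games\n", "\n", "True O\n"]
for row in add_default_label(unlabeled):
    print(row)
```

To apply it to a file, read the lines with `open(path).readlines()`, pass them through the function, and write the result back joined with newlines. The path is whatever the test file is called locally.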

jiesutd commented 3 years ago

Awesome! Thanks for sharing this knowledge.

jiesutd / NCRFpp, issue #171