You can refer to this: https://github.com/huggingface/transformers/issues/4668
Thank you for the reply. I managed to work out the problem; sharing my experience below.
I tried running it on my local machine with 40 GB of RAM and the issue persisted. The process requested an amount of RAM that seemed unreasonably large.
I then ran the code on the sample data and looked at the DATA SUMMARY STAT section, which showed Label alphabet size: 18. But, as the screenshot below from the run on my own data shows, my Label alphabet size was almost the same as the Word alphabet size.
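For anyone who wants to verify this without digging through the log, a rough check like the Python sketch below works on CoNLL-style data (whitespace-separated columns, word first, label last; the file name train.txt is just a placeholder):

# Rough sanity check for CoNLL-style data: count distinct words and
# labels, and flag token lines that have no label column at all.
words, labels, unlabeled = set(), set(), 0
with open("train.txt", encoding="utf-8") as f:
    for line in f:
        cols = line.split()
        if not cols:          # blank line = sentence boundary
            continue
        words.add(cols[0])
        if len(cols) >= 2:
            labels.add(cols[-1])
        else:
            unlabeled += 1    # token line with a missing label
print(f"Word alphabet size:  {len(words)}")
print(f"Label alphabet size: {len(labels)}")
print(f"Token lines missing a label: {unlabeled}")

If the label count comes out anywhere near the word count, something is wrong with the label column.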
The problem was in my test data: I had not attached the default O labels to the words, so the program was presumably treating the words themselves as labels. As an example, the original test file contained data like this:
Quantitative
Regular
Expressions
for
Arrhythmia
Detection
Algorithms
Logics
and
Games
for
True
Concurrency
And I fixed this with:
Quantitative O
Regular O
Expressions O
for O
Arrhythmia O
Detection O
Algorithms O
Logics O
and O
Games O
for O
True O
Concurrency O
The run then showed a sensible Label alphabet size: 30
and completed with no memory issues. :)
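In case it helps others hitting the same thing, the fix can be scripted. Below is a minimal sketch (file names are placeholders) that appends the default O label to every non-blank line that has only the word column, leaving blank sentence-boundary lines untouched:

# Minimal sketch: add a default "O" label to unlabeled token lines.
# File names are placeholders; adjust for your own data layout.
with open("test_unlabeled.txt", encoding="utf-8") as src, \
     open("test_labeled.txt", "w", encoding="utf-8") as dst:
    for line in src:
        cols = line.split()
        if len(cols) == 1:        # word with no label column
            dst.write(f"{cols[0]} O\n")
        else:                     # blank line or already labeled
            dst.write(line)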
Awesome! Thanks for sharing this knowledge.
Greetings,
Thank you for this great tool!
I am trying to train a new model on my data. My data file is 14.8 MB, has 981,163 lines and 6 unique tags, and uses the IOBES tagging scheme. When I attempt training, it aborts prematurely. I am sharing a screenshot of the message printed before it aborts.
Also shared below is a screenshot of the end of my training file, as a sample of my data.
Would splitting the data into batches help? If so, how would one go about it?
I would be happy to learn of a resolution to this.
Many thanks in advance!