stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

[MEMORY] Possibly use float instead of double in models/weights #1003

Closed lambdaupb closed 3 years ago

lambdaupb commented 4 years ago
[image: double — https://user-images.githubusercontent.com/1890613/75553593-feef8580-5a38-11ea-895f-8c050d048f8d.png]

double arrays are a large portion of the heap.

There are some places with 2d double arrays with dimensions like:

- 345k x 16, 150k x 24, 80k x 46: CRFClassifier.weights
- 100k x 1000: Classifier.saved in DependencyParser
- 60k x 50: Classifier.E, .eg2E
- 1000 x 2400: Classifier.W1, .wg2W1

Most are weights of some sort, making me wonder if they could be stored in less than 64 bits each.

The obvious step would be to use float[], halving the memory use of this portion.

Another option would be to encode the weights in something else, for example a small integer, and scale it back into a float when the weight is used.

Machine learning models often use fp16 or even fp8 to store weights; there are Java implementations of float -> short -> float conversion (with fp16 semantics stored in a 16-bit short)

like https://android.googlesource.com/platform/frameworks/base/+/master/core/java/android/util/Half.java with https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/libcore/util/FP16.java

or https://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java

The latter approach would need some performance testing, as each time a weight is used it would have to be converted first.
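
For illustration only (a sketch, not from this thread or CoreNLP): the 16-bit round trip could look like the code below. For simplicity it keeps the top 16 bits of the IEEE-754 float layout (bfloat16-style truncation) rather than true fp16 semantics, which would need one of the FP16 helpers linked above.

// Sketch only: store a weight in 16 bits by keeping the sign, exponent and
// top 7 mantissa bits (bfloat16-style truncation, not true fp16).
public static short encodeWeight(float value) {
  return (short) (Float.floatToRawIntBits(value) >>> 16);
}

// Decode by padding the dropped mantissa bits with zeros.
public static float decodeWeight(short bits) {
  return Float.intBitsToFloat((bits & 0xFFFF) << 16);
}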


I saw that some models serialize themselves using ObjectStreams; those would need an adapter to deserialize to double[] first and then convert the array to float[].

Like in CRFClassifier.loadClassifier
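
As a rough sketch of that adapter idea (method and variable names here are illustrative, not existing CoreNLP API): read the stored double[][] as before, then copy it element by element into a float[][], since Java cannot cast between the two array types directly.

// Illustrative helper: convert deserialized double weights to float storage.
static float[][] toFloatMatrix(double[][] weights) {
  float[][] converted = new float[weights.length][];
  for (int i = 0; i < weights.length; i++) {
    converted[i] = new float[weights[i].length];
    for (int j = 0; j < weights[i].length; j++) {
      converted[i][j] = (float) weights[i][j];  // per-element narrowing conversion
    }
  }
  return converted;
}

// e.g. right after the ObjectInputStream read in a loader:
// double[][] raw = (double[][]) ois.readObject();
// float[][] weights = toFloatMatrix(raw);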

AngledLuffa commented 4 years ago

I could see experiments with 32 bit values. The SR Parser, for example, already uses 32 bits and it didn't lose much accuracy. Going lower than that seems very unlikely.

lambdaupb commented 4 years ago

My experimental float-based Classifier saves around 900 MiB; model loading and saving still use doubles.

https://github.com/lambdaupb/CoreNLP/commit/23f17fe2bffe09a2a59e996798b9a00e9906db41

The change is not tested, though. Additionally, there are places where temporary accumulators would be better kept in double and only cast down to float after the calculation, to limit float imprecision.
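
To illustrate that accumulator point (a sketch, not the actual CoreNLP code): sum in a 64-bit double and cast down to float only once at the end.

// Sketch: double accumulator over float weights, single cast at the end.
static float dotProduct(float[] weights, float[] inputs) {
  double sum = 0.0;                        // 64-bit accumulator limits rounding error
  for (int i = 0; i < weights.length; i++) {
    sum += (double) weights[i] * inputs[i];
  }
  return (float) sum;                      // narrow to float only after the loop
}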

AngledLuffa commented 4 years ago

I think that after the next release, which hopefully happens in a week or two, we will look into training the models with doubles but saving them as floats. That seems unlikely to hurt accuracy too much, especially if intermediate calculations are still done with doubles, as you suggest. Unfortunately, I expect this will not make it into the next release.

Thanks again for all of the suggestions!

AngledLuffa commented 4 years ago

Now that 4.0.0 is released, I'm free to pick this up again. I've already (hopefully) cut down the memory footprint of sutime quite a bit by converting sets with 1 element to singletons.
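
For context on the singleton trick (an illustrative snippet, not the actual sutime code): a one-element java.util.HashSet still carries a full hash table, while java.util.Collections.singleton wraps the single element with minimal overhead.

// Illustration only: replace one-element sets with immutable singletons.
Set<String> tags = new HashSet<>();
tags.add("DATE");
if (tags.size() == 1) {
  tags = Collections.singleton(tags.iterator().next());  // tiny, immutable wrapper
}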

Regarding floats vs doubles, were you finding that floats instead of doubles were most valuable in the nndep module or in the ner module? I started working on converting ner to using floats, but if it's more impactful to do it in the dependency parser, I can work on that instead.

lambdaupb commented 4 years ago

The retained size computations in VisualVM take quite some time, so looking at my earlier rough figures:

- 345k x 16, 150k x 24, 80k x 46: CRFClassifier.weights
- 100k x 1000: Classifier.saved in DependencyParser
- 60k x 50: Classifier.E, .eg2E
- 1000 x 2400: Classifier.W1, .wg2W1

I'd point to the 100k x 1000 matrix first, as it is ~100 million doubles (100,000 x 1,000 x 8 bytes), around 800 MiB.

lambdaupb commented 4 years ago

The difference between tokenize,ssplit,pos,lemma,ner and tokenize,ssplit,pos,lemma,ner,depparse is an additional 900 MiB (going from 975 MiB to 1850 MiB), so depparse seems close to the estimated usage.

double[] memory use goes from 63M to 951M

lambdaupb commented 4 years ago

tokenize,ssplit,pos,lemma,ner,depparse,coref seems to add ~300M unrelated to double[], and tokenize,ssplit,pos,lemma,ner,depparse,coref,quote adds almost 1000M of double[] again.

AngledLuffa commented 4 years ago

Yikes. Okay, each of quote, depparse, and ner has some serious work to do, perhaps in that order.

One issue is that I personally don't know how to evaluate the quote annotator, and maybe I know how to evaluate the depparse annotator, but I definitely know how to evaluate the ner annotator. The result is that I'll actually go in the opposite order: ner, depparse, quote.

Thanks for checking.

lambdaupb commented 4 years ago

The commit https://github.com/lambdaupb/CoreNLP/commit/23f17fe2bffe09a2a59e996798b9a00e9906db41 saves around 900M with the tokenize,ssplit,pos,lemma,ner,depparse,coref,quote pipeline, so that change must be affecting both quote and depparse, since depparse alone would only account for around 400M of the savings. So quote is probably reusing the nndep Classifier code.

AngledLuffa commented 4 years ago

That sounds excellent. My concerns are:

- serializing floats instead of doubles will save significantly on read/write time
- how to eval and make sure the results are still good enough? I'll consult my teammates who know more about those exact annotators

lambdaupb commented 4 years ago

There also seem to be two serialization methods, one using Java serialization and one using a text-based writer: https://github.com/lambdaupb/CoreNLP/commit/23f17fe2bffe09a2a59e996798b9a00e9906db41#r38704278

The text-based one will probably not save much.

AngledLuffa commented 4 years ago

The text-based serializers are simply for sanity checking when developing the models, so I am not concerned about that either way. Again, thank you for your attention to this issue. I will hopefully have something posted on git master by tomorrow - I am strongly incentivized to do so, since my meeting with my PI is next Monday!

lambdaupb commented 4 years ago

Glad I could help, best of luck!

AngledLuffa commented 4 years ago

It would seem that the NER models are not the source of the bloat, or at least, converting the weights array to floats is not the answer :(

Test results on the existing models produce exactly the same results using doubles or floats, and retraining a model and saving it as floats also produces exactly the same results. I expected maybe one or two changes, but nope, none at all. The memory footprint is only 25MB smaller, though. If you have the time and the inclination, would you verify that the changes work but are not as impactful as we hoped? If there's some other data structure that is taking up a lot of space in the CRFClassifier, I'll be happy to look at that as well.

Thanks!

lambdaupb commented 4 years ago

The ner decrease is in line with my estimates from yesterday; 63M/2 and 25M are very close.

The main memory culprits are the nndep and quote classifier models, with potential savings of around 500M each.

AngledLuffa commented 4 years ago

Sounds great. Guess I'm hunting the big game today.

AngledLuffa commented 4 years ago

It looks like the problem with the quote annotator is that it loads depparse even if it doesn't need to. Maybe I can fix that and put "cut corenlp memory usage by 33%" on my weekly summary

lambdaupb commented 4 years ago

Well, the memory increases by 1000M when adding quote to a pipeline that already loads depparse. Is it loading the same models again?

AngledLuffa commented 4 years ago

If I run

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators "tokenize,ssplit,pos,parse,lemma,ner,coref,depparse,quote"

it outputs this:

[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Done reading from disk ... Time elapsed: 2.1 sec
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99990, Elapsed Time: 11.342 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [13.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator quote
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Done reading from disk ... Time elapsed: 7.2 sec
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99990, Elapsed Time: 14.197 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [21.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Finished loading pipeline.  Current memory usage: 3155mb

I can run it with just depparse, or just quote, and either way save about 900mb.

AngledLuffa commented 4 years ago

This is making me wish for C++-style templates, btw. If I could do Classifier<float> at run time and Classifier<double> at train time, this would be a trivial task.

lambdaupb commented 4 years ago

Why does quote not check whether the pipeline already has the depparser loaded and reuse it, or at least cache the models? The user should not have to know about this at all.

AngledLuffa commented 4 years ago

Why does quote not check whether the pipeline already has the depparser loaded and reuse it, or at least cache the models? The user should not have to know about this at all.

Beats me!

The pipeline doesn't pass earlier annotators in the sequence to later annotators, but even so, there could have been some mechanism to reuse models. I added a map which should reuse the existing parser. Want to verify that it works? It should save quite a bit.
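
A hypothetical sketch of what such a map could look like (the ParserCache class and its names are made up for illustration; DependencyParser.loadFromModelFile is assumed to be the existing nndep loader): key the cache by model path so a second annotator asking for the same model reuses the already-loaded instance.

import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import edu.stanford.nlp.parser.nndep.DependencyParser;

// Hypothetical cache: one DependencyParser per model path, shared across annotators.
class ParserCache {
  private static final Map<String, DependencyParser> CACHE = new ConcurrentHashMap<>();

  static DependencyParser get(String modelPath, Properties props) {
    return CACHE.computeIfAbsent(modelPath,
        path -> DependencyParser.loadFromModelFile(path, props));
  }
}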

AngledLuffa commented 4 years ago

The main problem I'm finding with the float/double difference in nndep/Classifier.java is that the model is trained using the same data structures it uses at runtime, e.g. W1, W2, E all get updated during training. The NER models optimize a function on a large array of doubles and then reshape that array into the model parameters, which made it extremely easy to optimize.

In fact I'm not sure about two things:

Removing all the training code would certainly make it a lot easier to convert the model to floats instead of doubles

AngledLuffa commented 4 years ago

It looks like the weight matrices in depparse are a tiny portion of the overall memory usage. The largest chunk of it is in the pre-multiplied array. That also takes a long time to calculate at startup, so what I did was shrink that down quite a bit to 20000 and add an LRU which saves the most recent 5000 multiplications. The LRU stores floats, not doubles. There are parameters to control the sizes of those two sets. For example:

-depParse.numCached 20000 -depparse.numPreComputed 0

If you put both to 0, depparse slows to a crawl, so I can't recommend doing that. You can, though, if you want to use as little memory as possible.
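
For reference, a minimal sketch of such an LRU (class name and sizes are illustrative, not the actual implementation): a java.util.LinkedHashMap in access order, storing the pre-multiplied rows as float[] and evicting the eldest entry once the cap is exceeded.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU: keeps the most recently used pre-multiplied rows as float[].
class PreComputedCache extends LinkedHashMap<Integer, float[]> {
  private final int maxEntries;

  PreComputedCache(int maxEntries) {
    super(16, 0.75f, true);   // accessOrder = true gives least-recently-used eviction order
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<Integer, float[]> eldest) {
    return size() > maxEntries;   // drop the eldest entry once over the cap
  }
}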

Converting the model itself from doubles to floats is possible, but doesn't save that much and would require investigating the effects on training.

For now I think I'll call this good, although if you have more suggestions for large chunks that can be reduced, I'm definitely interested in continuing to reduce the memory footprint of corenlp.

Certainly templates can be misused. However, if you can think of a better way to do this than templates (in a typed language, obviously), I'm all ears:

  /**
   * Add the two 1d arrays in place of {@code to}.
   *
   * @throws java.lang.IllegalArgumentException If {@code to} and {@code from} are not of the same dimensions
   */
  public static void pairwiseAddInPlace(double[] to, double[] from) {
    if (to.length != from.length) {
      throw new IllegalArgumentException("to length:" + to.length + " from length:" + from.length);
    }
    for (int i = 0; i < to.length; i++) {
      to[i] += from[i];
    }
  }

  /**
   * Add the two 1d arrays in place of {@code to}.
   *
   * @throws java.lang.IllegalArgumentException If {@code to} and {@code from} are not of the same dimensions
   */
  public static void pairwiseAddInPlace(double[] to, float[] from) {
    if (to.length != from.length) {
      throw new IllegalArgumentException("to length:" + to.length + " from length:" + from.length);
    }
    for (int i = 0; i < to.length; i++) {
      to[i] += from[i];
    }
  }

AngledLuffa commented 3 years ago

(update: eventually tested and switched depparse to floats as well)