stanfordnlp / stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
https://stanfordnlp.github.io/stanza/

Low performance after migrating to 1.5.0 #1259

Closed hugomelo closed 1 year ago

hugomelo commented 1 year ago

Describe the bug
We had been using Stanza for quite some time on version 1.2, and a month ago we upgraded to version 1.5.0 while keeping the same computer settings. We've noticed a considerable drop in performance since then and thought it was not normal. Did you notice the same performance degradation, or are there any steps to avoid this kind of issue?

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Python version: 3.8.10
  • Stanza version: 1.5.0

AngledLuffa commented 1 year ago

Do you mean it's slower or less accurate? Which language in general?

hugomelo commented 1 year ago

It's slower in general. We are using these 17 languages at the moment :/ English, Danish, Dutch, Italian, Polish, Portuguese, Spanish, Swedish, Simplified Chinese, French, German, Norwegian, Hindi, Finnish, Arabic, Russian, Turkish

AngledLuffa commented 1 year ago

Several of those languages now have a constituency parser included in the default pipeline. You will probably want to turn that off and run only the processors you actually want.

AngledLuffa commented 1 year ago

In particular, you can do

processors="tokenize,pos,lemma,depparse,ner"

when creating the Pipeline, or something like that.
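
For example, a minimal sketch (assuming English and the processor list above; the sample sentence and variable names are just for illustration):

import stanza

# load only the processors you actually use; leaving "constituency" out of
# the list means that model is never loaded or run
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse,ner")

doc = nlp("Stanza annotates only what the requested processors produce.")
print(doc.sentences[0].words[0].lemma)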

hugomelo commented 1 year ago

We are hardcoding these processors: processors="tokenize,mwt,lemma,pos,depparse". So the constituency parser is not the problem.

I have written a small script to compare the timing between the two versions, analysing a file with 1000 lines of random text, which is attached.

With Stanza version 1.2.1 I got an average of 1055 seconds, and with version 1.5.0 I got 4500 seconds.

The script is below:

#!/usr/bin/env python3

"""Time how long the pipeline takes to process a file of comments line by line."""

import time

import stanza

stanza.download('en', processors='tokenize,mwt,lemma,pos,depparse')
pipe = stanza.Pipeline('en', processors='tokenize,mwt,lemma,pos,depparse')

with open('comments.csv', encoding="utf-8") as file:
    lines = file.readlines()

# process each line separately and report the total time
start = time.perf_counter()
for line in lines:
    pipe(line)
print(f"processed {len(lines)} lines in {time.perf_counter() - start:.1f}s")

comments.csv

Please let me know if you need more information.

AngledLuffa commented 1 year ago

That would indeed be a very concerning slowdown.

I ran a similar experiment locally and found the following: parsing the Wikipedia page for the Philadelphia Flyers up to 1975 (f*** the Pens) 100 times in one call to the Pipeline took 16s for me with version 1.2.0, and 18s with the dev branch, which has no significant performance changes from 1.5.0. If I run it line by line, though, it takes 35s with version 1.2.0 and 51s with the latest. That's a more noticeable gap than in the batched version.

What I think is happening here is that the runtime has gone in a couple of different directions over the past couple of years:

1) We've made various performance improvements

2) We added the charlm to both the POS tagger and depparse, which noticeably improves the accuracy of both

Ultimately, #2 is going to be a very expensive step if you run the models one line at a time, as this sample program does. If you batch a whole bunch of sentences or documents together, it should be significantly faster. Are you able to do that in your application?
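
For example, a minimal sketch of batching (this uses the multi-document batching described in the Stanza documentation; the file name comes from the attached script, everything else is illustrative):

import stanza

nlp = stanza.Pipeline("en", processors="tokenize,mwt,lemma,pos,depparse")

with open("comments.csv", encoding="utf-8") as fin:
    lines = [line.strip() for line in fin if line.strip()]

# wrap each comment in a Document and annotate the whole list in one call,
# which lets the neural models batch their work instead of running per line
in_docs = [stanza.Document([], text=line) for line in lines]
out_docs = nlp(in_docs)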

Here is what I ran:

import logging

import stanza

logger = logging.getLogger('stanza')

pipe = stanza.Pipeline("en", processors="tokenize,lemma,pos,depparse")

logger.info("Version %s", stanza.__version__)

# foo.txt holds the text of the Wikipedia page described above
with open("foo.txt") as fin:
    text = fin.read()
text = text * 100

# one call to the Pipeline over the whole (repeated) text
logger.info("Starting processing")
doc = pipe(text)
logger.info("Ending processing")

# the same text, one line at a time
lines = text.split("\n")

logger.info("Starting line-by-line")
for line in lines:
    doc = pipe(line)
logger.info("Ending line-by-line")

hugomelo commented 1 year ago

The environment I have is a webserver running with uwsgi and receiving comments like those in the file I attached, so it is very much a line-by-line environment. I tested versions 1.3.0 and 1.4.0 and they are still fast; with 1.4.1 and 1.4.2 I already noticed a slowdown.

So, you don't expect any performance improvement in the new versions?

AngledLuffa commented 1 year ago

That tracks. The 1.4.1 release had charlm added to POS:

https://github.com/stanfordnlp/stanza/releases/tag/v1.4.1

The 1.5.0 release had the code changes necessary for charlm in depparse, but the models themselves were not ready at the time of release, so I hadn't mentioned it in the release notes. However, it was there, and we updated the models once they were ready.

I can check to see if there are ways of making the charlm less expensive for a single sentence, but I had considered a 10% overall slowdown on large batches to be worth the accuracy improvement.

AngledLuffa commented 1 year ago

Thinking about this a tiny bit, the first thing that comes to mind is to try a lower dimension version of the charlm. I believe the main source of the runtime slowdown is not necessarily the charlm model itself, but the fact that it produces an embedding of dim 2000 whereas the original embedding is 100 or so. A smaller charlm dimension might get most of the benefit but not slow down subsequent operations as much.

The only caveat is that it will take a couple of weeks to try, partly because of an upcoming deadline and partly because our cluster will be shut down for maintenance immediately after that deadline.

AngledLuffa commented 1 year ago

Alright, I tried with French, and unfortunately it appears training a low dimension charlm is not the answer. In particular, on the FR-GSD dev set, I got:

no charlm: 97.89
1024d charlm: 98.17
100d charlm: 97.76 (!!!)

Another possibility which might work better is a low rank approximation: first project the charlm output down to 100 dimensions, then use it as normal. That gets 98.04 on the dev set. I'll try something in between to see if that helps as well.

We might also be able to make up some or all of the gap by backpropagating into the charlm, which is not something we currently do for POS (it makes the model a bit bigger, but hopefully by a trivial amount).
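
To illustrate the low rank idea, a sketch in PyTorch (the dimensions, names, and toy tensors are illustrative, not the actual tagger code):

import torch
import torch.nn as nn

class CharlmProjection(nn.Module):
    """Project the high-dimensional charlm representation down to a small
    dimension before it is concatenated with the other word features, so the
    layers downstream stay cheap."""

    def __init__(self, charlm_dim=2000, proj_dim=100):
        super().__init__()
        self.proj = nn.Linear(charlm_dim, proj_dim)

    def forward(self, charlm_reprs):
        # charlm_reprs: (batch, num_words, charlm_dim)
        return self.proj(charlm_reprs)

# toy usage: a batch of 8 sentences, 20 words each
proj = CharlmProjection()
small = proj(torch.randn(8, 20, 2000))   # -> shape (8, 20, 100)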

brauliobo commented 1 year ago

@AngledLuffa for future releases, can you add an option to disable the charlm? It would be useful until the performance is back, so that we can upgrade and only miss that one feature.

AngledLuffa commented 1 year ago

The problem is that this means building two sets of models, one with the charlm and one without. It might be worthwhile, though. I was having a hard time finding a configuration that kept the speed without losing the improved accuracy.

AngledLuffa commented 1 year ago

It might not be too onerous to do that if I script it correctly, and there is a noticeable speed difference without the charlm (and an accuracy difference, but that seems to be the tradeoff people want). I'll check with my PI that this seems reasonable. I don't think the extra effort will be too much, though.

AngledLuffa commented 1 year ago

(For reference, the lower dimension charlm didn't work well enough, and projecting the charlm to a lower dimension wound up not being faster.)

AngledLuffa commented 1 year ago

@hugomelo are you using GPU or CPU? One thing we found is that the slowdown from the charlm is bigger on CPU. That could certainly explain the much bigger loss of speed you're seeing.

I do think we'll go the route of having two sets of models, sometime in the next couple weeks.
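
For anyone checking how much the device matters, a minimal timing sketch (the text, loop count, and processor list are illustrative; use_gpu is the standard Pipeline flag, and it falls back to CPU if no GPU is available):

import time
import stanza

text = "A short comment, processed one at a time, the way a webserver would see it."

for use_gpu in (False, True):
    nlp = stanza.Pipeline("en", processors="tokenize,mwt,lemma,pos,depparse",
                          use_gpu=use_gpu)
    start = time.perf_counter()
    for _ in range(100):
        nlp(text)
    elapsed = time.perf_counter() - start
    print(f"use_gpu={use_gpu}: {elapsed:.1f}s for 100 single-line calls")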

AngledLuffa commented 1 year ago

I just released version 1.5.1, which has nocharlm versions of each of the models for which there are now charlm versions.

Sometime in the coming month I will add the ability to specify a package name which collects all the nocharlm versions.

AngledLuffa commented 1 year ago

If you install the dev branch, you can now do the following:

>>> import stanza
>>> pipe = stanza.Pipeline("en", package="default_fast")

I will make an official release out of this in the coming two weeks, depending on how many more issues I can resolve and/or features I can add.

hugomelo commented 1 year ago

> @hugomelo are you using GPU or CPU? One thing we found is that the slowdown from the charlm is bigger on CPU. That could certainly explain the much bigger loss of speed you're seeing.
>
> I do think we'll go the route of having two sets of models, sometime in the next couple weeks.

CPU :+1: Thanks for looking into it!

AngledLuffa commented 1 year ago

I believe this should be addressed by upgrading to 1.6.0 and using package="default_fast" when creating the Pipeline.
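
For reference, a minimal sketch of that setup (the pip command in the comment is just the usual upgrade path, and the processor list mirrors the one used earlier in the thread):

# first upgrade the library, e.g.  pip install -U "stanza>=1.6.0"
import stanza

stanza.download("en", package="default_fast")
nlp = stanza.Pipeline("en", package="default_fast",
                      processors="tokenize,mwt,lemma,pos,depparse")
doc = nlp("A quick check that the nocharlm models load and run.")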