segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

CANINE model and high VRAM usage #115

Open Qubitium opened 5 months ago

Qubitium commented 5 months ago

@bminixhofer We are observing very high VRAM usage with the CANINE model. The wtp-canine-s-12l-no-adapters fp32 weights are only about 515 MB, so we naively expected batch=1 in fp16 mode to use roughly 257.5 MB of VRAM for weights plus runtime/inference overhead. We didn't expect batch=1 VRAM usage to be 1.3 GB. The input is a roughly 230 KB text file.

Is this a bug or architecture norm for the canine model? If norm, is there anything that we can do to reduce the memory footprint? Thanks.

```python
from wtpsplit import WtP

wtp = WtP("wtp-canine-s-12l-no-adapters")
wtp.half().to(device="cuda")
```
| batch | VRAM (GB) |
|------:|----------:|
| 1 | 1.309 |
| 2 | 1.335 |
| 4 | 1.385 |
| 6 | 1.428 |
| 8 | 1.487 |
| 10 | 1.542 |
| 12 | 1.583 |
| 14 | 1.639 |
| 16 | 1.688 |
| 32 | 2.094 |
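For reference, a minimal sketch of how numbers like these could be gathered, assuming `WtP.split` accepts a `batch_size` keyword and reading peak usage via `torch.cuda.max_memory_allocated`; the input path is a placeholder:

```python
import torch
from wtpsplit import WtP

wtp = WtP("wtp-canine-s-12l-no-adapters")
wtp.half().to(device="cuda")

# Stand-in for the ~230 KB input file from the report (hypothetical path)
text = open("input.txt").read()

for batch_size in (1, 2, 4, 6, 8, 10, 12, 14, 16, 32):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    wtp.split(text, batch_size=batch_size)  # batch_size kwarg assumed here
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"batch={batch_size}: {peak_gb:.3f} GB")
```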
bminixhofer commented 4 months ago

Hi, thanks for these benchmarks! And sorry for being slow to respond.

You could debug this by checking how much memory the vanilla CANINE (https://huggingface.co/google/canine-s) takes for a forward pass vs. a forward pass of the WtP model (see e.g. here: https://github.com/bminixhofer/wtpsplit/?tab=readme-ov-file#advanced-usage).
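A minimal sketch of that comparison, assuming the standard transformers `AutoModel`/`AutoTokenizer` interfaces for google/canine-s and reading peak usage via `torch.cuda.max_memory_allocated`; the snippet and its length are placeholders, not part of the thread:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from wtpsplit import WtP

# Use the same short snippet for both passes so the comparison is like-for-like
snippet = ("This is a sentence. " * 200)[:2000]

# Peak memory of one vanilla CANINE forward pass (CANINE operates on raw characters)
tokenizer = AutoTokenizer.from_pretrained("google/canine-s")
model = AutoModel.from_pretrained("google/canine-s").half().to("cuda")
inputs = tokenizer(snippet, return_tensors="pt").to("cuda")
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    model(**inputs)
print(f"vanilla CANINE peak: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")

# Peak memory of the WtP model over the same snippet
del model, inputs
torch.cuda.empty_cache()
wtp = WtP("wtp-canine-s-12l-no-adapters")
wtp.half().to("cuda")
torch.cuda.reset_peak_memory_stats()
wtp.split(snippet)
print(f"WtP peak: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")
```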

If there's a discrepancy there, I'll investigate it. It's possible that CANINE just needs a lot of memory, though; I am not super happy with that architecture and will upgrade the models to a different one soon(ish).

Qubitium commented 4 months ago

Will do. Btw, if you need GPU compute to train the next model, I can provide you with an A100 with 80+ GB. You can ping me on Twitter at qbitium.

bminixhofer commented 4 months ago

Thanks! And that's very generous. I'm deferring to @markus583 since he is doing the training, but we are using TPUs, so there is probably no need.

markus583 commented 4 months ago

Very generous indeed! Thanks, but the TPUs are very strong. I'd be very curious whether there is a discrepancy, too.