replit / ReplitLM

Inference code and configs for the ReplitLM model family
https://huggingface.co/replit
Apache License 2.0

where can it run? hardware specs, performance data #7

Closed elikoga closed 1 year ago

elikoga commented 1 year ago

Got it running on my laptop with an i5-1135G7, 16 GB RAM, and an RTX 3060 connected via Thunderbolt in an eGPU enclosure, running Windows 10.

The model loads in about 45 s from my SSD.

When generating 100 tokens from the prompt `class AVeryLongClass:` (which is 8 tokens long), I'm seeing around 8.47 s for those tokens, so about 11.8 tok/s. It gets slower with a larger context, of course. You can see my script here: https://gist.github.com/elikoga/c300b9bf6b090fda9187644766347348
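The throughput arithmetic above (100 tokens / 8.47 s ≈ 11.8 tok/s) can be sketched as a small timing harness; this is a minimal illustration, not the linked gist, and the `generate` callable stands in for whatever model call you are benchmarking.

```python
import time


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput for n_tokens generated in elapsed_s seconds."""
    return n_tokens / elapsed_s


def time_generation(generate, n_tokens: int) -> float:
    """Time an arbitrary generate() callable and return tokens/second.

    `generate` is any zero-argument callable that produces n_tokens tokens,
    e.g. a wrapper around model.generate(...) (hypothetical here).
    """
    start = time.perf_counter()
    generate()
    elapsed = time.perf_counter() - start
    return tokens_per_second(n_tokens, elapsed)


# The numbers reported above: 100 tokens in ~8.47 s
print(round(tokens_per_second(100, 8.47), 1))  # → 11.8
```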

Just wanted to share some numbers and the hardware I got it running on :D I like the generation results I'm seeing so far.

Maybe you can share some of your numbers too

madhavatreplit commented 1 year ago

Hey! Thank you for sharing your numbers!

Quick sanity checks for running on GPUs, after looking at the script you shared: make sure you're using the Triton attention implementation and running the model in bfloat16 on the GPU.

Our README on the Hugging Face Hub also has the exact snippets to use.

Keep me posted if this doesn't work. Happy to help you out!
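The GPU setup being suggested (Triton attention plus bfloat16 on a CUDA device) can be sketched as below. This follows the common pattern for loading this model family from the Hub; the model id `replit/replit-code-v1-3b` and the `attn_config["attn_impl"]` key are assumptions based on the linked Hugging Face org, so check the Hub README for the exact snippet.

```python
def load_replit_on_gpu():
    """Sketch of GPU loading: Triton attention + bfloat16 on CUDA.

    The model id and config key below are assumptions from the linked
    Hugging Face org, not verified against this repo's README.
    """
    import torch
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained(
        "replit/replit-code-v1-3b", trust_remote_code=True
    )
    # Use the Triton flash-attention kernel (requires Triton, so
    # effectively Linux-only at the time of this thread).
    config.attn_config["attn_impl"] = "triton"

    model = AutoModelForCausalLM.from_pretrained(
        "replit/replit-code-v1-3b", config=config, trust_remote_code=True
    )
    # Move to the GPU in half precision.
    model.to(device="cuda:0", dtype=torch.bfloat16)
    return model.eval()
```

Calling the function requires a CUDA GPU and a model download, so it is defined but not invoked here.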

elikoga commented 1 year ago

I'm on Windows so no Triton for me :(

I don't know how bfloat16 might affect performance, but I can try it later.
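A Windows-friendly variant would skip the Triton kernel and keep only the bfloat16 part, falling back to the default PyTorch attention implementation. Again a hedged sketch: the model id is assumed from the linked Hugging Face org, and `torch_dtype=torch.bfloat16` is the standard transformers way to load half-precision weights.

```python
def load_replit_bf16_no_triton():
    """Sketch without Triton (e.g. on Windows): default torch attention,
    bfloat16 weights. Model id assumed from the linked Hugging Face org.
    """
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "replit/replit-code-v1-3b",
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,  # roughly halves memory vs float32
    )
    if torch.cuda.is_available():
        model.to("cuda:0")  # an Ampere card like the RTX 3060 supports bf16
    return model.eval()
```

As above, the function is defined but not invoked, since running it needs the model weights and ideally a GPU.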

madhavatreplit commented 1 year ago

That should help with inference.

Closing this issue. Let me know if you run into blockers or if I can help with anything else!