gregwdata / cog-sqlcoder

cog files for deploying defog SQLCoder to Replicate
Apache License 2.0

Nvidia A100 (80GB) GPU #2

Closed. nitrag closed this issue 11 months ago

nitrag commented 11 months ago

Can you push a version/branch that uses an Nvidia A100 (80GB) GPU?

I'm getting a CUDA OOM error because my schema is large.

gregwdata commented 11 months ago

I just switched the settings for the existing version to run on A100-80GB. I don't see an obvious way to make it use different GPUs on different versions of the same model on Replicate. I think the only option might be to have a separate model ID.

Can you confirm if this change lets it run your longer prompts? (and how long is your long prompt?)

I'd rather not leave it in that state for too long. If it solves the problem for you, I can look at deploying an alternate model that runs on the larger GPU.

gregwdata commented 11 months ago

Also, I'll note that the SQLCoder repo does not specify a max context window, but since it is finetuned from StarCoder, I'm assuming it has the same 8192 token window.

nitrag commented 11 months ago

I'm confused. I am getting empty output, both before and after you changed to the A100. I even ran the example web prompt without modifications. Can you get output? Maybe there is something wrong with my account. I have a credit card on file.

gregwdata commented 11 months ago

Just tried myself - I am getting good results with stream set to true (box checked in the UI), but similarly getting empty output with stream set to false (which was the default). Please try with stream = true.

The updates I pushed last night were to add the streaming output capability. Looks like I may have broken something on the non-streaming side in the process. I'll look into it and fix that when I can.
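For anyone calling the model from code rather than the web UI, here is a minimal sketch of a helper that treats streamed and non-streamed results the same way and flags the empty-output case described above. It assumes the client hands back either a single string or an iterable of string chunks. `collect_output` and `is_empty` are hypothetical names for illustration, not part of this repo or of any client library.

```python
from typing import Iterable, Optional, Union


def collect_output(output: Optional[Union[str, Iterable[str]]]) -> str:
    """Join a model's output into one string.

    Assumption: the output is either None, a single string (non-streaming),
    or an iterable of string chunks (streaming).
    """
    if output is None:
        return ""
    if isinstance(output, str):
        return output
    # Streaming case: concatenate chunks in the order they arrive.
    return "".join(str(chunk) for chunk in output)


def is_empty(text: str) -> bool:
    """True when the result has no visible content (the symptom reported here)."""
    return not text.strip()
```

If `is_empty(collect_output(result))` is true for a non-streaming call, retrying with stream set to true is the workaround suggested above.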

nitrag commented 11 months ago

OK, yeah, that's it. Thanks! No rush, I'm just experimenting.

My SQL is 10,000 characters without comments and 21,000 characters with comments. That's 14 tables and 250 lines in the file in total. It's working on the A100.

I'm not sure the 8-bit version will work for me; it's hallucinating a bit. Maybe Replicate will host the full version publicly.
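As a rough sanity check on the numbers in this thread, the sketch below estimates whether a schema of a given character count fits in the 8192-token window assumed earlier. The characters-per-token ratio and the output reserve are guesses, not measurements; an exact count needs the model's own tokenizer. It says nothing about GPU memory, so it does not predict a CUDA OOM.

```python
import math

# Assumed in this thread (inherited from StarCoder); not confirmed by the SQLCoder repo.
CONTEXT_WINDOW = 8192


def estimate_tokens(num_chars: int, chars_per_token: float) -> int:
    """Rough token estimate from a character count and an assumed ratio."""
    return math.ceil(num_chars / chars_per_token)


def fits(num_chars: int, chars_per_token: float,
         reserved_for_output: int = 512,
         window: int = CONTEXT_WINDOW) -> bool:
    """True if the estimated prompt plus an output reserve fits in the window.

    reserved_for_output is a hypothetical budget for the generated SQL.
    """
    return estimate_tokens(num_chars, chars_per_token) + reserved_for_output <= window
```

Under these assumptions, the 21,000-character schema fits at 3 or 4 characters per token but not at 2.5, so it sits close to the limit if SQL tokenizes densely.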