henk717 / KoboldAI

KoboldAI is generative AI software optimized for fictional use, but capable of much more!
http://koboldai.com
GNU Affero General Public License v3.0

Added a backend model for API-based usage. [Included WIP support for OpenRouter & Neuro.Mancer] #511

Closed: Iceblade02 closed this pull request 1 month ago

Iceblade02 commented 6 months ago

The code should not affect anything outside its scope, but should definitely be glanced at by someone more experienced than me.

henk717 commented 6 months ago

Selecting it in the menu, I get KeyError: 'OpenRouter', so something's not right. On further inspection, the OpenRouter file can't load itself properly: ModuleNotFoundError: No module named 'modeling.inference_models.openrouter.class'; 'modeling.inference_models.openrouter' is not a package
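That traceback is the classic module-shadows-package failure: a stray openrouter.py file sitting next to the openrouter/ directory makes Python resolve the name to the file, which then can't provide a class submodule (the next comment confirms openrouter.py needed to be removed). A minimal reproduction with stand-in names:

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway tree where a plain module file shadows a
# same-named directory, mirroring openrouter.py vs openrouter/.
root = pathlib.Path(tempfile.mkdtemp())
(root / "shadowed").mkdir()
(root / "shadowed" / "inner.py").write_text("X = 1\n")
(root / "shadowed.py").write_text("")  # the stray file

sys.path.insert(0, str(root))
try:
    importlib.import_module("shadowed.inner")
except ModuleNotFoundError as e:
    # Reports that 'shadowed' is not a package: the regular module
    # shadowed.py wins over the bare directory, so no submodule exists.
    print(e)
```

Deleting the stray file (or turning the directory into a regular package that takes precedence) makes the dotted import resolve again.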

For now, since you made the files OpenRouter-specific, it's also better to put them all in the subfolder; we can alter this later when we add modern OpenAI support based on this.

Iceblade02 commented 6 months ago

Oh, hang on, I managed to get a previous WIP file in. There should only be "openrouter_handler" and "openrouter/class.py"; "openrouter.py" needs to be removed.

henk717 commented 6 months ago

If you update your branch this PR will update automatically.

Iceblade02 commented 6 months ago

There, should be updated now.

henk717 commented 6 months ago

https://github.com/scott-ca/KoboldAI-united is the incorrect header; that is a very outdated fork not associated with us.

henk717 commented 6 months ago

When trying to submit my key, it automatically removes the key and I end up with a blank field (the same key works in Lite).

Iceblade02 commented 6 months ago

Alright, I've done somewhat more thorough testing this time around, and it seems to be behaving properly now, at least on my end.

Iceblade02 commented 6 months ago

I've reworked the handler class in /modeling/inference_models to be more generic, and as a proof-of-concept added support for another API! Adding support for Neuro.Mancer took roughly an hour.

Iceblade02 commented 6 months ago

Moved GooseAI over to the new API handler.

henk717 commented 6 months ago

GooseAI may be using the old implementation of OpenAI's API; that will have to be tested for. The OpenAI API endpoint should be migrated, though, with support for custom URLs; this will allow all these newer hosting companies to work.

Iceblade02 commented 6 months ago

Currently, api_handler translates all calls to core_generate, raw_generate & _raw_generate into a call to the abstract function _raw_api_generate (unique to each specific endpoint handler), decoding the prompt if it has been tokenized and passing it on as plaintext (along with the other params, unchanged). As long as _raw_api_generate returns a list of strings (["result A", "result B", ...]) it will work fine: the handler tokenizes the outputs, standardizes the lengths, and returns a GenerationResult.

api_handler also provides some default functions for making API requests (batch_api_call -> [json, json, ...], api_call -> json), taking in a call variable that should contain the url, json, and headers (specified in _raw_api_generate for each endpoint handler).
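The flow described above can be sketched roughly like this. This is a loose illustration, not the PR's actual code: the class name, the decode placeholder, and the method bodies are simplified assumptions; only the method names (_raw_api_generate, api_call, batch_api_call) come from the description.

```python
import json
import urllib.request
from abc import ABC, abstractmethod
from typing import List

class APIHandlerSketch(ABC):
    """Hedged sketch of the generic api_handler pattern described above."""

    def raw_generate(self, prompt, **kwargs) -> List[str]:
        # If the prompt arrives tokenized, decode it to plaintext first,
        # then delegate to the endpoint-specific implementation.
        if not isinstance(prompt, str):
            prompt = self.decode(prompt)
        return self._raw_api_generate(prompt, **kwargs)

    @abstractmethod
    def _raw_api_generate(self, prompt: str, **kwargs) -> List[str]:
        """Each endpoint handler returns a list of completion strings."""

    def decode(self, token_ids) -> str:
        # Placeholder: the real handler would use the model's tokenizer.
        return "".join(map(str, token_ids))

    def api_call(self, call: dict) -> dict:
        # 'call' carries url, json payload, and headers, per the description.
        req = urllib.request.Request(
            call["url"],
            data=json.dumps(call["json"]).encode(),
            headers=call["headers"],
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def batch_api_call(self, calls: list) -> list:
        return [self.api_call(c) for c in calls]
```

An endpoint handler then only has to implement _raw_api_generate, which is presumably what made adding Neuro.Mancer support quick.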

Things TODO in api_handler:

Iceblade02 commented 6 months ago

NOTE: I keep triggering the following error when tabbing into the web client, and sometimes when switching between tabs in the left menu.

[screenshot: error traceback]

I don't know if it is a bug I've introduced, or if it comes from elsewhere. It feels like the latter, given that it triggers even without a model chosen, but idk.

EDIT: This seems to have been resolved

henk717 commented 6 months ago

GooseAI apparently revoked the testing credit, so I will no longer be able to test their backend and will have to assume it works. Without testing credit it will move to a best-effort basis (keep in mind their site is not compatible with the modern OpenAI standard, to my knowledge).

I tried OpenRouter and this worked, but the presence penalty / frequency penalty is shown on the loading screen, and that's not conformant with our standard. We should map the frequency penalty to the repetition penalty setting inside KoboldAI so that it can be modified without having to reload the model.
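One possible shape for that mapping, as a sketch only: the scale conversion below is purely an assumption (KoboldAI's repetition penalty is a multiplier of 1.0 or more, while OpenAI-style penalties sit roughly in [-2.0, 2.0]), and the function name is made up, not code from this PR.

```python
def rep_pen_to_frequency_penalty(rep_pen: float) -> float:
    """Derive an OpenAI-style frequency_penalty from KoboldAI's live
    repetition penalty slider, so the value can change without a model
    reload. Linear rescale, clamped to the API's accepted range."""
    return max(-2.0, min(2.0, (rep_pen - 1.0) * 2.0))
```

With something like this, rep_pen = 1.0 (no penalty) maps to 0.0, and larger multipliers saturate at the API's upper bound.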

Iceblade02 commented 5 months ago

Repetition penalty is now properly tied to the setting inside the KAI front end.

Removing presence penalty & frequency penalty from the loading screen would require adding them as new items in gensettings.py (and maybe elsewhere?).

Do we want to add them there?

One potential downside is that we'll end up with a whole bunch of settings, under different names, that aren't actually used by a lot of models. Ideally, we'd "ask" the backend which settings it actually uses/implements, and only present those to the user.
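That "ask the backend" idea could look something like the following. All names here are hypothetical illustrations, not code from this PR or from gensettings.py: each handler advertises the sampler settings it supports, and the UI filters the global list down to that set.

```python
# Hypothetical capability table: which sampler settings each backend
# actually implements. In a real version each handler class would
# expose this itself rather than living in one dict.
SUPPORTED = {
    "openrouter": {"temperature", "top_p", "rep_pen",
                   "presence_penalty", "frequency_penalty"},
    "gooseai": {"temperature", "top_p", "rep_pen"},
}

def visible_settings(backend: str, all_settings: list) -> list:
    """Filter the global settings list to what the backend supports,
    preserving the UI's original ordering. Unknown backends get none."""
    return [s for s in all_settings if s in SUPPORTED.get(backend, set())]
```

The UI would then render only the returned settings, so unused sliders never appear for a given backend.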

Iceblade02 commented 5 months ago

Added them, and it seems to work. Easy to undo if we want.