Closed signebedi closed 1 year ago
When submitting ChatCompletions, this context can be stored under the 'system' role of the message prompt, after being deformatted (line breaks and other problem characters removed).
With standard Completions, we can simply prepend prompts with as much of the context as our token count will permit... OR, we can include it in the tokenized body of text...
Either way, we will need to add unittests for this.
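A minimal sketch of the cleanup step described above, assuming a hypothetical `deformat` helper (the real cleanup rules in gptty may differ):

```python
import re

def deformat(text: str) -> str:
    """Collapse newlines and other problem characters into single spaces.

    Hypothetical helper; the actual character set gptty strips may differ.
    """
    # Replace newlines, tabs, and carriage returns with spaces,
    # then collapse runs of whitespace into one space.
    cleaned = re.sub(r"[\r\n\t]+", " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()

# For ChatCompletions, the cleaned context is stored under the
# 'system' role of the messages list.
messages = [
    {"role": "system", "content": deformat("line one\nline two\n\tindented")},
    {"role": "user", "content": "What does the stack trace mean?"},
]
```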
When we bootstrap additional_context, I think we need to privilege the other sources of context first. So, the order of precedence is (from highest to lowest):
We have three cases:
In the case of ChatCompletions, we should prepend whatever number of tokens of context we can add before reaching our max token count, after having added all other context as a "system" dictionary, see #31.
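A sketch of that budgeting step for the ChatCompletions case. Note the assumptions: token counting is approximated here with `str.split()` (gptty's real count would come from the model tokenizer), and `build_chat_context` / `truncate_to_budget` are hypothetical names:

```python
def truncate_to_budget(context: str, budget: int) -> str:
    """Keep as many leading whitespace-delimited tokens as the budget allows.

    Sketch only: real token counting should use the model tokenizer,
    not str.split().
    """
    return " ".join(context.split()[:budget])

def build_chat_context(additional_context: str, prior_context: list, max_tokens: int) -> list:
    # Count tokens already consumed by all other context.
    used = sum(len(m["content"].split()) for m in prior_context)
    remaining = max(max_tokens - used, 0)
    # Fold whatever fits into a "system" dictionary, prepended to the rest.
    system_msg = {"role": "system", "content": truncate_to_budget(additional_context, remaining)}
    return [system_msg] + prior_context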
In the case of Completions with keyword tokenization, we should pass the additional context to return_most_common_phrases by prepending it to the text parameter passed therein. That way, we do not seriously risk going over our token count.
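To illustrate the keyword-tokenization case, here is a stand-in for return_most_common_phrases (the real function's signature and scoring in gptty may differ) showing the additional context prepended to the text parameter:

```python
from collections import Counter

def return_most_common_phrases(text: str, phrase_count: int = 5) -> list:
    """Stand-in for gptty's keyword tokenizer; the real implementation differs.

    Returns the most frequent words longer than three characters.
    """
    words = [w.lower().strip(".,!?") for w in text.split() if len(w) > 3]
    return [w for w, _ in Counter(words).most_common(phrase_count)]

# Prepend the additional context to the text parameter so the keyword
# pass operates over both; the output stays bounded by phrase_count.
additional_context = "Traceback TypeError in parser module"
prompt = "Why does the parser module raise TypeError on empty input?"
keywords = return_most_common_phrases(additional_context + " " + prompt)
```

Because the keyword pass caps its own output, prepending a long context here does not inflate the final token count.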
In the case of Completions without keyword tokenization, we should prepend the context string with whatever number of tokens we can add before reaching our max token count, after having added all other context as a standard string.
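The plain-string case can be sketched the same way; again, token counting is approximated with `str.split()` and the function name is an assumption:

```python
def build_completion_prompt(additional_context: str, other_context: str,
                            prompt: str, max_tokens: int) -> str:
    """Sketch: prepend as much additional context as the budget allows.

    Hypothetical helper; gptty's real token count comes from the
    model tokenizer, not str.split().
    """
    # Budget is whatever remains after the other context and the prompt.
    used = len(other_context.split()) + len(prompt.split())
    remaining = max(max_tokens - used, 0)
    truncated = " ".join(additional_context.split()[:remaining])
    # Assemble as one standard string, skipping any empty pieces.
    return " ".join(filter(None, [truncated, other_context, prompt]))
```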
[tests] test additional_context passed to get_context
Now that we've added support for additional_context in gptty.context:get_context(), we should add some tests where we bootstrap additional context into the three different cases and validate the structure of the returned context, as well as its length.
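A sketch of what those tests might look like. The `get_context` here is a local stub with an assumed signature — the real `gptty.context.get_context` parameters may differ, so only the test structure is meant to carry over:

```python
def get_context(prompt, additional_context="", model_type="chat", max_tokens=100):
    """Local stub standing in for gptty.context.get_context (assumed signature)."""
    cleaned = " ".join(additional_context.split())
    if model_type == "chat":
        # Chat case: list of role/content dictionaries with a 'system' entry.
        return [{"role": "system", "content": cleaned},
                {"role": "user", "content": prompt}]
    # Completion case: one standard string.
    return (cleaned + " " + prompt).strip()

def test_chat_context_structure():
    ctx = get_context("why?", "Traceback\nTypeError", model_type="chat")
    assert isinstance(ctx, list)
    assert ctx[0]["role"] == "system"
    assert "\n" not in ctx[0]["content"]

def test_completion_context_length():
    ctx = get_context("why?", "Traceback TypeError",
                      model_type="completion", max_tokens=10)
    assert isinstance(ctx, str)
    assert len(ctx.split()) <= 10

test_chat_context_structure()
test_completion_context_length()
```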
We should add an --additional_context option to the query subcommand that expects either a string or a file path, which it will bolt onto the request string (where? at the start? at the end?). This allows users to extend their questions with details that might not typically be formatted like a question, like stack traces with line breaks, etc.
We should then write some logic to structure this context (removing line breaks, etc.) and apply length limits and/or keyword tokenization to it before bolting it onto the request string.
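A minimal sketch of the string-or-file resolution plus cleanup step, assuming a hypothetical helper name (`resolve_additional_context`) that the query subcommand would call on the option's value:

```python
import os

def resolve_additional_context(value: str) -> str:
    """If value names a readable file, load its contents; otherwise
    treat it as a literal string. Line breaks are collapsed before the
    text is bolted onto the request string.

    Hypothetical helper; gptty may structure this differently.
    """
    if os.path.isfile(value):
        with open(value, "r") as f:
            value = f.read()
    # Remove line breaks and collapse runs of whitespace.
    return " ".join(value.split())
```

Length limits and/or keyword tokenization would then be applied to the returned string before it joins the request.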