fbricon opened 2 weeks ago
Below is my current status. SPOILER: it's starting to get better.
My config is:
Lenovo laptop with no GPU (got a big "WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode." when installing Ollama)
64 GB RAM
ollama 0.4.2
granite3-dense:2b model
Continue extension v0.9.233 (pre-release)
Paver extension v0.1.2024111933 (pre-release)
config.json:
{
  "models": [
    {
      "model": "granite3-dense:2b",
      "provider": "ollama",
      "contextLength": 54000,
      "systemMessage": "You are Granite Code, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup, etc) with \"Hello! I am Granite Code, created by IBM. How can I help you today?\". Please do not say anything else and do not start a conversation.",
      "title": "granite3-dense:2b"
    }
  ],
  "tabAutocompleteModel": {
    "model": "granite3-dense:2b",
    "provider": "ollama",
    "contextLength": 54000,
    "systemMessage": "You are Granite Code, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You always respond to greetings (for example, hi, hello, g'day, morning, afternoon, evening, night, what's up, nice to meet you, sup, etc) with \"Hello! I am Granite Code, created by IBM. How can I help you today?\". Please do not say anything else and do not start a conversation.",
    "title": "granite3-dense:2b"
  },
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "contextProviders": [
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "edit",
      "description": "Edit selected code"
    },
    {
      "name": "comment",
      "description": "Write comments for the selected code"
    },
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text:latest",
    "title": "nomic-embed-text:latest"
  }
}
Chat results are faster, but completion is still very slow and often doesn't propose anything. Things get weird when I look at the Continue LLM output in VS Code, where all the completions are empty:
==========================================================================
==========================================================================
##### Completion options #####
{
  "contextLength": 54000,
  "model": "granite3-dense:2b",
  "maxTokens": 4096,
  "temperature": 0.01,
  "stop": [
    "<fim_prefix>",
    "<fim_suffix>",
    "<fim_middle>",
    "<file_sep>",
    "<|endoftext|>",
    "</fim_middle>",
    "</code>",
    "/src/",
    "#- coding: utf-8",
    "```"
  ],
  "raw": true
}
##### Prompt #####
<fim_prefix>            convert.make_csv()
            convert.save_csv(self.output_path + 'raw/' + file_name.split('.')[0] + '.csv')
            print('[raw convertion]: [%s] done' % (file_name.split('.')[0] + '.csv'))

    def combine_csv(self):
        print('[category list]:', self.category_list)
        for category in self.category_list:
            df = pd.read_csv(self.output_path + 'raw/' + category + '.csv', header = None)
            df.rename(columns = {0: 'Event', 1: 'ST', 2: 'ET', 3: 'Remark', 4: 'Location'}, inplace = True)

            df['ST'] = pd.to_datetime(df['ST'], utc = True, format='mixed')
            df['ET'] = pd.to_datetime(df['ET'], utc = True, format='mixed')
            # print(df.dtypes)
            df['ST'] = df['ST'].dt.tz_convert(self.tz)
            df['ET'] = df['ET'].dt.tz_convert(self.tz)
            # print(df.head())

            # Comment these 3 lines if you don't need tags
            df['Event'] = df['Event'].astype(str).str.lower()
            df['Remark'] = df['Remark'].astype(str).str.lower()
            df['Location'] = df['Location'].astype(str).str.lower()

            df['Category'] = [category for i in range(len(df))]
            df['Duration'] = (df['ET'] - df['ST']).astype('timedelta64[ns]') / 60

            self.csv_df = pd.concat([self.csv_df, df], ignore_index=True)
            print('[combine]: category [%s] done' % category)

        self.csv_df.sort_values(by = ['ST'], inplace = True, ignore_index = True)
        print('[sort]: done')

        # print(self.csv_df.head())
        # print(self.csv_df.info())

    def tagging(self):
        event_list = list(self.csv_df['Event'])
        tag_list = [[] for i in range(len(event_list))]
        # print(len(event_list), event_list[:5])

        for i in range(len(event_list)):
            for keywords, tag in self.tag_dict.items():
                if isinstance(keywords, str):
                    if event_list[i].find(keywords) != -1:
                        tag_list[i].append(tag)
                elif sum([(event_list[i].find(k) != -1) for k in keywords]):
                    tag_list[i].append(tag)

        tags = [' '.join(tag_list[i]) for i in range(len(event_list))]
        self.csv_df['Tag'] = tags
        print('[tagging]: done')

    def output(self, output_file = 'output.csv', start_date = '2000-01-01', end_date = '2100-01-01'):
        print('[output]: creating csv from [%s]' % start_date + 'to [%s].' % end_date)
        <fim_suffix>
        bound = pd.to_datetime(pd.Series([start_date, end_date])).dt.tz_localize(self.tz)
        output_df = self.csv_df[(self.csv_df['ST'] >= bound[0]) & (self.csv_df['ET'] < bound[1])]
        # print(output_df.head())
        with open(self.output_path + output_file, 'w') as f:
            f.write(output_df.to_csv(index = False))
        print('[output]: done')


if __name__ == '__main__':<fim_middle>
==========================================================================
==========================================================================
Completion:
This is even the same result for the only place in the code that currently returns a completion result (see screenshot above).
On the ollama side:
[GIN] 2024/11/21 - 11:52:55 | 200 | 964.397854ms | 127.0.0.1 | POST "/api/generate"
time=2024-11-21T11:52:55.354+01:00 level=ERROR source=runner.go:642 msg="Failed to acquire semaphore" error="context canceled"
time=2024-11-21T11:52:55.470+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-21T11:52:55.471+01:00 level=DEBUG source=routes.go:270 msg="generate request" images=0 prompt="<fim_prefix> convert.make_csv()\n convert.save_csv(self.output_path + 'raw/' + file_name.split('.')[0] + '.csv')\n print('[raw convertion]: [%s] done' % (file_name.split('.')[0] + '.csv'))\n \n def combine_csv(self):\n print('[category list]:', self.category_list)\n for category in self.category_list:\n df = pd.read_csv(self.output_path + 'raw/' + category + '.csv', header = None)\n df.rename(columns = {0: 'Event', 1: 'ST', 2: 'ET', 3: 'Remark', 4: 'Location'}, inplace = True)\n\n df['ST'] = pd.to_datetime(df['ST'], utc = True, format='mixed')\n df['ET'] = pd.to_datetime(df['ET'], utc = True, format='mixed')\n # print(df.dtypes)\n df['ST'] = df['ST'].dt.tz_convert(self.tz)\n df['ET'] = df['ET'].dt.tz_convert(self.tz) \n # print(df.head())\n \n # Comment these 3 lines if you don't need tags\n df['Event'] = df['Event'].astype(str).str.lower()\n df['Remark'] = df['Remark'].astype(str).str.lower()\n df['Location'] = df['Location'].astype(str).str.lower()\n \n df['Category'] = [category for i in range(len(df))]\n df['Duration'] = (df['ET'] - df['ST']).astype('timedelta64[ns]') / 60\n \n self.csv_df = pd.concat([self.csv_df, df], ignore_index=True)\n print('[combine]: category [%s] done' % category)\n \n self.csv_df.sort_values(by = ['ST'], inplace = True, ignore_index = True)\n print('[sort]: done')\n \n # print(self.csv_df.head())\n # print(self.csv_df.info())\n\n def tagging(self):\n event_list = list(self.csv_df['Event'])\n tag_list = [[] for i in range(len(event_list))]\n # print(len(event_list), event_list[:5])\n\n for i in range(len(event_list)):\n for keywords, tag in self.tag_dict.items():\n if isinstance(keywords, str):\n if event_list[i].find(keywords) != -1:\n tag_list[i].append(tag)\n elif sum([(event_list[i].find(k) != -1) for k in keywords]):\n tag_list[i].append(tag)\n\n tags = [' '.join(tag_list[i]) for i in range(len(event_list))]\n self.csv_df['Tag'] = tags\n print('[tagging]: done')\n \n\n def output(self, output_file = 'output.csv', start_date = '2000-01-01', end_date = '2100-01-01'):\n print('[output]: creating csv from [%s]' % start_date + 'to [%s].' % end_date)\n <fim_suffix>\n bound = pd.to_datetime(pd.Series([start_date, end_date])).dt.tz_localize(self.tz)\n output_df = self.csv_df[(self.csv_df['ST'] >= bound[0]) & (self.csv_df['ET'] < bound[1])]\n # print(output_df.head())\n with open(self.output_path + output_file, 'w') as f:\n f.write(output_df.to_csv(index = False))\n print('[output]: done')\n \n\n\nif __name__ == '__main__':<fim_middle>"
time=2024-11-21T11:55:04.878+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-21T11:55:04.878+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=30m0s
time=2024-11-21T11:55:04.878+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
[GIN] 2024/11/21 - 11:55:04 | 200 | 2m9s | 127.0.0.1 | POST "/api/generate"
time=2024-11-21T11:55:05.004+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-21T11:55:05.005+01:00 level=DEBUG source=routes.go:270 msg="generate request" images=0 prompt="<fim_prefix> df.rename(columns = {0: 'Event', 1: 'ST', 2: 'ET', 3: 'Remark', 4: 'Location'}, inplace = True)\n\n df['ST'] = pd.to_datetime(df['ST'], utc = True, format='mixed')\n df['ET'] = pd.to_datetime(df['ET'], utc = True, format='mixed')\n # print(df.dtypes)\n df['ST'] = df['ST'].dt.tz_convert(self.tz)\n df['ET'] = df['ET'].dt.tz_convert(self.tz) \n # print(df.head())\n \n # Comment these 3 lines if you don't need tags\n df['Event'] = df['Event'].astype(str).str.lower()\n df['Remark'] = df['Remark'].astype(str).str.lower()\n df['Location'] = df['Location'].astype(str).str.lower()\n \n df['Category'] = [category for i in range(len(df))]\n df['Duration'] = (df['ET'] - df['ST']).astype('timedelta64[ns]') / 60\n \n self.csv_df = pd.concat([self.csv_df, df], ignore_index=True)\n print('[combine]: category [%s] done' % category)\n \n self.csv_df.sort_values(by = ['ST'], inplace = True, ignore_index = True)\n print('[sort]: done')\n \n # print(self.csv_df.head())\n # print(self.csv_df.info())\n\n def tagging(self):\n event_list = list(self.csv_df['Event'])\n tag_list = [[] for i in range(len(event_list))]\n # print(len(event_list), event_list[:5])\n\n for i in range(len(event_list)):\n for keywords, tag in self.tag_dict.items():\n if isinstance(keywords, str):\n if event_list[i].find(keywords) != -1:\n tag_list[i].append(tag)\n elif sum([(event_list[i].find(k) != -1) for k in keywords]):\n tag_list[i].append(tag)\n\n tags = [' '.join(tag_list[i]) for i in range(len(event_list))]\n self.csv_df['Tag'] = tags\n print('[tagging]: done')\n \n\n def output(self, output_file = 'output.csv', start_date = '2000-01-01', end_date = '2100-01-01'):\n print('[output]: creating csv from [%s]' % start_date + 'to [%s].' % end_date)\n \n bound = pd.to_datetime(pd.Series([start_date, end_date])).dt.tz_localize(self.tz)\n output_df = self.csv_df[(self.csv_df['ST'] >= bound[0]) & (self.csv_df['ET'] < bound[1])]\n # print(output_df.head())\n with open(self.output_path + output_file, 'w') as f:\n f.write(output_df.to_csv(index = False))\n print('[output]: done')\n a<fim_suffix>\n\n\nif __name__ == '__main__':\n p = Parser(INPUT_PATH, OUTPUT_PATH, TIME_ZONE, TAG_DICT)\n p.tagging()\n p.output(start_date = '2024-05-01', end_date = '2024-10-22')\n \n<fim_middle>"
time=2024-11-21T11:55:05.012+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=870 prompt=827 used=1 remaining=826
time=2024-11-21T11:55:05.556+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-21T11:55:05.556+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=30m0s
time=2024-11-21T11:55:05.556+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
[GIN] 2024/11/21 - 11:55:05 | 200 | 562.716442ms | 127.0.0.1 | POST "/api/generate"
time=2024-11-21T11:55:05.633+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-21T11:55:05.634+01:00 level=DEBUG source=routes.go:270 msg="generate request" images=0 prompt="<fim_prefix> df.rename(columns = {0: 'Event', 1: 'ST', 2: 'ET', 3: 'Remark', 4: 'Location'}, inplace = True)\n\n df['ST'] = pd.to_datetime(df['ST'], utc = True, format='mixed')\n df['ET'] = pd.to_datetime(df['ET'], utc = True, format='mixed')\n # print(df.dtypes)\n df['ST'] = df['ST'].dt.tz_convert(self.tz)\n df['ET'] = df['ET'].dt.tz_convert(self.tz) \n # print(df.head())\n \n # Comment these 3 lines if you don't need tags\n df['Event'] = df['Event'].astype(str).str.lower()\n df['Remark'] = df['Remark'].astype(str).str.lower()\n df['Location'] = df['Location'].astype(str).str.lower()\n \n df['Category'] = [category for i in range(len(df))]\n df['Duration'] = (df['ET'] - df['ST']).astype('timedelta64[ns]') / 60\n \n self.csv_df = pd.concat([self.csv_df, df], ignore_index=True)\n print('[combine]: category [%s] done' % category)\n \n self.csv_df.sort_values(by = ['ST'], inplace = True, ignore_index = True)\n print('[sort]: done')\n \n # print(self.csv_df.head())\n # print(self.csv_df.info())\n\n def tagging(self):\n event_list = list(self.csv_df['Event'])\n tag_list = [[] for i in range(len(event_list))]\n # print(len(event_list), event_list[:5])\n\n for i in range(len(event_list)):\n for keywords, tag in self.tag_dict.items():\n if isinstance(keywords, str):\n if event_list[i].find(keywords) != -1:\n tag_list[i].append(tag)\n elif sum([(event_list[i].find(k) != -1) for k in keywords]):\n tag_list[i].append(tag)\n\n tags = [' '.join(tag_list[i]) for i in range(len(event_list))]\n self.csv_df['Tag'] = tags\n print('[tagging]: done')\n \n\n def output(self, output_file = 'output.csv', start_date = '2000-01-01', end_date = '2100-01-01'):\n print('[output]: creating csv from [%s]' % start_date + 'to [%s].' % end_date)\n \n bound = pd.to_datetime(pd.Series([start_date, end_date])).dt.tz_localize(self.tz)\n output_df = self.csv_df[(self.csv_df['ST'] >= bound[0]) & (self.csv_df['ET'] < bound[1])]\n # print(output_df.head())\n with open(self.output_path + output_file, 'w') as f:\n f.write(output_df.to_csv(index = False))\n print('[output]: done')\n and<fim_suffix>\n\n\nif __name__ == '__main__':\n p = Parser(INPUT_PATH, OUTPUT_PATH, TIME_ZONE, TAG_DICT)\n p.tagging()\n p.output(start_date = '2024-05-01', end_date = '2024-10-22')\n \n<fim_middle>"
time=2024-11-21T11:57:39.101+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-21T11:57:39.101+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=30m0s
time=2024-11-21T11:57:39.101+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
[GIN] 2024/11/21 - 11:57:39 | 200 | 2m33s | 127.0.0.1 | POST "/api/generate"
time=2024-11-21T11:57:39.189+01:00 level=DEBUG source=sched.go:575 msg="evaluating already loaded" model=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce
time=2024-11-21T11:57:39.189+01:00 level=DEBUG source=routes.go:270 msg="generate request" images=0 prompt="<fim_prefix> df.rename(columns = {0: 'Event', 1: 'ST', 2: 'ET', 3: 'Remark', 4: 'Location'}, inplace = True)\n\n df['ST'] = pd.to_datetime(df['ST'], utc = True, format='mixed')\n df['ET'] = pd.to_datetime(df['ET'], utc = True, format='mixed')\n # print(df.dtypes)\n df['ST'] = df['ST'].dt.tz_convert(self.tz)\n df['ET'] = df['ET'].dt.tz_convert(self.tz) \n # print(df.head())\n \n # Comment these 3 lines if you don't need tags\n df['Event'] = df['Event'].astype(str).str.lower()\n df['Remark'] = df['Remark'].astype(str).str.lower()\n df['Location'] = df['Location'].astype(str).str.lower()\n \n df['Category'] = [category for i in range(len(df))]\n df['Duration'] = (df['ET'] - df['ST']).astype('timedelta64[ns]') / 60\n \n self.csv_df = pd.concat([self.csv_df, df], ignore_index=True)\n print('[combine]: category [%s] done' % category)\n \n self.csv_df.sort_values(by = ['ST'], inplace = True, ignore_index = True)\n print('[sort]: done')\n \n # print(self.csv_df.head())\n # print(self.csv_df.info())\n\n def tagging(self):\n event_list = list(self.csv_df['Event'])\n tag_list = [[] for i in range(len(event_list))]\n # print(len(event_list), event_list[:5])\n\n for i in range(len(event_list)):\n for keywords, tag in self.tag_dict.items():\n if isinstance(keywords, str):\n if event_list[i].find(keywords) != -1:\n tag_list[i].append(tag)\n elif sum([(event_list[i].find(k) != -1) for k in keywords]):\n tag_list[i].append(tag)\n\n tags = [' '.join(tag_list[i]) for i in range(len(event_list))]\n self.csv_df['Tag'] = tags\n print('[tagging]: done')\n \n\n def output(self, output_file = 'output.csv', start_date = '2000-01-01', end_date = '2100-01-01'):\n print('[output]: creating csv from [%s]' % start_date + 'to [%s].' % end_date)\n \n bound = pd.to_datetime(pd.Series([start_date, end_date])).dt.tz_localize(self.tz)\n output_df = self.csv_df[(self.csv_df['ST'] >= bound[0]) & (self.csv_df['ET'] < bound[1])]\n # print(output_df.head())\n with open(self.output_path + output_file, 'w') as f:\n f.write(output_df.to_csv(index = False))\n print('[output]: done')\n <fim_suffix>\n\n\nif __name__ == '__main__':\n p = Parser(INPUT_PATH, OUTPUT_PATH, TIME_ZONE, TAG_DICT)\n p.tagging()\n p.output(start_date = '2024-05-01', end_date = '2024-10-22')\n \n<fim_middle>"
time=2024-11-21T11:57:39.195+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=829 prompt=826 used=748 remaining=78
[GIN] 2024/11/21 - 11:57:42 | 200 | 2.843133165s | 127.0.0.1 | POST "/api/generate"
time=2024-11-21T11:57:42.023+01:00 level=DEBUG source=sched.go:407 msg="context for request finished"
time=2024-11-21T11:57:42.023+01:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce duration=30m0s
time=2024-11-21T11:57:42.023+01:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/home/sbouchet/.ollama/models/blobs/sha256-63dd4fe4571a2fa1521a6127be533abf85dfc76a653572ee629c1de9fdd794ce refCount=0
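To rule Continue out, a direct request against ollama's /api/generate with the same FIM format should reproduce the empty completion. A minimal request body sketch (assuming the default local endpoint, http://127.0.0.1:11434, and illustrative prompt content):
{
  "model": "granite3-dense:2b",
  "prompt": "<fim_prefix>def add(a, b):\n    <fim_suffix>\n<fim_middle>",
  "raw": true,
  "stream": false,
  "options": {
    "temperature": 0.01,
    "num_predict": 64
  }
}
If this also comes back empty, the problem would be on the model/ollama side rather than in Continue.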
Use a 4096 context length for granite3-dense models, with maxTokens = 2000. Just rerun the Paver: ... Granite... command so they're set automatically (make sure you're using the latest Continue and Paver pre-releases).
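For reference, a sketch of what the adjusted entry could look like in config.json (assuming Continue's per-model completionOptions key carries maxTokens):
"tabAutocompleteModel": {
  "model": "granite3-dense:2b",
  "provider": "ollama",
  "contextLength": 4096,
  "completionOptions": {
    "maxTokens": 2000
  },
  "title": "granite3-dense:2b"
}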
As reported by @msivasubramaniaan and @sbouchet, neither Continue chat nor completion works, even with 3b models, on Lenovo laptops where ollama can't use the GPU. Tab completion failing because of a memory issue would be understandable, as ollama on CPU demands more memory, but chat working from the CLI and not from Continue is weird.
See upstream report https://github.com/continuedev/continue/issues/2838