nomic-ai / gpt4all

GPT4All: Chat with Local LLMs on Any Device
https://gpt4all.io
MIT License

[LocalDocs: functional scenario] HowTo #2060

Open SINAPSA-IC opened 4 months ago

SINAPSA-IC commented 4 months ago

Hello.

Another long post, but bear with me - the steps to reproduce the behavior described here have to be known in detail. It's about how and what to click on, in order to receive replies based on info in Local Documents.

This test has been carried out over/using the 1 (one) checked/selected LocalDocs collection that is found in the ZIP file mentioned in the Discussion here (https://github.com/nomic-ai/gpt4all/discussions/2057).

In GPT4All 2.7.1, the behavior is as follows, or in short:

[INST] %1 [/INST]"

And now, the litany; the useful bits are the last two entries, 5 of 6) and 6 of 6) (Seven Of Nine is chilling at the spa)

1 of 6) GPT4ALL 2.7.1:

Prompt: who has won the Michigan primaries in 2024?

(processing... generating response... processing...) Response: I am unable to provide an answer to this question as it pertains to events that have not yet occurred. The primary elections for the year 2024 have not yet taken place, and the candidates who will be running and ultimately win the Michigan primaries are unknown at this time.

Prompt: who has won the Michigan primaries in 2024?

(searching localdocs: [categoryname]... Response: I am unable to provide an answer to this question as it pertains to events that have not yet occurred. The primary elections for the year 2024 have not yet taken place, and the candidates who will be running and ultimately win the Michigan primaries are unknown at this time.


  1. "Trump and Biden won Michigan. But 'uncommitted' votes demanded attention". By Victor. Date: 2024, February 28. In 20240228-AP-Trump and Biden won Michigan - But -uncommitted- votes demanded attention.pdf. Page 1. Context
  2. ... (all relevant PDF files)

2 of 6) GPT4ALL 2.7.1:

Prompt: who has won the Michigan primaries in 2024?

(processing... generating response... processing...) Response: I am unable to provide an answer to this question as it pertains to events that have not yet occurred. The primary elections for the year 2024 have not yet taken place, and the candidates who will be running and ultimately win the Michigan primaries are unknown at this time.

Prompt: who has won the Michigan primaries in 2024?

(searching localdocs: [categoryname]... Response: I am unable to provide an answer to this question as it pertains to events that have not yet occurred. The primary elections for the year 2024 have not yet taken place, and the candidates who will be running and ultimately win the Michigan primaries are unknown at this time.


  1. "Trump and Biden won Michigan. But 'uncommitted' votes demanded attention". By Victor. Date: 2024, February 28. In 20240228-AP-Trump and Biden won Michigan - But -uncommitted- votes demanded attention.pdf. Page 1. Context
  2. ... (all relevant PDF files - the same Context snippets as above)

3 of 6) GPT4ALL 2.7.1:

Prompt: who has won the Michigan primaries in 2024?

(processing... generating response... processing...) Response: I am unable to provide an answer to this question as the 2024 presidential election has not yet occurred and the primary season for that election has not yet begun. The candidates who will run for president in 2024 are not yet known, and it is impossible to predict who will win the Michigan primaries at this time.

Prompt: who has won the Michigan primaries in 2024?

(searching localdocs: [categoryname]... Response: I am unable to provide an answer to this question as the 2024 presidential election has not yet occurred and the primary season for that election has not yet begun. The candidates who will run for president in 2024 are not yet known, and it is impossible to predict who will win the Michigan primaries at this time.


  1. "Trump and Biden won Michigan. But 'uncommitted' votes demanded attention". By Victor. Date: 2024, February 28. In 20240228-AP-Trump and Biden won Michigan - But -uncommitted- votes demanded attention.pdf. Page 1. Context
  2. ... (all relevant PDF files - the same Context snippets as above)

4 of 6) GPT4ALL 2.7.1:

Prompt: who has won the Michigan primaries in 2024?

(processing... generating response... processing...) Response: I am unable to provide an answer to this question as it pertains to events that have not yet occurred. The 2024 primary elections are still several years away and the candidates who will be running for office have not been determined.

Prompt: who has won the Michigan primaries in 2024?

(searching localdocs: [categoryname]... Response: I am unable to provide an answer to this question as it pertains to events that have not yet occurred. The 2024 primary elections are still several years away and the candidates who will be running for office have not been determined.


  1. "Trump and Biden won Michigan. But 'uncommitted' votes demanded attention". By Victor. Date: 2024, February 28. In 20240228-AP-Trump and Biden won Michigan - But -uncommitted- votes demanded attention.pdf. Page 1. Context
  2. ... (all relevant PDF files - the same Context snippets as above)

_5 of 6) GPT4ALL 2.7.1:

Prompt: who has won the Michigan primaries in 2024?

(processing... generating response... processing...) Response: I am unable to provide an answer to this question as it pertains to events that have not yet occurred. The 2024 primary elections are still several years away and the candidates who will be running for office have not been determined. It is important to wait until the actual election results are known before making any predictions or assumptions about the outcome.

- LocalDocs: collections: checked: [categoryname]

Prompt: who has won the Michigan primaries in 2024?

(searching localdocs: [categoryname]... Response: In the 2024 presidential election, President Joe Biden and former President Donald Trump won the Michigan primaries on Tuesday.


  1. "Trump and Biden won Michigan. But 'uncommitted' votes demanded attention". By Victor. Date: 2024, February 28. In 20240228-AP-Trump and Biden won Michigan - But -uncommitted- votes demanded attention.pdf. Page 1. Context
  2. ... (all relevant PDF files - the same Context snippets as above)

6 of 6) GPT4ALL 2.7.1:

Prompt: who has won the Michigan primaries in 2024?

(processing... generating response... processing...) Response: I am unable to provide an answer to this question as the 2024 primary elections have not yet occurred and the results are unknown.

- LocalDocs: collections: checked: [categoryname]

Prompt: who has won the Michigan primaries in 2024?

(searching localdocs: [categoryname]... Response: In the 2024 presidential election, Joe Biden and Donald Trump have won the Michigan primaries.


  1. "Trump and Biden won Michigan. But 'uncommitted' votes demanded attention". By Victor. Date: 2024, February 28. In 20240228-AP-Trump and Biden won Michigan - But -uncommitted- votes demanded attention.pdf. Page 1. Context
  2. ... (all relevant PDF files - the same Context snippets as above)_

manyoso commented 4 months ago

This is an amazing issue. Thank you so much for your detail!

SINAPSA-IC commented 4 months ago

Thank you! not me :)

I've only been trying, for the third day in a row and 2+ hours in total, to jailbreak(?) Mistral into telling me what PCR has said... there's a PDF in there... but nope, censored to the bone... this LLM has gaslighted me about Genghis Khan, too... so I stick to the US primaries instead...

SINAPSA-IC commented 4 months ago

I'd say to simply place the code of Refresh under a "Use these LocalDocs" Button, as there are no OK, Cancel buttons anywhere on that dialog... "We" Windows users are accustomed to the controlbox X and to the Esc key to simply Close a dialog with the meaning of "Cancel" whatever's been done in it, and to pressing an "OK"-ish button to accept whatever changes.

I'm not sure (I don't remember!) whether loading another LLM does the Refresh thing, and I cannot check on this right now.

SINAPSA-IC commented 4 months ago

Forgot to mention, as if the post was not Long enough - after closing GPT4All, it was Terminated as a process, and the model was unloaded from memory - as per Task Manager. I restarted it after making sure of this.

cebtenzzre commented 4 months ago

This is far too large to be an actionable issue. We need to open smaller issues that represent individual problems - or at the very least, put a list of checkboxes at the top of this issue that represent things that can be fixed/improved. Closing "HowTo" represents nothing.

SINAPSA-IC commented 4 months ago

Well, it's a test run. I titled it HowTo because I didn't know where to place it: it is not a bug, it is not Documentation, and it seems too heavy for a Discussion. It is, however, a (blank) issue, and quite an important one at that: the feature works, but only if this and that, leaving the user to try this and guess that. It shows what has to be done in order to profit from the local info, like a HowTo... Anyway, to me it looks like two quick and dirty solutions, to be refined later, are 1) calling the Refresh procedure when closing the LocalDocs dialog and 2) placing a text on this dialog saying something like "This will only work either when Loading (not Reloading) an LLM after selecting some collections, or when Refreshing after selecting some collections".

(UX is Very Important in what I/we do - software and mass-media; the user Wants! this and that - build a city, a spaceship, a TV show - after clicking 1 button at most, as even 1 click and reading a MessageBox is too much for some)

SINAPSA-IC commented 4 months ago

What's more intriguing (just stumbled upon this): I have unchecked the single LocalDocs collection that was chosen for local info, then did the Refresh with the same LLM, in the hope that no collection whatsoever would be used or quoted, but the LLM quoted the collection again as if it were still checked.

I suggest placing on hold the LocalDocs functionality beyond Indexing - exposed to the user - until all these issues are cleared - "all" since they seem parts of a whole - namely, retrieval from local documents and association with replies, as snippets from Context. Or, explain in detail which steps are to be taken by the user in order to properly exploit the LocalDocs functionality.

As things stand now, it is too confusing a behavior: sometimes the replies are being correctly built upon local info, and some other times local info is being quoted without it being visibly employed in the reply; a reply followed by quotes of Context that was not really used might be taken for granted by the oblivious user; too delicate a thing to be left as it is.

Thank you.

SINAPSA-IC commented 4 months ago

A workaround of sorts for the user to ensure that info from LocalDocs will be used by Mistral - the LLM I ran the test above with - is to write a Prompt Template in the vein of:

To answer or to reply, you only use external information.

[INST] %1 [/INST]

From what I've seen, this does not require Reloading or Refreshing and will get a correct answer.

However, under the same PT a question was answered correctly whose answer was not in a selected LocalDocs collection - what's more, the quoted Contexts, although in that collection, were not relevant at all (some license terms) to the contents of this answer (presidential bid).

From what I'm seeing, the interaction with LocalDocs is indicative of something random involving the user's input, the LLM's "internal knowledge" and the LocalDocs as "external knowledge"...
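For readers unfamiliar with the template syntax: `%1` in the Prompt Template above is the placeholder that gets replaced by the user's prompt. A minimal sketch of that substitution follows, assuming a plain string replacement; `render_prompt` is a hypothetical helper written for illustration, not GPT4All's actual code.

```python
# Hypothetical illustration: GPT4All's own substitution code is not shown in this thread.
PROMPT_TEMPLATE = (
    "To answer or to reply, you only use external information.\n\n"
    "[INST] %1 [/INST]"
)


def render_prompt(template: str, user_prompt: str) -> str:
    """Replace the %1 placeholder with the user's prompt text."""
    if "%1" not in template:
        raise ValueError("template has no %1 placeholder")
    return template.replace("%1", user_prompt)


if __name__ == "__main__":
    print(render_prompt(PROMPT_TEMPLATE, "who has won the Michigan primaries in 2024?"))
```

The instruction line sits outside the `[INST] ... [/INST]` pair here only because that is how the template was written above; whether the model weighs it the same way inside or outside the pair is not something this sketch can show.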

Ah well. Calling it a day now.

Thank you all!

cebtenzzre commented 4 months ago

> From what I've seen, this does not require Reloading or Refreshing and will get a correct answer.

The Local Docs feature works by matching your query against an index of your documents using a text embedding model (currently all-MiniLM-L6-v2), and then providing the relevant snippets in a block of input to the LLM starting with ### Context. The chat LLM itself has no bearing on which documents are used. Given that, it does not seem like you are describing a consistently reproducible issue with the Local Docs feature.
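The retrieval flow described above can be sketched roughly as follows. This is an illustration under loud assumptions, not GPT4All's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model such as all-MiniLM-L6-v2, and the function names, the `top_k` parameter and the exact layout after the `### Context` line are all made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, snippets: list[str], top_k: int = 2) -> list[str]:
    """Rank indexed snippets by similarity to the query; the chat LLM is not involved."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]


def build_input(query: str, snippets: list[str], top_k: int = 2) -> str:
    """Prepend the retrieved snippets as a '### Context' block (layout assumed)."""
    context = "\n".join(retrieve(query, snippets, top_k))
    return f"### Context\n{context}\n\n{query}"


if __name__ == "__main__":
    docs = [
        "Trump and Biden won Michigan primaries",
        "License terms for the software",
        "Genghis Khan founded the Mongol Empire",
    ]
    print(build_input("who has won the Michigan primaries in 2024?", docs, top_k=1))
```

Note that a ranking like this one always returns its `top_k` best snippets, even when none of them is a good match for the query. That is a property of this sketch only; whether GPT4All applies a similarity threshold is not stated in this thread.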

SINAPSA-IC commented 4 months ago

Alright. I didn't know that. However, the test above was carried out under the same conditions - Prompts, LocalDocs, LLM, LLM parameters - to check on reproducibility, in view of future interaction; it all ended up with two kinds of reproducible behavior - the thing worked (1) after Refresh and (2) when sending in a certain Prompt Template, and it didn't work without Refreshing. I am very curious about the results that, say, at least 2 other people would obtain after following the same steps in each case, so as to draw a conclusion from some wee statistics on ~3 cases.

SINAPSA-IC commented 4 months ago

Anyway, all those who intend to use PDFs as Local Docs should check, if possible, that the text in those PDFs can be selected - individual characters, words, phrases.

The case may be that a whole page in a PDF file is selectable only as an image with text drawn on it, and as a result the text on that PDF page cannot be parsed (that would take an OCR mechanism intrinsic to the LLM), although it is readable by a human - and that particular PDF will not be taken into account when the LLM is searching for text.

Good to keep this in mind when asking for info from PDFs; on the other hand, who's willing to verify tens-hundreds of PDFs...

Case study of sorts:

Let's assume that the user has 1 (one) PDF on a particular subject, whereby the user can pose questions that would be answered with info from that PDF. If the answer does not contain info from the PDF, when the user's expecting that it should, then the user may verify if the text inside is text indeed - selectable with the mouse, ergo parsable without employing OCR.
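For anyone who would rather not check tens or hundreds of PDFs by hand, the "is the text selectable" test above can be approximated in code by checking whether a text extractor gets anything out of each page. The sketch below is only an illustration: `pages_without_text`, `extract_page_texts` and the `min_chars` threshold are hypothetical names and values chosen for the example, and it assumes the third-party `pypdf` package is installed. It says nothing about which PDF parser GPT4All itself uses.

```python
def pages_without_text(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text is (nearly) empty.

    Such pages are probably scanned images and would need OCR to be searchable.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path: str) -> list[str]:
    """Extract the text of each page with pypdf (assumed installed: pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily so the check above has no dependency

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        suspicious = pages_without_text(extract_page_texts(path))
        print(f"{path}: pages with no extractable text: {suspicious or 'none'}")
```

A page flagged by this check is only a hint that it is an image; a page with very little real text (a title page, say) would be flagged too, so the threshold is a judgment call.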