weaviate / semantic-search-through-wikipedia-with-weaviate

Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine
MIT License

How to use only the retrieval component of semantic search (Information extraction) #2

Closed theainerd closed 2 years ago

theainerd commented 2 years ago

Hello, I want to use this semantic search engine. I just want to retrieve all documents rather than perform question answering on them. How can I get the results of the information retrieval part using the Python client?

bobvanluijt commented 2 years ago

You can!

1. Remove the Q&A module

2. Query using nearText

The nearText function does a semantic search without the Q&A module.

For example:

{
  Get {
    Paragraph(
      nearText: {
        concepts: ["Italian food"]
      }
      limit: 50
    ) {
      content
      order
      title
      inArticle {
        ... on Article {
          title
        }
      }
    }
  }
}

If you like, you can join our Slack to further discuss this.

theainerd commented 2 years ago

Hello, how exactly do I use it with the Python client? Also, I am trying to run on a CPU.

bobvanluijt commented 2 years ago

Hi @theainerd –

how exactly to use it using the python client

You can connect the client to a Weaviate instance as outlined here in the docs. The above query will look something like:

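import weaviate

# Connect to a locally running Weaviate instance
client = weaviate.Client("http://localhost:8080")

nearText = {
  "concepts": ["Italian food"]
}

# Semantic search over Paragraph objects; the builder method for the limit is with_limit
result = (
    client.query
    .get("Paragraph", ["content", "order", "title", "_additional {certainty}"])
    .with_near_text(nearText)
    .with_limit(50)
    .do()
)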
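For retrieval only, the useful part of the response is the list of matching paragraphs. A minimal sketch of unpacking it, assuming the dict returned by .do() has the usual GraphQL shape (data, then Get, then Paragraph) and reports failures under an "errors" key. The helper name extract_paragraphs and the sample response are made up for illustration and are not part of the client:

```python
def extract_paragraphs(response):
    """Return (title, certainty, content) tuples from a Get/Paragraph response dict."""
    # GraphQL-style responses carry failures under "errors" (assumed shape)
    if "errors" in response:
        raise RuntimeError(f"Weaviate returned errors: {response['errors']}")
    paragraphs = response.get("data", {}).get("Get", {}).get("Paragraph") or []
    rows = []
    for p in paragraphs:
        # "_additional" is only present if it was requested in the query
        certainty = (p.get("_additional") or {}).get("certainty")
        rows.append((p.get("title"), certainty, p.get("content")))
    return rows


# Hypothetical response, hand-written to mirror the assumed shape (not real data)
sample_response = {
    "data": {
        "Get": {
            "Paragraph": [
                {
                    "title": "Pizza",
                    "order": 0,
                    "content": "Pizza is a dish of Italian origin.",
                    "_additional": {"certainty": 0.93},
                }
            ]
        }
    }
}

for title, certainty, content in extract_paragraphs(sample_response):
    print(f"{certainty:.2f}  {title}: {content[:40]}")
```

An empty result list comes back as an empty Python list, so the loop simply prints nothing when there are no matches.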

also i am trying to run on a CPU

This will work, but it will be slower and will run without the Q&A module. The Docker file is here.

theainerd commented 2 years ago

Strange, but all I get is this error when I try to run it:

ConnectionResetError: [Errno 104] Connection reset by peer

bobvanluijt commented 2 years ago

Hi @theainerd – just to make sure: if you go to your browser and load http://localhost:8080/v1/meta, can you see the instance?
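The same check can be scripted when no browser is available on the machine. This is a rough sketch using only the Python standard library; the helper names meta_url and fetch_meta are hypothetical, and it assumes the instance serves JSON at /v1/meta on port 8080 as in the URL above:

```python
import http.client
import json
import urllib.request


def meta_url(host, port=8080):
    """Build the /v1/meta URL for a host; 8080 matches the local example."""
    return f"http://{host}:{port}/v1/meta"


def fetch_meta(url, timeout=5.0):
    """Return the parsed JSON from /v1/meta, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused/reset connections, timeouts, HTTP errors and bad JSON
        return None


if __name__ == "__main__":
    meta = fetch_meta(meta_url("localhost"))
    print("instance reachable" if meta is not None else "instance not reachable")
```

A connection reset like the one reported above would surface here as "instance not reachable" rather than a traceback.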

theainerd commented 2 years ago

I am running it on my GCP instance. I am trying to access it at http://<ip-of-the-machine>/v1/meta, but I am unable to reach the instance. I am not sure what I am doing wrong.

Are there any further details I can share to help understand the problem?

bobvanluijt commented 2 years ago

I think it's because of the port number. Would you mind moving this conversation to our Slack? If you are on it, just go to the channel #datasets.
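To illustrate the port point: a URL with no explicit port falls back to the scheme default (80 for http), whereas the local examples in this thread use 8080. A small sketch with the standard library only; the helper name effective_port and the 203.0.113.10 documentation address are placeholders, and whether the instance actually listens on 8080 on that machine is an assumption:

```python
from urllib.parse import urlsplit

# Default ports implied by the URL scheme when none is written out
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://203.0.113.10/v1/meta"))       # 80
print(effective_port("http://203.0.113.10:8080/v1/meta"))  # 8080
```

If the port turns out to be the cause, the same host with :8080 appended would be the URL to try, both in the browser and in weaviate.Client.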