Open nikhilweee opened 1 year ago
I think I figured this out. The following article was causing the error. Everything else works fine if I omit this article.
```json
{
  "index": 5982,
  "title": "Mali",
  "text": "Mali (Bambara: ߡߊߟߌ, Fula: 𞤃𞤢𞥄𞤤𞤭, ), officially the Republic of Mali ..."
}
```
Perhaps because the article contains ADLaM and N'Ko characters?
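One way to test that guess is to scan each article for characters from those two scripts. The snippet below is only a sketch using Python's standard `unicodedata` module. The helper names `scripts_in_text` and `has_non_bmp` are made up for illustration, and matching on the Unicode name prefixes `ADLAM` and `NKO` is my own choice. Note that ADLaM lives outside the Basic Multilingual Plane while N'Ko does not, so checking for non-BMP code points separately may help tell the two apart.

```python
import unicodedata


def scripts_in_text(text, prefixes=("ADLAM", "NKO")):
    """Return the subset of `prefixes` that match a character name in `text`.

    Hypothetical helper: it relies on Unicode character names starting with
    the script name, e.g. "ADLAM CAPITAL LETTER MIIM" or "NKO LETTER MA".
    """
    found = set()
    for ch in text:
        # Unnamed characters (e.g. control codes) yield an empty string.
        name = unicodedata.name(ch, "")
        for prefix in prefixes:
            if name.startswith(prefix + " "):
                found.add(prefix)
    return found


def has_non_bmp(text):
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


sample = "Mali (Bambara: \u07e1\u07ca\u07df\u07cc, Fula: \U0001e903\U0001e922)"
print(scripts_in_text(sample), has_non_bmp(sample))
```

Running this over the `text` column should show whether any other article contains these scripts and still imports fine, which would rule the characters out as the cause.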
Use the following `docker-compose.yml` to spin up Weaviate and contextionary.
```yml
# docker-compose.yml
---
version: "3.4"
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - "8080"
      - --scheme
      - http
    image: semitechnologies/weaviate:1.20.3
    ports:
      - 8080:8080
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      CONTEXTIONARY_URL: contextionary:9999
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "text2vec-contextionary"
      ENABLE_MODULES: "text2vec-contextionary"
      CLUSTER_HOSTNAME: "node1"
  contextionary:
    environment:
      OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
      EXTENSIONS_STORAGE_MODE: weaviate
      EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
      NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
      ENABLE_COMPOUND_SPLITTING: "false"
    image: semitechnologies/contextionary:en0.16.0-v1.0.2
    ports:
      - 9999:9999
```
Run the following Python script, which tries to import 6000 articles at once.
This gives an error. There is no error if you use `df.head(5120)` instead.
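For anyone hitting something similar, the offending record can be narrowed down by repeatedly halving the data and retrying. Below is a rough sketch of that idea, not the script from this issue. `find_failing_items` and the `try_import` callback are hypothetical names: `try_import` stands in for whatever code performs the import, and should return `True` on success and `False` on error. It also assumes the failure is caused by individual records rather than by the batch size itself.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_failing_items(
    items: Sequence[T],
    try_import: Callable[[Sequence[T]], bool],
) -> List[T]:
    """Isolate the items that make `try_import` fail by recursive halving.

    `try_import` is a placeholder for the real import call; it must return
    True when the given items import cleanly and False otherwise.
    """
    # An empty or cleanly importing slice contains no culprits.
    if not items or try_import(items):
        return []
    # A failing slice of length one is itself the culprit.
    if len(items) == 1:
        return [items[0]]
    mid = len(items) // 2
    return (
        find_failing_items(items[:mid], try_import)
        + find_failing_items(items[mid:], try_import)
    )


# Stand-in for a real import: pretend only record 5982 breaks it.
records = list(range(6000))
print(find_failing_items(records, lambda batch: 5982 not in batch))
```

With a real import function, each call should run against a fresh class or collection so that earlier partial imports do not interfere with later attempts.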