n4ze3m / dialoqbase

Create chatbots with ease
https://dialoqbase.n4ze3m.com/
MIT License
1.57k stars 261 forks

Github source crashes machine #276

Closed vladosam closed 1 month ago

vladosam commented 1 month ago

When I try to add a larger GitHub repo such as https://github.com/minio/docs, my machine consumes all available memory. It is an 8-core, 48 GB virtual machine. I saw a similar problem here and changed max-old-space-size in the .env file to NODE_OPTIONS="--max-old-space-size=8192", but nothing changed. I tried even larger max-old-space-size values with the same result. I did succeed in adding smaller repos (2-3 MB unpacked in the upload folder), but anything bigger than 15 MB crashes the machine.

redis                | 1:C 09 Jul 2024 06:07:35.568 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis                | 1:C 09 Jul 2024 06:07:35.568 * Redis version=7.2.5, bits=64, commit=00000000, modified=0, pid=1, just started
redis                | 1:C 09 Jul 2024 06:07:35.568 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis                | 1:M 09 Jul 2024 06:07:35.568 * monotonic clock: POSIX clock_gettime
redis                | 1:M 09 Jul 2024 06:07:35.569 * Running mode=standalone, port=6379.
redis                | 1:M 09 Jul 2024 06:07:35.569 * Server initialized
redis                | 1:M 09 Jul 2024 06:07:35.569 * Loading RDB produced by version 7.2.5
redis                | 1:M 09 Jul 2024 06:07:35.569 * RDB age 6 seconds
redis                | 1:M 09 Jul 2024 06:07:35.569 * RDB memory usage when created 1.58 Mb
redis                | 1:M 09 Jul 2024 06:07:35.569 * Done loading RDB, keys loaded: 13, keys expired: 0.
redis                | 1:M 09 Jul 2024 06:07:35.569 * DB loaded from disk: 0.000 seconds
redis                | 1:M 09 Jul 2024 06:07:35.569 * Ready to accept connections tcp
dialoqbase-postgres  | 
dialoqbase-postgres  | PostgreSQL Database directory appears to contain a database; Skipping initialization
dialoqbase-postgres  | 
dialoqbase-postgres  | 2024-07-09 06:07:35.718 UTC [1] LOG:  starting PostgreSQL 15.4 (Debian 15.4-2.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
dialoqbase-postgres  | 2024-07-09 06:07:35.719 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
dialoqbase-postgres  | 2024-07-09 06:07:35.719 UTC [1] LOG:  listening on IPv6 address "::", port 5432
dialoqbase-postgres  | 2024-07-09 06:07:35.720 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
dialoqbase-postgres  | 2024-07-09 06:07:35.727 UTC [29] LOG:  database system was shut down at 2024-07-09 06:07:29 UTC
dialoqbase-postgres  | 2024-07-09 06:07:35.738 UTC [1] LOG:  database system is ready to accept connections
dialoqbase           | yarn run v1.22.19
dialoqbase           | $ npx prisma migrate deploy && npx prisma db seed && fastify start app.js
dialoqbase           | Prisma schema loaded from prisma/schema.prisma
dialoqbase           | Datasource "db": PostgreSQL database "dialoqbase", schema "public" at "dialoqbase-pg:5432"
dialoqbase-postgres  | 2024-07-09 06:07:38.030 UTC [33] LOG:  could not receive data from client: Connection reset by peer
dialoqbase           | 
dialoqbase           | 36 migrations found in prisma/migrations
dialoqbase           | 
dialoqbase           | 
dialoqbase           | No pending migrations to apply.
dialoqbase           | npm notice
dialoqbase           | npm notice New minor version of npm available! 10.7.0 -> 10.8.1
dialoqbase           | npm notice Changelog: https://github.com/npm/cli/releases/tag/v10.8.1
dialoqbase           | npm notice To update run: npm install -g npm@10.8.1
dialoqbase           | npm notice
dialoqbase           | Running seed command `ts-node prisma/seed.ts` ...
dialoqbase           | Seeding new models...
dialoqbase           | 
dialoqbase           | 🌱  The seed command has been executed.
dialoqbase           | ┌─────────────────────────────────────────────────────────┐
dialoqbase           | │  Update available 5.15.1 -> 5.16.1                      │
dialoqbase           | │  Run the following to update                            │
dialoqbase           | │    yarn add --dev prisma@latest                         │
dialoqbase           | │    yarn add @prisma/client@latest                       │
dialoqbase           | └─────────────────────────────────────────────────────────┘
dialoqbase           | [info] use ffmpeg.wasm v0.12.0
dialoqbase           | Connecting to database...
dialoqbase           | [info] use ffmpeg.wasm v0.12.0
dialoqbase           | Processing queue
dialoqbase           | Cloning into './uploads/minio-docs-main'...
dialoqbase-postgres  | 2024-07-09 06:12:35.825 UTC [27] LOG:  checkpoint starting: time
dialoqbase-postgres  | 2024-07-09 06:12:36.433 UTC [27] LOG:  checkpoint complete: wrote 8 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.604 s, sync=0.002 s, total=0.609 s; sync files=8, longest=0.001 s, average=0.001 s; distance=35 kB, estimate=35 kB
dialoqbase           | 
dialoqbase           | <--- Last few GCs --->
dialoqbase           | 
dialoqbase           | [167:0x72148b0]   289975 ms: Scavenge 7884.4 (8208.2) -> 7868.5 (8199.0) MB, 16.2 / 0.0 ms  (average mu = 0.238, current mu = 0.170) allocation failure; 
dialoqbase           | [167:0x72148b0]   291664 ms: Mark-sweep 7892.2 (8214.1) -> 7868.4 (8193.1) MB, 1592.6 / 0.0 ms  (average mu = 0.208, current mu = 0.175) allocation failure; scavenge might not succeed
dialoqbase           | [167:0x72148b0]   291794 ms: Scavenge 7892.2 (8208.3) -> 7876.5 (8199.0) MB, 13.6 / 0.0 ms  (average mu = 0.208, current mu = 0.175) allocation failure; 
dialoqbase           | 
dialoqbase           | 
dialoqbase           | <--- JS stacktrace --->
dialoqbase           | 
dialoqbase           | FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
dialoqbase           |  1: 0xb9c310 node::Abort() [/usr/local/bin/node]
dialoqbase           |  2: 0xaa27ee  [/usr/local/bin/node]
dialoqbase           |  3: 0xd73eb0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
dialoqbase           |  4: 0xd74257 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
dialoqbase           |  5: 0xf515d5  [/usr/local/bin/node]
dialoqbase           |  6: 0xf524d8 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/usr/local/bin/node]
dialoqbase           |  7: 0xf629d3  [/usr/local/bin/node]
dialoqbase           |  8: 0xf63848 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
dialoqbase           |  9: 0xf3e19e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
dialoqbase           | 10: 0xf3f567 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
dialoqbase           | 11: 0xf1fae0 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/usr/local/bin/node]
dialoqbase           | 12: 0xf170ac v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [/usr/local/bin/node]
dialoqbase           | 13: 0xf17225 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [/usr/local/bin/node]
dialoqbase           | 14: 0x11d215e v8::internal::MaybeHandle<v8::internal::OrderedHashMap> v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::Allocate<v8::internal::Isolate>(v8::internal::Isolate*, int, v8::internal::AllocationType) [/usr/local/bin/node]
dialoqbase           | 15: 0x11d2213 v8::internal::MaybeHandle<v8::internal::OrderedHashMap> v8::internal::OrderedHashTable<v8::internal::OrderedHashMap, 2>::Rehash<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::OrderedHashMap>, int) [/usr/local/bin/node]
dialoqbase           | 16: 0x12dc55d v8::internal::Runtime_MapGrow(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
dialoqbase           | 17: 0x17125f9  [/usr/local/bin/node]
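
A note on the trace above: the last GC lines show the heap at roughly 7.9 GB used against about 8.2 GB reserved, which suggests the 8192 MB setting was applied and the process still ran out of memory. One way to confirm which limit a Node process is actually running with is to print it from inside the container. This is a minimal sketch, assuming it is started with the same NODE_OPTIONS as the app; `bytesToMiB` and `heapLimitMiB` are illustrative helper names, not part of dialoqbase.

```typescript
import * as v8 from "node:v8";

// Convert a byte count to whole mebibytes for readability.
function bytesToMiB(bytes: number): number {
  return Math.round(bytes / (1024 * 1024));
}

// Report the V8 heap limit the current process is running with.
// With --max-old-space-size=8192 this is expected to be at or
// somewhat above 8192 MiB, since it also covers other heap spaces.
function heapLimitMiB(): number {
  return bytesToMiB(v8.getHeapStatistics().heap_size_limit);
}

console.log(`NODE_OPTIONS=${process.env.NODE_OPTIONS ?? "(unset)"}`);
console.log(`V8 heap limit: ${heapLimitMiB()} MiB`);
```

If the printed limit matches the configured value, raising it further is unlikely to help, and the loader's memory use itself is the thing to look at.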

n4ze3m commented 1 month ago

Hey, thank you for reporting this. I will check what is happening with the GitHub loader.

n4ze3m commented 1 month ago

I have released a new version of the GitHub loader which can handle larger repositories. However, I do not recommend using all-MiniLM-L6-v2 with it. Instead, try using Nomic or MxBai via Ollama for local embedding. Please let me know if you still face any errors.

vladosam commented 1 month ago

I tested it already. :D But I tested it with all-MiniLM-L6-v2 and it happened again: all available memory was used, along with all free space on the HDD. I will try with Ollama embedding. Thanks.

vladosam commented 1 month ago

Success, it now works with the Ollama Nomic embedding. It was very slow, taking over 3 hours to process the GitHub repo, but GitHub sources now work. Thanks.

n4ze3m commented 1 month ago

Thanks for reporting back. I will add a concurrency feature to avoid this slow embedding for larger data sources.
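
The thread does not show how the concurrency feature is implemented. For readers wondering what it could look like, here is a minimal sketch of one common approach: a fixed pool of workers pulling chunks from a shared index, so that only `limit` embedding requests are in flight at once. `EmbedFn` and `embedWithLimit` are hypothetical names for illustration and are not dialoqbase APIs.

```typescript
// Hypothetical signature: any function that turns one text chunk into a vector.
type EmbedFn = (text: string) => Promise<number[]>;

// Embed all texts with at most `limit` requests in flight.
// Results are returned in the same order as the input.
async function embedWithLimit(
  texts: string[],
  embed: EmbedFn,
  limit: number,
): Promise<number[][]> {
  const results: number[][] = new Array(texts.length);
  let next = 0; // index of the next chunk to claim

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (true) {
      const i = next++;
      if (i >= texts.length) return;
      results[i] = await embed(texts[i]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, texts.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Capping the number of in-flight requests also bounds how many pending results are held in memory at a time, which matters given the out-of-memory crash reported earlier in this issue. A suitable `limit` would depend on how many parallel requests the embedding backend (for example a local Ollama instance) handles well.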