vitrivr / vitrivr-engine

vitrivr's next-generation retrieval engine. It is capable of extracting and retrieving a wider range of multimedia objects such as audio, video, images or 3d models.
https://vitrivr.org
MIT License

[BUG] JVM Fatal Error During Video Ingestion on Cottontail Backend (141 videos consistently) #115

Open flurinB opened 1 day ago

flurinB commented 1 day ago

Description

When ingesting videos into the nmr-backend (Cottontail version), the JVM runs into a fatal error after a certain number of videos. In our tests this always happened after 141 videos, using the same videos in the same order each time: JVM_Error

In the PostgreSQL version of the backend, the problem appeared to be mitigated by using the CachedContentFactory and/or ingesting the videos in chunks (although it still crashes later on). Neither workaround seems to help in the Cottontail version. The following plots show the JVM memory over time up to the crash. The values were obtained using the "jstat -gc $PID" command, where PID is the process ID of the JVM process.
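For anyone trying to reproduce the measurements, the sampling described above could be scripted roughly as follows. This is a minimal sketch, not the script used for the plots: the function names `parse_jstat_gc` and `sample`, the 5-second interval, and the assumption that `jstat` exits with a non-zero status once the JVM is gone are all illustrative.

```python
import subprocess
import time


def parse_jstat_gc(output: str) -> dict[str, float]:
    """Parse the header line and value line of `jstat -gc <pid>` into {column: value}."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header, values = lines[0].split(), lines[-1].split()
    if len(header) != len(values):
        raise ValueError("header/value column count mismatch")
    # Some locales print decimal commas; normalise them before converting.
    # Columns that a collector does not report are printed as "-"; store NaN.
    return {
        key: float("nan") if val == "-" else float(val.replace(",", "."))
        for key, val in zip(header, values)
    }


def sample(pid: int, interval_s: float = 5.0):
    """Yield (timestamp, metrics) until jstat stops returning a table.

    Assumption: jstat fails or prints nothing once the JVM process has died.
    """
    while True:
        proc = subprocess.run(
            ["jstat", "-gc", str(pid)], capture_output=True, text=True
        )
        if proc.returncode != 0 or not proc.stdout.strip():
            break
        yield time.time(), parse_jstat_gc(proc.stdout)
        time.sleep(interval_s)
```

Iterating over `sample(pid)` and appending each row to a CSV file would give one time series per column, which is what the plots below show.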

CCSC: Compressed class space capacity (kB)

CCSC_over_time

CCSU: Compressed class space used (kB).

CCSU_over_time

CGC: Number of concurrent GC events.

CGC_over_time

CGCT: Concurrent garbage collection time.

CGCT_over_time

EC: Current eden space capacity (kB).

EC_over_time

EU: Eden space utilization (kB).

EU_over_time

FGC: Number of full GC events.

FGC_over_time

FGCT: Full garbage collection time.

FGCT_over_time

GCT: Total garbage collection time.

GCT_over_time

MC: Metaspace capacity (kB).

MC_over_time

MU: Metaspace utilization (kB).

MU_over_time

OC: Current old space capacity (kB).

OC_over_time

OU: Old space utilization (kB).

OU_over_time

S0C: Current survivor space 0 capacity (kB).

S0C_over_time

S0U: Survivor space 0 utilization (kB).

S0U_over_time

S1C: Current survivor space 1 capacity (kB).

S1C_over_time

S1U: Survivor space 1 utilization (kB).

S1U_over_time

YGC: Number of young generation GC events.

YGC_over_time

YGCT: Young generation garbage collection time.

YGCT_over_time
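The individual columns above can be combined into a single heap figure per sample, which is often easier to read than nineteen separate plots. The sketch below assumes a dict keyed by the jstat column names listed above; the function name `heap_summary` and the output keys are illustrative. Note that the capacity columns report the currently committed size, not the configured maximum (-Xmx).

```python
def heap_summary(m: dict[str, float]) -> dict[str, float]:
    """Combine jstat -gc columns (all in kB) into overall heap and metaspace usage."""
    # Heap in use = both survivor spaces + eden + old generation.
    used = m["S0U"] + m["S1U"] + m["EU"] + m["OU"]
    # Currently committed heap, summed over the same four spaces.
    capacity = m["S0C"] + m["S1C"] + m["EC"] + m["OC"]
    return {
        "heap_used_kb": used,
        "heap_capacity_kb": capacity,
        "heap_used_pct": 100.0 * used / capacity if capacity else 0.0,
        "metaspace_used_pct": 100.0 * m["MU"] / m["MC"] if m["MC"] else 0.0,
    }
```

If the combined heap usage stays flat up to the crash, that would point away from Java heap exhaustion and towards native memory, which jstat does not report.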

The ingested videos are part of the V3C collection, specifically these files (in this order): <11565.mp4,09363.mp4,15238.mp4,14120.mp4,05462.mp4,08443.mp4,08269.mp4,06631.mp4,10837.mp4,13252.mp4,17130.mp4,11891.mp4,15743.mp4,12064.mp4,07480.mp4,11825.mp4,05907.mp4,05988.mp4,07252.mp4,16087.mp4,12552.mp4,10767.mp4,04343.mp4,07037.mp4,01392.mp4,11592.mp4,03316.mp4,12106.mp4,15498.mp4,06179.mp4,09130.mp4,03390.mp4,00836.mp4,14499.mp4,17023.mp4,05372.mp4,14579.mp4,15898.mp4,05898.mp4,12263.mp4,04095.mp4,04695.mp4,02452.mp4,04685.mp4,16447.mp4,07044.mp4,02060.mp4,04563.mp4,15830.mp4,01280.mp4,08548.mp4,16064.mp4,10831.mp4,03546.mp4,10318.mp4,13986.mp4,07931.mp4,09304.mp4,07826.mp4,03190.mp4,11586.mp4,14190.mp4,13795.mp4,06869.mp4,17112.mp4,16242.mp4,05553.mp4,08447.mp4,03812.mp4,02957.mp4,12889.mp4,08455.mp4,11470.mp4,06624.mp4,04770.mp4,12460.mp4,14029.mp4,13065.mp4,01798.mp4,06834.mp4,05387.mp4,15484.mp4,12234.mp4,16542.mp4,12901.mp4,02194.mp4,10575.mp4,01687.mp4,04970.mp4,08655.mp4,10439.mp4,15720.mp4,05576.mp4,11562.mp4,17161.mp4,10699.mp4,06145.mp4,00219.mp4,04103.mp4,00186.mp4,13011.mp4,00176.mp4,03531.mp4,11608.mp4,01217.mp4,05944.mp4,05746.mp4,13845.mp4,00872.mp4,07471.mp4,14858.mp4,07676.mp4,13286.mp4,08709.mp4,14191.mp4,16627.mp4,00067.mp4,09122.mp4,00509.mp4,14162.mp4,13614.mp4,13301.mp4,03833.mp4,10498.mp4,09216.mp4,16135.mp4,04261.mp4,02349.mp4,07220.mp4,17181.mp4,05684.mp4,09131.mp4,11283.mp4,12974.mp4,06233.mp4,08984.mp4,01479.mp4,13209.mp4,08411.mp4,00066.mp4,10889.mp4>
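To repeat the chunked ingestion mentioned earlier with exactly this file order, the list above can be split into fixed-size batches. The following is only a sketch: `parse_file_list` and `chunked` are hypothetical helper names, the batch size is arbitrary, and how each batch is handed to the ingestion pipeline is left open because it depends on the backend setup.

```python
def parse_file_list(raw: str) -> list[str]:
    """Turn a '<a.mp4,b.mp4,...>' string into an ordered list of file names."""
    return [name.strip() for name in raw.strip().strip("<>").split(",") if name.strip()]


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size`, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Running each batch in a fresh JVM, and then the same files in a single run, would show whether the crash depends on the cumulative number of videos or on a specific file.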

Following are the pipeline-config files: IMAGE.json MESH.json VIDEO.json

The other config files from the backend are attached as well (they are all .kt files, but that extension is not supported in GitHub issues, so they are uploaded as .txt): APIConfig.txt Config.txt MinioConfig.txt

net-cscience-raphael commented 5 hours ago

Please specify the video collection used and provide the configuration of the schema and pipeline.