You may want to try the CachedContentFactory, cf. https://github.com/vitrivr/vitrivr-engine/wiki/Documentation#content-factory
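Assuming the same context layout as the configurations later in this thread, that's a one-line switch in the context block (a sketch; everything else stays as in your configuration):

"context": {
  "contentFactory": "CachedContentFactory",
  ...
}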
Other than that, I guess @net-cscience-raphael could investigate possible issues related to the MemoryControlledFileSystemEnumerator.
Can you provide some more information for further investigation? If there is an error somewhere in the pipeline that results in not all memory being freed, this is expected behavior.
So I have run a few experiments of my own, extracting a video collection using two features (averagecolor and clip). The pipeline is pretty straightforward, without any special cases: it branches off after decoding and extracts features in parallel.
{
  "schema": "vitrivr",
  "context": {
    "contentFactory": "InMemoryContentFactory",
    "resolverName": "disk",
    "local": {
      "enumerator": {
        "path": "/Volumes/VBS24/html/media/V3C",
        "depth": "1"
      },
      "decoder": {
        "timeWindowMs": "1000"
      },
      "thumbs": {
        "maxSideResolution": "500",
        "mimeType": "JPG"
      },
      "filter": {
        "type": "SOURCE:VIDEO"
      }
    }
  },
  "operators": {
    "enumerator": {
      "type": "ENUMERATOR",
      "factory": "FileSystemEnumerator",
      "mediaTypes": ["VIDEO"]
    },
    "decoder": {
      "type": "DECODER",
      "factory": "VideoDecoder"
    },
    "selector": {
      "type": "TRANSFORMER",
      "factory": "MiddleContentAggregator"
    },
    "avgColor": {
      "type": "EXTRACTOR",
      "fieldName": "averagecolor"
    },
    "clip": {
      "type": "EXTRACTOR",
      "fieldName": "clip"
    },
    "file_metadata": {
      "type": "EXTRACTOR",
      "fieldName": "file"
    },
    "time_metadata": {
      "type": "EXTRACTOR",
      "fieldName": "time"
    },
    "video_metadata": {
      "type": "EXTRACTOR",
      "fieldName": "video"
    },
    "thumbs": {
      "type": "EXPORTER",
      "exporterName": "thumbnail"
    },
    "filter": {
      "type": "TRANSFORMER",
      "factory": "TypeFilterTransformer"
    }
  },
  "operations": {
    "enumerator": {"operator": "enumerator"},
    "decoder": {"operator": "decoder", "inputs": ["enumerator"]},
    "selector": {"operator": "selector", "inputs": ["decoder"]},
    "averagecolor": {"operator": "avgColor", "inputs": ["selector"]},
    "clip": {"operator": "clip", "inputs": ["selector"]},
    "thumbnails": {"operator": "thumbs", "inputs": ["selector"]},
    "time_metadata": {"operator": "time_metadata", "inputs": ["selector"]},
    "filter": {"operator": "filter", "inputs": ["averagecolor", "clip", "thumbnails", "time_metadata"], "merge": "COMBINE"},
    "video_metadata": {"operator": "video_metadata", "inputs": ["filter"]},
    "file_metadata": {"operator": "file_metadata", "inputs": ["video_metadata"]}
  },
  "output": ["file_metadata"]
}
Here are the key insights: Fundamentally, I don't think there is a memory leak or memory allocation problem in vitrivr-engine. At least I couldn't spot one. However, one must be conscious of how the pipeline works and what the consequences of certain pipeline design decisions are.

- Most of the memory is consumed by InMemoryImageContent (the frames) and FloatVectorDescriptor (the extracted features). These objects require a lot of space.
- That memory is only freed once a Retrievable reaches the PersistingSink. In a typical video extraction scenario, that's the case when a single video has been processed completely.

This basic behaviour is illustrated by the graphics.
Now there are several knobs to tune the memory consumption of the extraction pipeline:

- Use CachedImageContent instead of InMemoryContent. These will swap data to disk if memory pressure builds.
- Reduce the number of Retrievables generated per video. For the VideoDecoder, this can be adjusted using the timeWindowMs parameter, which governs the time covered by a single Retrievable. A higher value will lead to fewer Retrievables, each covering a larger portion of the video. Consequently, fewer features are being generated.
- Reduce the content kept per Retrievable. Instead of keeping all the frames of a retrievable around (and in memory), just keep the one required using FirstContentAggregator, LastContentAggregator or MiddleContentAggregator (or some implementation of your own).

(See the sketch below for how these knobs map onto a configuration.) That being said: in order to be able to debug your issue, we really need your extraction pipeline configuration.
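For illustration, a minimal sketch of the relevant configuration fragments (the 10_000 ms window is just an example value; the rest of the configuration is omitted):

"context": {
  "contentFactory": "CachedContentFactory",
  "local": {
    "decoder": {
      "timeWindowMs": "10_000"
    }
  }
},
"operators": {
  "selector": {
    "type": "TRANSFORMER",
    "factory": "MiddleContentAggregator"
  }
}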
One additional comment, just to illustrate my point: if I remove the MiddleContentAggregator from the above configuration, memory starts to become an issue as well, because for every second of video, 25 frames are kept in memory and features are extracted for all 25 frames. This leads to 50 Descriptors and 25 ContentElements per Retrievable, which are kept around in memory until the entire video has been processed.
The video is 5 minutes long and 8GB are not enough to handle this. But this is not an application issue; it's instructing the engine to do something it does not have the resources for.
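To put rough numbers on this, assuming 25 fps and timeWindowMs = 1000 as above: a 5-minute video produces 300 Retrievables, each holding 25 frames and 50 descriptors, i.e. 300 × 25 = 7,500 decoded frames and 300 × 50 = 15,000 descriptors that stay in memory until the whole video has been processed.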
Thanks for the great insight, tests and tips. This is the old pipeline config I used (similar to the one used in the example): video-pipeline.json. Here is a small sample of our dataset with our shortest (3:33 min) and longest (18:43 min) video: example_dataset link
I'm going to start a new extraction now and try different tweaks, like using the CachedContentFactory or MiddleContentAggregator, or adjusting the timeWindowMs. Afterwards I will share my results here.
My apologies for the late answer; I only work one day a week on this project.
Hey! Any update on this?
I played with your configuration myself and noticed that the extraction definition is incorrect. Here is a better version, which I successfully used to extract all three videos using 12GB of RAM:
{
  "schema": "vitrivr",
  "context": {
    "contentFactory": "CachedContentFactory",
    "resolverName": "disk",
    "local": {
      "enumerator": {
        "path": "/Users/rgasser/Downloads/example_dataset",
        "depth": "3",
        "skip": "0",
        "limit": "20"
      },
      "decoder": {
        "timeWindowMs": "30_000"
      },
      "filter": {
        "type": "SOURCE:VIDEO"
      }
    }
  },
  "operators": {
    "enumerator": {
      "type": "ENUMERATOR",
      "factory": "FileSystemEnumerator",
      "mediaTypes": ["VIDEO"]
    },
    "decoder": {
      "type": "DECODER",
      "factory": "VideoDecoder"
    },
    "selector": {
      "type": "TRANSFORMER",
      "factory": "LastContentAggregator"
    },
    "averagecolor": {
      "type": "EXTRACTOR",
      "fieldName": "averagecolor"
    },
    "clip": {
      "type": "EXTRACTOR",
      "fieldName": "clip"
    },
    "dino": {
      "type": "EXTRACTOR",
      "fieldName": "dino"
    },
    "whisper": {
      "type": "EXTRACTOR",
      "fieldName": "whisper"
    },
    "ocr": {
      "type": "EXTRACTOR",
      "fieldName": "ocr"
    },
    "meta-file": {
      "type": "EXTRACTOR",
      "fieldName": "file"
    },
    "meta-video": {
      "type": "EXTRACTOR",
      "fieldName": "video"
    },
    "meta-time": {
      "type": "EXTRACTOR",
      "fieldName": "time"
    },
    "thumbnail": {
      "type": "EXPORTER",
      "exporterName": "thumbnail"
    },
    "filter": {
      "type": "TRANSFORMER",
      "factory": "TypeFilterTransformer"
    }
  },
  "operations": {
    "stage-0-0": {"operator": "enumerator"},
    "stage-1-0": {"operator": "decoder", "inputs": ["stage-0-0"]},
    "stage-2-0": {"operator": "selector", "inputs": ["stage-1-0"]},
    "stage-3-0": {"operator": "clip", "inputs": ["stage-2-0"]},
    "stage-3-1": {"operator": "dino", "inputs": ["stage-2-0"]},
    "stage-3-2": {"operator": "ocr", "inputs": ["stage-2-0"]},
    "stage-3-3": {"operator": "averagecolor", "inputs": ["stage-2-0"]},
    "stage-3-4": {"operator": "thumbnail", "inputs": ["stage-2-0"]},
    "stage-3-5": {"operator": "meta-time", "inputs": ["stage-2-0"]},
    "stage-3-6": {"operator": "whisper", "inputs": ["stage-2-0"]},
    "stage-4-0": {"operator": "filter", "inputs": ["stage-3-6", "stage-3-5", "stage-3-4", "stage-3-3", "stage-3-2", "stage-3-1", "stage-3-0"], "merge": "COMBINE"},
    "stage-5-0": {"operator": "meta-file", "inputs": ["stage-4-0"]},
    "stage-6-0": {"operator": "meta-video", "inputs": ["stage-5-0"]}
  },
  "output": ["stage-6-0"]
}
I tried different configs as well and was able to extract all videos by increasing the timeWindowMs. The PC I'm using has 64GB of RAM, so that should not be a problem. I will do more tests tomorrow and try your config on the whole dataset of 19 videos, then update here and most likely close this issue. Thanks for the help and guidance.
Thanks to your configuration I was able to fully extract all videos without any error. The process created 19,379 thumbnails; before, it would stop at around 9,000 (with the same timeWindowMs). While testing different configurations I didn't come across any other major issues or notable takeaways for this issue.
Could you highlight which part of the extraction definition was incorrect before closing this issue?
Glad to hear.
Well, "incorrect" is a bit of a misnomer. Let's say "not ideal". Fundamentally, there are two (somewhat contradictory) paradigms that are used during extraction and that one needs to be aware of:

- Retrievables are objects that describe part of a media file (e.g., a segment or the entire file itself). They contain all the Descriptors and potentially Relationships to other Retrievables. Typically, Retrievables are shared between operators; that is, different operators may see and edit the same instance at different points in time.
- The extraction pipeline itself processes a stream of Retrievables (that can branch and merge). That stream can be shaped using certain operators (e.g., filters). That is, the entire object graph generated during an extraction is kept in memory. Every Retrievable that reaches the end is persisted with ALL its relationships, descriptors etc.
Since the extraction process for certain media types defines explicit relationships between Retrievables (e.g., video segment to video file), one might end up in a situation where a Retrievable is persisted twice because of the relationships between them. Hence, when designing the pipeline, one must shape the stream such that only the desired Retrievables make it to the end. In the case of a video, it makes sense to persist on a per-file basis.
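In the configurations above, this shaping is done by the TypeFilterTransformer together with its type setting in the context; as I read it, only Retrievables matching SOURCE:VIDEO, i.e. the per-file ones, pass this filter:

"filter": {
  "type": "TRANSFORMER",
  "factory": "TypeFilterTransformer"
}

with the matching context entry:

"filter": {
  "type": "SOURCE:VIDEO"
}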
When designing a pipeline, one should therefore try to think in terms of inputs and outputs:

- The VideoDecoder generates one Retrievable per temporal segment. Those contain all the ImageContent and AudioContent. Once a video file has been processed completely, the VideoDecoder emits one Retrievable for the file (without any content). This Retrievable holds a relationship to all its temporal segments.
- averagecolor, clip, dino, ocr and whisper only operate on Retrievables with Content.
- meta-file and meta-video only process this last (per-file) Retrievable emitted at the end.
- We only want this per-file Retrievable to reach the end. Since it holds references to all the segments, the entire graph will be persisted exactly once.

Therefore, the following setup makes sense in this case:

- We apply averagecolor, clip, dino, ocr and whisper right after the aggregation. These operations can go in parallel, which is why they share a single source.
- We then filter for the per-file Retrievable. In this step we also aggregate the different branches with the COMBINE logic, which makes sure that a Retrievable is only emitted downstream once it has been received on all the inputs.
- meta-file and meta-video come after this filter step, since they're only interested in the per-file Retrievable anyway. One could parallelise these, but it's hardly worth it given that these features are very low-effort to generate.
- The output stage therefore only sees the desired per-file Retrievables.
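Concretely, these are the corresponding operations from the configuration above: the COMBINE filter gates the seven parallel branches, and the per-file metadata extractors follow it in series.

"stage-4-0": {"operator": "filter", "inputs": ["stage-3-6", "stage-3-5", "stage-3-4", "stage-3-3", "stage-3-2", "stage-3-1", "stage-3-0"], "merge": "COMBINE"},
"stage-5-0": {"operator": "meta-file", "inputs": ["stage-4-0"]},
"stage-6-0": {"operator": "meta-video", "inputs": ["stage-5-0"]}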
I hope this makes sense.
Thanks for the detailed explanation! The pipeline is a lot clearer to me now. I removed my example dataset because I'm not able to share it permanently.
I get an OutOfMemoryError while trying to extract multiple videos. Their durations range from 3 minutes to 17 minutes. I also tried the extraction with the MemoryControlledFileSystemEnumerator. The following screenshot shows the original error with the FileSystemEnumerator:
When I use the MemoryControlledFileSystemEnumerator, the extraction still stops around the same point (same number of thumbnails), but shows the following log:
I also checked the memory limit of my FES Docker container (and CottontailDB), but it looks like there is plenty of space left: