vitrivr / vitrivr-engine

vitrivr's next-generation retrieval engine. It is capable of extracting and retrieving a wider range of multimedia objects such as audio, video, images or 3d models.
https://vitrivr.org
MIT License

OutOfMemory Error #86

Closed Lanceeeelot closed 3 months ago

Lanceeeelot commented 4 months ago

I get an OutOfMemory error while trying to extract multiple videos. Their durations range from 3 minutes to 17 minutes. I also tried the extraction with the "MemoryControlledFileSystemEnumerator". The following screenshot shows the original error with the "FileSystemEnumerator":

[Screenshot from 2024-07-11 10-09-11]

When I use the MemoryControlledFileSystemEnumerator, the extraction still stops around the same time (same number of thumbnails), but shows the following log:

[Screenshot from 2024-07-11 14-16-54]

I also checked the memory limit of my fes Docker container (and cottontailDB), but it looks like there is plenty of space left: [Screenshot from 2024-07-11 14-42-17]

sauterl commented 4 months ago

You may want to try the CachedContentFactory, cf. https://github.com/vitrivr/vitrivr-engine/wiki/Documentation#content-factory. Other than that, I guess @net-cscience-raphael could possibly investigate issues related to the MemoryControlledFileSystemEnumerator.

net-cscience-raphael commented 4 months ago

For further investigation, can you provide me with:

  1. Your pipeline configuration
  2. An example dataset

If some error in the pipeline prevents memory from being freed, then what you are seeing is expected behavior.

ppanopticon commented 3 months ago

So I have run a few experiments of my own by extracting a video collection using two features (averagecolor and clip). The pipeline is pretty straightforward without any special cases. It branches off after decoding and extracts features in parallel.

{
  "schema": "vitrivr",
  "context": {
    "contentFactory": "InMemoryContentFactory",
    "resolverName": "disk",
    "local": {
      "enumerator": {
        "path": "/Volumes/VBS24/html/media/V3C",
        "depth": "1"
      },
      "decoder": {
        "timeWindowMs": "1000"
      },
      "thumbs": {
        "maxSideResolution": "500",
        "mimeType": "JPG"
      },
      "filter": {
        "type": "SOURCE:VIDEO"
      }
    }
  },
  "operators": {
    "enumerator": {
      "type": "ENUMERATOR",
      "factory": "FileSystemEnumerator",
      "mediaTypes": [
        "VIDEO"
      ]
    },
    "decoder": {
      "type": "DECODER",
      "factory": "VideoDecoder"
    },
    "selector": {
      "type": "TRANSFORMER",
      "factory": "MiddleContentAggregator"
    },
    "avgColor": {
      "type": "EXTRACTOR",
      "fieldName": "averagecolor"
    },
    "clip": {
      "type": "EXTRACTOR",
      "fieldName": "clip"
    },
    "file_metadata": {
      "type": "EXTRACTOR",
      "fieldName": "file"
    },
    "time_metadata": {
      "type": "EXTRACTOR",
      "fieldName": "time"
    },
    "video_metadata": {
      "type": "EXTRACTOR",
      "fieldName": "video"
    },
    "thumbs": {
      "type": "EXPORTER",
      "exporterName": "thumbnail"
    },
    "filter": {
      "type": "TRANSFORMER",
      "factory": "TypeFilterTransformer"
    }
  },
  "operations": {
    "enumerator": {
      "operator": "enumerator"
    },
    "decoder": {
      "operator": "decoder",
      "inputs": [
        "enumerator"
      ]
    },
    "selector": {
      "operator": "selector",
      "inputs": [
        "decoder"
      ]
    },
    "averagecolor": {
      "operator": "avgColor",
      "inputs": [
        "selector"
      ]
    },
    "clip": {
      "operator": "clip",
      "inputs": [
        "selector"
      ]
    },
    "thumbnails": {
      "operator": "thumbs",
      "inputs": [
        "selector"
      ]
    },
    "time_metadata": {
      "operator": "time_metadata",
      "inputs": [
        "selector"
      ]
    },
    "filter": {
      "operator": "filter",
      "inputs": [
        "averagecolor",
        "clip",
        "thumbnails",
        "time_metadata"
      ],
      "merge": "COMBINE"
    },
    "video_metadata": {
      "operator": "video_metadata",
      "inputs": [
        "filter"
      ]
    },
    "file_metadata": {
      "operator": "file_metadata",
      "inputs": [
        "video_metadata"
      ]
    }
  },
  "output": [
    "file_metadata"
  ]
}
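To make the branching in the configuration above easier to see, here is a minimal sketch that derives the branch and merge points from the "operations" section. It is plain Python working on a hand-copied dictionary, not part of vitrivr-engine; the helper names (consumers, undefined_inputs) are hypothetical.

```python
# Operations copied from the configuration above: operation name -> its inputs.
operations = {
    "enumerator": [],
    "decoder": ["enumerator"],
    "selector": ["decoder"],
    "averagecolor": ["selector"],
    "clip": ["selector"],
    "thumbnails": ["selector"],
    "time_metadata": ["selector"],
    "filter": ["averagecolor", "clip", "thumbnails", "time_metadata"],
    "video_metadata": ["filter"],
    "file_metadata": ["video_metadata"],
}


def consumers(ops):
    """Map every operation to the operations that read its output."""
    out = {name: [] for name in ops}
    for name, inputs in ops.items():
        for src in inputs:
            out[src].append(name)
    return out


def undefined_inputs(ops):
    """Return inputs that refer to an operation that is not defined."""
    return sorted({src for inputs in ops.values() for src in inputs if src not in ops})


fan_out = consumers(operations)
# An operation with more than one consumer is a branch point.
branch_points = {name: c for name, c in fan_out.items() if len(c) > 1}
# An operation with more than one input is a merge point.
merge_points = {name: i for name, i in operations.items() if len(i) > 1}

print("branches:", branch_points)
print("merges:", merge_points)
print("undefined inputs:", undefined_inputs(operations))
```

In this configuration the stream fans out after the "selector" operation and is merged again in "filter", so everything produced by the four parallel branches has to be alive until the merge has seen it.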

Here are the key insights: Fundamentally, I don't think that there is a memory leak or memory allocation problem in vitrivr-engine. At least I couldn't spot one. However, one must be conscious about how the pipeline works and what the consequences of certain pipeline design decisions are.

This basic behaviour is illustrated by the graphics.

Screenshot 2024-07-26 at 10 42 47 Memory

Now there are several knobs to tune the memory consumption of the extraction pipeline:

That being said: In order to be able to debug your issue, we really need your extraction pipeline configuration.

ppanopticon commented 3 months ago

One additional comment just to illustrate my point: If I remove the MiddleContentAggregator from the above configuration, memory starts to become an issue as well. For every second of video, 25 frames are kept in memory and features are extracted for all 25 frames. This leads to 50 Descriptors and 25 ContentElements per Retrievable, which are kept around in memory until the entire video has been processed.

The video is 5 minutes long and 8 GB are not enough to handle this. But this is not an application issue. It's instructing the engine to do something it does not have the resources for.
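The numbers above can be turned into a rough back-of-the-envelope estimate. The sketch below uses the figures from this comment (25 frames per second, two features, a 1000 ms time window, a 5 minute video). The frame size (1920x1080, 3 bytes per pixel, uncompressed) is purely an assumption for illustration, as is the idea that frames are held as raw pixel data; the real footprint depends on the video and on the content factory in use.

```python
import math


def estimate(duration_s, fps=25, features=2, window_ms=1000,
             width=1920, height=1080, bytes_per_pixel=3):
    """Rough count of objects kept in memory until a video is fully processed.

    width, height and bytes_per_pixel are assumed values, not measurements.
    """
    retrievables = math.ceil(duration_s * 1000 / window_ms)  # one per time window
    content_elements = duration_s * fps                      # one per decoded frame
    descriptors = content_elements * features                # one per frame and feature
    frame_bytes = content_elements * width * height * bytes_per_pixel
    return {
        "retrievables": retrievables,
        "content_elements": content_elements,
        "descriptors": descriptors,
        "frame_bytes": frame_bytes,
    }


est = estimate(duration_s=5 * 60)
print(est["retrievables"])      # 300
print(est["content_elements"])  # 7500
print(est["descriptors"])       # 15000
print(f"{est['frame_bytes'] / 1024**3:.1f} GiB of raw frames (under the assumed frame size)")
```

Under these assumptions the raw frames alone exceed 8 GB by a wide margin, which is consistent with the behaviour described above. Keeping a single frame per window (as the MiddleContentAggregator does) divides the number of ContentElements and Descriptors by the frame rate.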

Lanceeeelot commented 3 months ago

Thanks for the great insight, tests and tips. This is the old pipeline config I used (similar to the one used in Example): video-pipeline.json. Here is a small sample of our dataset with our shortest (3:33 min) and longest (18:43 min) video: example_dataset link

I'm going to start a new extraction now and try different tweaks, like using the CachedContentFactory or the MiddleContentAggregator, or adjusting the timeWindowMs. Afterwards I will share my results here.

My apologies for the late answer. I only work one day a week on this project.

ppanopticon commented 3 months ago

Hey! Any update on this?

I played with your configuration myself and noticed that the extraction definition is incorrect. Here is a better version, which I successfully used to extract all three videos using 12 GB of RAM:

{
  "schema": "vitrivr",
  "context": {
    "contentFactory": "CachedContentFactory",
    "resolverName":"disk",
    "local": {
      "enumerator": {
        "path": "/Users/rgasser/Downloads/example_dataset",
        "depth": "3",
        "skip": "0",
        "limit": "20"
      },
      "decoder": {
        "timeWindowMs": "30_000"
      },
      "filter": {
        "type": "SOURCE:VIDEO"
      }
    }
  },
  "operators": {
    "enumerator": {
      "type": "ENUMERATOR",
      "factory": "FileSystemEnumerator",
      "mediaTypes": ["VIDEO"]
    },
    "decoder": {
      "type": "DECODER",
      "factory": "VideoDecoder"
    },
    "selector": {
      "type": "TRANSFORMER",
      "factory": "LastContentAggregator"
    },
    "averagecolor": {
      "type": "EXTRACTOR",
      "fieldName": "averagecolor"
    },
    "clip": {
      "type": "EXTRACTOR",
      "fieldName": "clip"
    },
    "dino": {
      "type": "EXTRACTOR",
      "fieldName": "dino"
    },
    "whisper": {
      "type": "EXTRACTOR",
      "fieldName": "whisper"
    },
    "ocr": {
      "type": "EXTRACTOR",
      "fieldName": "ocr"
    },
    "meta-file": {
      "type": "EXTRACTOR",
      "fieldName": "file"
    },
    "meta-video": {
      "type": "EXTRACTOR",
      "fieldName": "video"
    },
    "meta-time": {
      "type": "EXTRACTOR",
      "fieldName": "time"
    },
    "thumbnail": {
      "type": "EXPORTER",
      "exporterName": "thumbnail"
    },
    "filter": {
      "type": "TRANSFORMER",
      "factory": "TypeFilterTransformer"
    }
  },
  "operations": {
    "stage-0-0": {"operator": "enumerator"},
    "stage-1-0": {"operator": "decoder","inputs": ["stage-0-0"]},
    "stage-2-0": {"operator": "selector","inputs": ["stage-1-0"]},
    "stage-3-0": {"operator": "clip","inputs": ["stage-2-0"]},
    "stage-3-1": {"operator": "dino","inputs": ["stage-2-0"]},
    "stage-3-2": {"operator": "ocr","inputs": ["stage-2-0"]},
    "stage-3-3": {"operator": "averagecolor","inputs": ["stage-2-0"]},
    "stage-3-4": {"operator": "thumbnail","inputs": ["stage-2-0"]},
    "stage-3-5": {"operator": "meta-time","inputs": ["stage-2-0"]},
    "stage-3-6": {"operator": "whisper","inputs": ["stage-2-0"]},
    "stage-4-0": {"operator": "filter", "inputs": ["stage-3-6", "stage-3-5", "stage-3-4", "stage-3-3", "stage-3-2", "stage-3-1", "stage-3-0"], "merge": "COMBINE"},
    "stage-5-0": {"operator": "meta-file", "inputs": ["stage-4-0"]},
    "stage-6-0": {"operator": "meta-video", "inputs": ["stage-5-0"]}
  },
  "output": [
    "stage-6-0"
  ]
}

Lanceeeelot commented 3 months ago

I tried different configs as well and was able to extract all videos by increasing the timeWindowMs.

The PC I'm using has 64 GB of RAM, so that should not be a problem. I will do more tests tomorrow and try your config on the whole dataset of 19 videos. Then I will update here and most likely close this issue. Thanks for the help and guidance.

Lanceeeelot commented 3 months ago

Thanks to your configuration I was able to fully extract all videos without any error. The process created 19,379 thumbnails; previously it would stop at around 9,000 (with the same timeWindowMs). While testing different configurations I didn't come across any other major issues or notable takeaways for this issue.

Could you highlight which part of the extraction definition was incorrect before closing this issue?

ppanopticon commented 3 months ago

Glad to hear.

Well, "incorrect" is a bit of a misnomer. Let's say "not ideal". Fundamentally, there are two (somewhat contradictory) paradigms that are used during extraction and that one needs to be aware of:

  1. Retrievables are objects that describe part of a media file (e.g., a segment or the entire file itself). They contain all the Descriptors and potentially Relationships to other Retrievables. Typically, Retrievables are shared between operators, that is, different operators may see and edit the same instance at different points in time.
  2. Pipelines define sequential streams of Retrievables (that can branch and merge). That stream can be shaped using certain operators (e.g., filters).

That is, the entire object graph generated during an extraction is kept in memory. Every Retrievable that reaches the end is persisted with ALL its relationships, descriptors, etc.

Since the extraction process for certain media types defines explicit relationships between Retrievables (e.g., video segment to video file), one might end up in a situation where a Retrievable is persisted twice because of these relationships. Hence, when designing the pipeline, one must shape the stream such that only the desired Retrievables make it to the end. In the case of a video, it makes sense to persist on a per-file basis.
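As a rough illustration of why shaping the stream matters, here is a toy model in Python. It is not the vitrivr-engine API: the Retrievable class, the reachable and persist functions, and the rule "persisting a Retrievable writes everything connected to it" are simplifying assumptions made only for this sketch. Only the type string "SOURCE:VIDEO" is taken from the configurations in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Retrievable:
    """Toy stand-in for a Retrievable: an id, a type and relationships."""
    id: str
    type: str
    part_of: Optional["Retrievable"] = None            # segment -> source file
    parts: List["Retrievable"] = field(default_factory=list)


def reachable(start):
    """All Retrievables connected to `start` through relationships."""
    seen, stack, out = set(), [start], []
    while stack:
        node = stack.pop()
        if node.id in seen:
            continue
        seen.add(node.id)
        out.append(node)
        stack.extend(node.parts)
        if node.part_of is not None:
            stack.append(node.part_of)
    return out


def type_filter(stream, wanted):
    """Let only Retrievables of the wanted type reach the end of the stream."""
    return [r for r in stream if r.type == wanted]


def persist(stream):
    """Count how often each Retrievable would be written (toy assumption)."""
    writes = {}
    for r in stream:
        for node in reachable(r):
            writes[node.id] = writes.get(node.id, 0) + 1
    return writes


video = Retrievable("video-1", "SOURCE:VIDEO")
segments = [Retrievable(f"segment-{i}", "SEGMENT", part_of=video) for i in range(3)]
video.parts.extend(segments)
stream = segments + [video]

unfiltered = persist(stream)
filtered = persist(type_filter(stream, "SOURCE:VIDEO"))
print("without filter:", unfiltered)
print("with filter:   ", filtered)
```

In this toy model, letting both the segments and the file reach the end writes every object several times, while filtering down to the file-level Retrievable writes each object exactly once.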

When designing a pipeline, one should therefore try to think in terms of inputs and outputs:

Therefore, the following setup makes sense in this case:

I hope this makes sense.

Lanceeeelot commented 3 months ago

Thanks for the detailed explanation! The pipeline is a lot clearer to me now. I removed my example dataset, because I'm not able to share it permanently.