sign-language-processing / detection-train

Training a sign language detection model

Memory usage linearly increasing while iterating on tfds datasets #6

Open · amanzotti opened this issue 1 year ago

amanzotti commented 1 year ago

Hi! I am trying to load and experiment with the dgs_corpus dataset. Loading it downloads the data locally. But when I then loop through the examples, memory usage increases linearly, even if I just sleep in the first iteration of the loop or do nothing at all. This is the code I am using:

import time
import tensorflow as tf
import tensorflow_datasets as tfds
import sign_language_datasets.datasets

config = sign_language_datasets.datasets.dgs_corpus.DgsCorpusConfig(
    name="holistic_m", include_video=False, include_pose="holistic"
)
dgs_corpus = tfds.load(name="dgs_corpus", builder_kwargs=dict(config=config))

with tf.io.TFRecordWriter('data.tfrecord') as writer:
    for datum in dgs_corpus["train"]:
        time.sleep(3000)

and you can see the linear increase in the memory profile output:

[Attached image "memory1": memory profile showing usage increasing linearly over time]
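
For reference, one way to capture such a trace, assuming the memory_profiler package (the post does not say which profiler was actually used):

from memory_profiler import memory_usage  # pip install memory-profiler

def iterate_train():
    # Consume the dataset without doing any per-example work.
    for datum in dgs_corpus["train"]:
        pass

# Sample resident memory (in MiB) once per second while the loop runs.
trace = memory_usage((iterate_train, (), {}), interval=1.0)
print(trace)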

The final goal is to either save the examples in npy format or load them in PyTorch, because that is what our pipeline currently accepts.
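
As a sketch of that conversion: tfds.as_numpy (a standard tfds helper) turns each example into plain NumPy values; the exact field names depend on the dgs_corpus schema and are not spelled out here:

import numpy as np
import tensorflow_datasets as tfds

# Convert each tf.Tensor example to NumPy so it can be saved as .npy
# or handed to a PyTorch data pipeline.
for i, datum in enumerate(tfds.as_numpy(dgs_corpus["train"])):
    # datum is a (possibly nested) dict of NumPy arrays / bytes.
    np.save(f"example_{i}.npy", datum, allow_pickle=True)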

Any help or pointers would be great!

Thanks

AmitMY commented 1 year ago

Hi, could you please attach the exact code you are using? (What you attached is not valid Python, so I'm afraid there was some post-editing involved.)

It would also be good to run this loop without the with tf.io.TFRecordWriter('data.tfrecord') as writer: line, because in the code you sent you never write to it.

(Like this; I did not run it myself, as I'm not sure how you are profiling the memory.)

import time
import tensorflow_datasets as tfds
import sign_language_datasets.datasets

config = sign_language_datasets.datasets.dgs_corpus.DgsCorpusConfig(
    name="holistic_m", include_video=False, include_pose="holistic"
)
dgs_corpus = tfds.load(name="dgs_corpus", builder_kwargs=dict(config=config))

for datum in dgs_corpus["train"]:
    time.sleep(3000)
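
A further isolation step, not suggested in the thread itself, would be to bound the loop with the standard tf.data.Dataset.take method and check whether memory stabilizes:

# If memory still grows in proportion to the number of examples
# consumed, the growth is per-example rather than a fixed startup cost.
for datum in dgs_corpus["train"].take(100):
    pass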