Open Jarradmorden opened 4 hours ago
Hi @Jarradmorden 👋
Thank you for the report. Indeed, the notebook is a bit behind, but more importantly, Autodistill hasn't been updated to handle large datasets.
I'll have a look at both this evening. Most likely I can bring it up to speed.
Thanks that would be great!
The notebook now uses the latest supervision, and the dataset is loaded lazily.
Depending on the part where you ran out of RAM previously, this time it could work. Notably, a high-ram operation is calling dataset.images, which the new code avoids.
It might be enough, and Autodistill won't need any changes.
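To illustrate why avoiding dataset.images matters, here is a minimal sketch of lazy versus eager access. It is not supervision's implementation: `LazyImageDataset` and `fake_loader` are hypothetical names made up for this example, and the "images" are just byte strings.

```python
from typing import Callable, Dict, Iterator, List, Tuple


class LazyImageDataset:
    """Hypothetical stand-in for a lazily loaded dataset (not supervision's class)."""

    def __init__(self, image_paths: List[str], loader: Callable[[str], bytes]) -> None:
        # Only the paths are stored up front; no image data is read here.
        self.image_paths = image_paths
        self._loader = loader

    def __len__(self) -> int:
        return len(self.image_paths)

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        # Lazy access: load one image per step, so earlier images can be
        # garbage-collected once the caller moves on.
        for path in self.image_paths:
            yield path, self._loader(path)

    @property
    def images(self) -> Dict[str, bytes]:
        # Eager access: every image is loaded and held at the same time,
        # so peak memory grows with the size of the dataset.
        return {path: self._loader(path) for path in self.image_paths}


def fake_loader(path: str) -> bytes:
    # Placeholder for real image decoding: 1 KiB of zeros per "image".
    return bytes(1024)


dataset = LazyImageDataset([f"img_{i}.jpg" for i in range(5)], fake_loader)

# Lazy: only one loaded image needs to be alive at any point in the loop.
total_bytes = sum(len(image) for _, image in dataset)

# Eager: all five loaded images are alive together.
all_images = dataset.images
```

With a few images both paths behave the same; with thousands of full-resolution images, only the lazy loop keeps peak memory roughly constant.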
Awesome, thanks! Very swift and speedy. I will let you know if it works before accepting the answer =]
Notebook name
https://github.com/roboflow/notebooks/blob/main/notebooks/how-to-auto-train-yolov8-model-with-autodistill.ipynb
Bug
Hello,
I am following some of the tutorials that Roboflow offers, using a custom dataset of over 9000 pictures and an ontology. When I reach the end of my training, I run into memory issues: "numpy._core._exceptions._ArrayMemoryError: Unable to allocate 11.9 MiB for an array with shape (6, 1080, 1920) and data type bool". It only works if I train with a much smaller sample. I am guessing this is because the images are all being processed in one go. I followed the tutorial for making a custom dataset, but I think this would happen to anyone with a much larger dataset. How can I get around this issue?
I understand that lowering the resolution would help, but with so many images I would still face the same issue. How would I get around this?
I filed this as a bug because I am not sure whether this notebook accounts for very large datasets. If this can happen to anyone, it would be good for the notebook to handle it. Thank you.
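For a sense of scale, the failing allocation can be checked by hand. The sketch below assumes NumPy stores one byte per bool element, takes the image count from the progress bar in the log (12498), and assumes, purely for illustration, that every image carries a mask array of the same shape as the one in the error message. The real per-image mask count will vary.

```python
import math

# Shape from the error message: (masks, height, width), dtype bool.
mask_shape = (6, 1080, 1920)
bytes_per_element = 1  # NumPy uses one byte per bool element

bytes_per_image = math.prod(mask_shape) * bytes_per_element
mib_per_image = bytes_per_image / 2**20

# Assumption for illustration: every image has a mask array of this shape.
num_images = 12498  # taken from the progress bar in the log
total_gib = bytes_per_image * num_images / 2**30

print(f"{mib_per_image:.1f} MiB per image")
print(f"{total_gib:.1f} GiB if all masks are held in memory at once")
```

The single 11.9 MiB allocation is small on its own; under these assumptions it fails because the masks already held in memory add up to far more than typical system RAM.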
ISSUE
left_9_2024-10-18 15-14-24.770364.png: 100%|██████████████████████████████████| 12498/12498 [6:07:58<00:00, 1.77s/it]
Passing a Dict[str, np.ndarray] into DetectionDataset is deprecated and will be removed in supervision-0.26.0. Use a list of paths List[str] instead.
Found dataset\train\images\left_9339_2024-10-18 15-23-39.366113.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9339_2024-10-18 15-23-39.366113.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_4584_2024-10-18 15-18-59.357296.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_4584_2024-10-18 15-18-59.357296.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_4583_2024-10-18 15-18-59.323971.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_4583_2024-10-18 15-18-59.323971.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_3070_2024-10-18 15-17-30.481540.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_3070_2024-10-18 15-17-30.481540.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9338_2024-10-18 15-23-39.299582.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9338_2024-10-18 15-23-39.299582.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9335_2024-10-18 15-23-39.131904.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9335_2024-10-18 15-23-39.131904.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9336_2024-10-18 15-23-39.198825.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9336_2024-10-18 15-23-39.198825.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9337_2024-10-18 15-23-39.265686.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9337_2024-10-18 15-23-39.265686.txt as already present, not moving anything to dataset\train\labels
Labeled dataset created - ready for distillation.
Traceback (most recent call last):
  File "c:\Users\xxxxxx\DATA\model.py", line 45, in <module>
    dataset = sv.DetectionDataset.from_yolo(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Usersxxxx\AppData\Local\anaconda3\envs\visper_environment\Lib\site-packages\supervision\dataset\core.py", line 497, in from_yolo
    classes, image_paths, annotations = load_yolo_annotations(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cxxxxxx\AppData\Local\anaconda3\envs\xxxx_environment\Lib\site-packages\supervision\dataset\formats\yolo.py", line 120, in yolo_annotations_to_detections
    mask = _polygons_to_masks(polygons=polygons, resolution_wh=resolution_wh)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\xxxxxx\AppData\Local\anaconda3\envs\xxx_environment\Lib\site-packages\supervision\dataset\formats\yolo.py", line 50, in _polygons_to_masks
    return np.array(
           ^^^^^^^^^
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 11.9 MiB for an array with shape (6, 1080, 1920) and data type bool
MY CODE
Windows 11
I have quite good specs too: Python 3.11 and an NVIDIA RTX A5000 graphics card.