Closed: DovydasPociusDroneTeam closed this issue 1 year ago
Hello there, thank you for opening an Issue ! 🙏🏻 The team was notified and they will get back to you asap.
How many images are in your dataset? How large are the images?
This error is the Out of Memory (OOM) killer on your machine acting to ensure the Python process doesn't take up too much RAM and cause instability on your system. This suggests your system isn't able to store all of the images in your dataset in memory, which is required to convert the datasets.
I have about 6000 images (1024×1024). So if it is an OOM problem, any suggestions on how I can get around it?
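For a rough sense of scale (assuming the images are decoded to 8-bit RGB arrays, which is what holding a dataset in memory implies), the footprint can be estimated:

```python
# Back-of-the-envelope RAM estimate for holding every decoded image at once.
# Assumes 8-bit RGB (3 bytes per pixel); real usage adds overhead on top.
num_images = 6000
height, width, channels = 1024, 1024, 3

total_bytes = num_images * height * width * channels
total_gib = total_bytes / (1024 ** 3)
print(f"~{total_gib:.1f} GiB")  # ~17.6 GiB before any Python or processing overhead
```

Close to 18 GiB just for pixel data explains why the OOM killer steps in on machines with less free RAM than that.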
Hi, @DovydasPociusDroneTeam 👋🏻 This is interesting. So the script died, but the output datasets got saved anyway? Would love to learn more.
Hi @DovydasPociusDroneTeam 👋 , we can dig deeper into the process to check for memory leakage but that will take some time.
But as for the other question, about class names being rearranged, I might have an idea. @SkalskiP this is due to sorting classes in alphabetical order, check here.
@hardikdava I'm not sure. We got the input dataset. We divided that dataset into two parts. Saved both parts in YOLO format. Both Output datasets have different class orders. Do I understand the problem correctly?
Is the order different between input and output datasets? Or between both output datasets?
@hardikdava It is somehow a related topic. I think in the future, we should migrate `sv.DetectionDataset.classes` to be a `Dict[int, str]`, not a `List[str]`. We get more and more trouble with breaking indexes.
@SkalskiP Changing `sv.DetectionDataset.classes` into a `Dict[int, str]` was already on my mind. We should definitely do it.
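To illustrate the index-stability argument (a toy sketch, not the actual supervision internals): with a plain list, the class id is just a position, so any subset or reordering silently changes what a stored id means, while a dict keyed by the original ids stays stable.

```python
# With List[str], the class id is a position, so any subset shifts it.
classes_list = ["car", "person", "truck"]   # id 2 means "truck"
subset_list = ["person", "truck"]           # "car" dropped after a split
# Now position 1 means "truck" and id 2 doesn't exist at all.

# With Dict[int, str], ids survive subsetting unchanged.
classes_dict = {0: "car", 1: "person", 2: "truck"}
subset_dict = {k: v for k, v in classes_dict.items() if v != "car"}
print(subset_dict[2])  # still "truck"
```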
I didn't get output from the one 6000-image dataset.
So I tried splitting this dataset into 3 separate datasets: instead of having
train_dataset_images (6000 images)
├ coco.json
├ image1
├ image2
├ image3
└ imageN
I did
train_dataset_images_part1 (2000 images)
├ coco_part1.json
├ image1
├ image2
├ image3
└ imageM
train_dataset_images_part2 (2000 images)
├ coco_part2.json
├ imageM+1
├ imageM+2
├ imageM+3
└ imageN
train_dataset_images_part3 (2000 images)
├ coco_part3.json
├ imageN+1
├ imageN+2
├ imageN+3
└ imageZ
and for every separate dataset with 2000 images I ran `from_coco().as_yolo()`, and I was able to get results without error, but then I checked every output YAML file and saw the "names" array was not the same.
@DovydasPociusDroneTeam, thanks a lot for helping us understand what's happening. Could you help us a bit more and check the `categories` key in `coco_part1.json`, `coco_part2.json`, and `coco_part3.json`?
Please paste `categories` for each JSON here. If the categories are precisely the same in each JSON, then we have a problem.
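A quick check along those lines (the category entries below are made up for illustration; in practice you would `json.load()` each `coco_partN.json` and compare its `categories` key):

```python
# Hypothetical contents of the "categories" key from two part files;
# in practice: json.load(open("coco_part1.json"))["categories"], etc.
part1 = {"categories": [{"id": 0, "name": "car"}, {"id": 1, "name": "person"}]}
part2 = {"categories": [{"id": 0, "name": "person"}, {"id": 1, "name": "car"}]}

def category_order(coco: dict) -> list:
    # The order in which (id, name) pairs appear drives the YOLO "names" array.
    return [(c["id"], c["name"]) for c in coco["categories"]]

if category_order(part1) == category_order(part2):
    print("categories match")
else:
    print("categories differ: the splitter reordered them")
```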
You are right! In my coco_part1.json and coco_part2.json, the categories are not in the same sequence!
Okay... so I used a bad converter from LabelMe to COCO (when I split the dataset into 3 separate parts); I don't know why it mixed up the categories sequence.
Thank you for that info! Looking forward to converting the full dataset without needing to split it into separate parts!
@DovydasPociusDroneTeam 🔥 Awesome that we managed to get to the bottom of this problem.
Looking forward to converting the full dataset without needing to split it into separate parts!
We will need to introduce lazy loading of images to make that happen. It is on our roadmap. I'll pin this issue there to keep track of that problem.
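A lazy-loading approach could look roughly like this (a minimal sketch under my own naming, not the planned supervision API): yield paths with a generator and decode each image only when it is needed.

```python
import os
from typing import Iterator

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def iter_image_paths(directory: str) -> Iterator[str]:
    # Yield image paths one at a time so images can be decoded on demand
    # instead of materializing the whole dataset in RAM.
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(IMAGE_EXTENSIONS):
            yield os.path.join(directory, name)

# Usage sketch: decode one image at a time, e.g. cv2.imread(path), inside
# `for path in iter_image_paths("train_dataset_images"): ...`
```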
I'll close the issue for now.
@SkalskiP Can't we just save the dataset in a pandas DataFrame and then retrieve it batch by batch?
Hi @Killua7362 👋🏻 Could you elaborate?
Hello @SkalskiP, hope you are well. I am new to this community, so I might be wrong here. If I create a dataset of images using Roboflow, will it save a generator object or the whole dataset?
Hi @Killua7362 👋🏻 No worries. I'm happy to explain. For now, you will always load the whole dataset, but we are thinking about adding a generator option.
Can I try adding that option if you don't mind? @SkalskiP
@SkalskiP I use this dataset with 1 label: https://universe.roboflow.com/naumov-igor-segmentation/car-segmetarion
but when I use this script to convert COCO to YOLO, I get 2 labels:
```python
import supervision as sv

sv.DetectionDataset.from_coco(
    images_directory_path=r"C:\Users\loong\Downloads\Car\valid",
    annotations_path=r"C:\Users\loong\Downloads\Car\valid\_annotations.coco.json",
    force_masks=True
).as_yolo(
    images_directory_path=r"C:\Users\loong\Downloads\Car_yolo\val\images",
    annotations_directory_path=r"C:\Users\loong\Downloads\Car_yolo\val\labels",
    data_yaml_path=r"C:\Users\loong\Downloads\Car_yolo\data.yaml"
)
```
and the generated format doesn't seem right either
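One likely cause, worth verifying against the actual `_annotations.coco.json` (this is an assumption on my part, not something confirmed in the thread): Roboflow COCO exports usually include an extra top-level "superclass" category with id 0, named after the project, so a one-class dataset ends up with two entries in `categories` and therefore two names in the YOLO output. The structure typically looks like:

```python
# Illustrative shape of a Roboflow COCO export for a one-class project;
# in practice, inspect the real file with
# json.load(open(r"C:\Users\loong\Downloads\Car\valid\_annotations.coco.json")).
coco = {
    "categories": [
        {"id": 0, "name": "car-segmetarion", "supercategory": "none"},  # superclass
        {"id": 1, "name": "car", "supercategory": "car-segmetarion"},   # real class
    ]
}

names = [c["name"] for c in coco["categories"]]
print(names)  # two labels appear even though the project has one real class
```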
Search before asking
Bug
Getting a "Killed" error while converting a dataset from COCO to YOLO (the code is given below):
I tried to manually split the big dataset into smaller parts (3 parts) and then didn't get the error, but in the .yaml file I got different class positions in the "names" array
and
any suggestions? Thank you in advance!
Environment
Minimal Reproducible Example
Additional
No response
Are you willing to submit a PR?