roboflow / roboflow-python

The official Roboflow Python package. Manage your datasets, models, and deployments. Roboflow has everything you need to build a computer vision application.
https://docs.roboflow.com/python
Apache License 2.0

bugfix: CLI hangs with super big dataset #255

Closed NickHerrig closed 3 months ago

NickHerrig commented 3 months ago

Description

Fixes roboflow-bugtracker#908

Two dictionaries were added to index images and annotations, so matches are found by key lookup instead of by iterating through a list.
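As a rough illustration of why a dictionary helps here, the sketch below pairs image files with annotation files by file stem, first with a nested scan and then with a prebuilt index. It is not the code from this PR: the function names, the stem-based matching rule, and the use of a single index (the PR adds two dictionaries) are all assumptions made for the example.

```python
import os


def _stem(path):
    """Return the file name without directory or extension (hypothetical matching key)."""
    return os.path.splitext(os.path.basename(path))[0]


def match_by_scan(images, annotations):
    """Baseline: scan the whole annotation list once per image (quadratic)."""
    pairs = []
    for image in images:
        match = None
        for annotation in annotations:
            if _stem(annotation) == _stem(image):
                match = annotation
                break
        pairs.append((image, match))
    return pairs


def match_by_index(images, annotations):
    """Build a dict keyed by stem once, then do one lookup per image (linear)."""
    index = {}
    for annotation in annotations:
        # setdefault keeps the first annotation per stem, like the scan above.
        index.setdefault(_stem(annotation), annotation)
    return [(image, index.get(_stem(image))) for image in images]
```

Both functions return the same pairs; the indexed version does one pass over each list instead of one pass over the annotations per image, which is what matters on a dataset with a very large number of files.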

In addition, a tqdm progress bar was added to the image loop to give users feedback while their annotation/image folder is parsed.
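For readers unfamiliar with tqdm, wrapping a loop's iterable is usually all that is needed. The sketch below is only an illustration under assumptions: `parse_images` and the per-image work are hypothetical stand-ins, not the folder parser's real loop, and the fallback exists only so the example runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback (assumption for this sketch): no progress bar, same iteration.
    def tqdm(iterable, **kwargs):
        return iterable


def parse_images(image_paths):
    """Hypothetical image loop; tqdm reports progress as items are consumed."""
    parsed = []
    for path in tqdm(image_paths, desc="Parsing images", unit="img"):
        # Placeholder for the real per-image work.
        parsed.append({"file": path})
    return parsed
```

tqdm writes the bar to stderr by default, so the function's return value is unaffected.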

How has this change been tested, please provide a testcase or example of how you tested the change?

All existing tests under tests/util/test_folderparser.py pass after this work. An initial stub test, shown below, was implemented to test speed improvements on the COCONut dataset.

If you have the COCONut dataset, it can be mounted into the dev container with the following docker-compose.yml:

version: '3'
services:
  devcontainer-roboflow-python:
    build:
      context: ..
      dockerfile: Dockerfile.dev
    image: devcontainer-roboflow-python
    volumes:
      - ..:/roboflow-python
      - {path-to-coconut-dataset}:/coconut
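    command: sleep infinity

Stub test:

def test_parse_coconut(self):
    # Parse the mounted COCONut-S image folder to exercise parsefolder on a very large dataset.
    folder = "/coconut/images/COCONut-S/"
    parsed = folderparser.parsefolder(folder)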

Before this PR: slow_upload

After this PR: optimized_upload

NickHerrig commented 3 months ago

@tonylampada ready for review. Let me know if you have any ideas for improving this.

tonylampada commented 3 months ago

@NickHerrig this is looking great! I'm making a couple more commits on top of it, ok?