ultralytics / hub

Ultralytics HUB tutorials and support
https://hub.ultralytics.com
GNU Affero General Public License v3.0

Dataset upload structuring #577

Open Burhan-Q opened 6 months ago

Burhan-Q commented 6 months ago

HUB Component

Datasets

Bug

Dataset structure shown on HUB

image

Tested working dataset structure and YAML file

data
└───data-seg20
        ├───data.yaml
        ├───train
        │     ├───images
        │     └───labels
        └───valid
              ├───images
              └───labels

data.yaml

path: ../data-seg20
train: train/images
val: valid/images
test: null
names:
  0: crack

Also mentioned on https://github.com/ultralytics/hub/issues/569#issuecomment-1953054027
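For anyone who wants to reproduce the working layout above, here is a minimal sketch that builds the same folder skeleton and YAML and zips it. The archive name `data.zip` and the helper name `build_dataset_zip` are assumptions for illustration; the folders are left empty, so real images and labels still need to be copied in before uploading.

```python
import zipfile
from pathlib import Path

# YAML content copied from the working example above.
DATA_YAML = """\
path: ../data-seg20
train: train/images
val: valid/images
test: null
names:
  0: crack
"""


def build_dataset_zip(root: Path) -> Path:
    """Create the data/data-seg20 skeleton under root and zip the data folder."""
    dataset = root / "data" / "data-seg20"
    for split in ("train", "valid"):
        for kind in ("images", "labels"):
            (dataset / split / kind).mkdir(parents=True, exist_ok=True)
    (dataset / "data.yaml").write_text(DATA_YAML, encoding="utf-8")

    archive = root / "data.zip"  # hypothetical archive name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for item in sorted((root / "data").rglob("*")):
            # Store paths relative to root so the archive starts at "data/".
            arcname = item.relative_to(root).as_posix()
            if item.is_dir():
                zf.writestr(arcname + "/", "")  # keep empty folders in the archive
            else:
                zf.write(item, arcname)
    return archive
```

Calling `build_dataset_zip(Path("some/workdir"))` returns the path of the archive it wrote.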

Environment

OS                  Windows-10-10.0.19045-SP0
Environment         Windows
Python              3.11.6
Install             git
RAM                 31.86 GB
CPU                 Intel Core(TM) i5-10600K 4.10GHz
CUDA                12.1

matplotlib          ✅ 3.8.2>=3.3.0
numpy               ✅ 1.26.3>=1.22.2
opencv-python       ✅ 4.9.0.80>=4.6.0
pillow              ✅ 10.2.0>=7.1.2
pyyaml              ✅ 6.0.1>=5.3.1
requests            ✅ 2.31.0>=2.23.0
scipy               ✅ 1.11.4>=1.4.1
torch               ✅ 2.1.1+cu121>=1.8.0
torchvision         ✅ 0.16.1+cu121>=0.9.0
tqdm                ✅ 4.66.1>=4.64.0
psutil              ✅ 5.9.7
py-cpuinfo          ✅ 9.0.0
thop                ✅ 0.1.1-2209072238>=0.1.1
pandas              ✅ 2.1.4>=1.1.4
seaborn             ✅ 0.13.1>=0.11.0

Minimal Reproducible Example

No response

Additional

I tried again with the tiger pose dataset, which uploaded without issue but then failed with a timeout error. Retrying immediately raised another timeout error.

image

Burhan-Q commented 6 months ago

NOTE: the tiger pose dataset eventually showed no errors, but I was not watching (or timing) when this happened.

image

kalenmike commented 6 months ago

@Burhan-Q We have error handling in place to manage multiple different formats, but we only suggest the correct one. I am not clear whether you are saying that you cannot upload a dataset formatted like the example, or only that you can upload a dataset formatted differently.

The timeout error suggests that there was an issue connecting to the server; we offer a retry option from the dropdown in those cases.

Burhan-Q commented 6 months ago

@kalenmike I was only able to upload a dataset with the structure mentioned in the opening comment. It is not possible to upload a dataset using the layout shown on HUB. I am not the only one who has hit this; other users have experienced it too, which is how it was brought to my attention.

With respect to the timeout error, I did attempt a retry and it immediately failed again, but I may not have waited long enough before retrying. The timeout error seemingly "resolved itself": the dataset showed as correctly uploaded some time later, with no intervention on my part.
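As an illustration of "waiting long enough" between retries, here is a minimal sketch of retrying with exponential backoff. It is not how HUB's retry option works. `retry_with_backoff` and the `action` callable are hypothetical names, and the assumption is that a timeout surfaces as Python's built-in `TimeoutError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action, waiting base_delay * 2**n seconds after the n-th timeout."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.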

Burhan-Q commented 6 months ago

One thing that was frustrating about the dataset uploading errors is that there is no indication as to what the error is or what the problem might be. This means that if an upload fails, as a user I have no clue why or what to change/fix. Having some kind of report of what errors occurred would be helpful.
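Until more detailed reporting is available, a local pre-upload check can at least name the obvious problems. The sketch below is a hypothetical helper, not HUB's own validation: the rules it applies (exactly one YAML file, an `images` folder, a `labels` folder, a `train` split) are assumptions drawn from the layouts discussed in this thread.

```python
import zipfile
from pathlib import PurePosixPath
from typing import List


def check_dataset_zip(zip_file) -> List[str]:
    """Return human-readable problems found in a dataset archive."""
    with zipfile.ZipFile(zip_file) as zf:
        names = zf.namelist()

    problems = []
    yamls = [n for n in names if n.lower().endswith((".yaml", ".yml"))]
    if len(yamls) != 1:
        problems.append(f"expected exactly one YAML file, found {len(yamls)}")

    # Collect every folder or file name that appears anywhere in the archive.
    seen_parts = set()
    for name in names:
        seen_parts.update(PurePosixPath(name).parts)

    for required in ("images", "labels", "train"):
        if required not in seen_parts:
            problems.append(f"no '{required}' folder found in the archive")
    return problems
```

An empty list means none of these checks failed; it does not guarantee the upload will succeed.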

kalenmike commented 6 months ago

@Burhan-Q There is error reporting; it sounds like you just hit the same issue every time. A timeout means there was no response from the server. We also have:

I may need to run through your issue with you tomorrow.

kalenmike commented 6 months ago

Also, it looks like your dataset did not work because your YAML is not correct. Your YAML is telling us to look up one directory, which is why you had to add another directory for it to work.

If you look at the example YAML in HUB, you will see there is no path key.

image
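To make the "look up one directory" point concrete, here is a small sketch of how a relative `path` value resolves against the folder that holds the YAML. This is plain POSIX path arithmetic, not HUB's actual resolver; `resolve_dataset_root` and `escapes_archive` are hypothetical helper names.

```python
import posixpath
from typing import Optional


def resolve_dataset_root(yaml_dir: str, path_value: Optional[str]) -> str:
    """Resolve a YAML `path` value relative to the folder containing the YAML."""
    if not path_value:
        # No path key: the dataset root is simply the YAML's own folder.
        return posixpath.normpath(yaml_dir)
    return posixpath.normpath(posixpath.join(yaml_dir, path_value))


def escapes_archive(resolved: str) -> bool:
    """True when the resolved root points above the top of the archive."""
    return resolved == ".." or resolved.startswith("../")


# YAML inside data/data-seg20: "../data-seg20" lands back on the same folder.
nested = resolve_dataset_root("data/data-seg20", "../data-seg20")
# YAML at the top of the archive: "../data-seg20" points outside the archive.
flat = resolve_dataset_root(".", "../data-seg20")
```

Under these assumptions, `nested` stays inside the archive while `flat` does not, which matches the observation that an extra directory level was needed for the `../` form to resolve.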

Burhan-Q commented 6 months ago

@kalenmike that's the crazy part, the YAML with path: ../data-seg20 did work for me yesterday.

I decided to do some testing, and I'm wondering if something unusual was going on over the last few days, because every iteration I tested below worked without error. I varied the directory structure in two ways: with and without a subdirectory inside the .zip, and with different directory layouts (I call them HUB vs YOLO formats). I also varied path: ../VisDrone20 vs path: VisDrone20 across the different dataset layouts.
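A test matrix like this can also be generated programmatically. The sketch below builds one in-memory zip per combination of the three factors described above. It enumerates the full 2 x 2 x 2 product, which is not identical to the eight hand-built tests listed under the retesting heading, and the layout labels `kind_first` (images/train) and `split_first` (train/images) are my own names, not HUB terminology.

```python
import io
import itertools
import zipfile

NAMES = ["pedestrian", "people", "bicycle", "car", "van",
         "truck", "tricycle", "awning-tricycle", "bus", "motor"]


def folder_for(layout: str, split: str, kind: str) -> str:
    """Return e.g. 'images/train' (kind_first) or 'train/images' (split_first)."""
    return f"{kind}/{split}" if layout == "kind_first" else f"{split}/{kind}"


def make_yaml(path_value: str, layout: str) -> str:
    lines = [
        f"path: {path_value}",
        f"train: {folder_for(layout, 'train', 'images')}",
        f"val: {folder_for(layout, 'val', 'images')}",
        "test: null",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(NAMES)]
    return "\n".join(lines) + "\n"


def make_zip_bytes(nested: bool, layout: str, path_value: str) -> bytes:
    """Build one archive variant in memory (empty folders, YAML only)."""
    prefix = "VisDrone20/" if nested else ""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(prefix + "visdrone20.yaml", make_yaml(path_value, layout))
        for split in ("train", "val"):
            for kind in ("images", "labels"):
                zf.writestr(f"{prefix}{folder_for(layout, split, kind)}/", "")
    return buffer.getvalue()


def all_variants():
    """Yield ((nested, layout, path_value), zip_bytes) for every combination."""
    factors = itertools.product(
        (True, False),
        ("kind_first", "split_first"),
        ("../VisDrone20", "VisDrone20"),
    )
    for nested, layout, path_value in factors:
        yield (nested, layout, path_value), make_zip_bytes(nested, layout, path_value)
```

Writing each `zip_bytes` value to disk gives a set of archives that can be uploaded one at a time while recording the outcome against its key.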

Retesting 2024-02-20

Test 1

VisDrone20.yaml

```yaml
path: ../VisDrone20
train: images/train
val: images/val
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone20.zip
└───VisDrone20
    ├───visdrone20.yaml
    ├───images
    │   ├───train
    │   └───val
    └───labels
        ├───train
        └───val
```

Test 2

VisDrone20.yaml

```yaml
path: ../VisDrone20
train: images/train
val: images/val
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone20.zip
├───visdrone20.yaml
├───images
│   ├───train
│   └───val
└───labels
    ├───train
    └───val
```

Test 3

VisDrone20.yaml

```yaml
path: ../VisDrone20
train: images/train
val: images/val
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone20.zip
├───visdrone20.yaml
├───images
│   ├───train
│   └───val
└───labels
    ├───train
    └───val
```

Test 4

VisDrone20.yaml

```yaml
path: VisDrone20
train: images/train
val: images/val
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone20.zip
└───VisDrone20
    ├───visdrone20.yaml
    ├───images
    │   ├───train
    │   └───val
    └───labels
        ├───train
        └───val
```

Test 5

VisDrone20.yaml

```yaml
path: VisDrone20
train: train/images
val: val/images
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone20.zip
└───VisDrone20
    ├───visdrone20.yaml
    ├───train
    │   ├───images
    │   └───labels
    └───val
        ├───images
        └───labels
```

Test 6

VisDrone20.yaml

```yaml
path: ../VisDrone20
train: train/images
val: val/images
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone20.zip
└───VisDrone20
    ├───visdrone20.yaml
    ├───train
    │   ├───images
    │   └───labels
    └───val
        ├───images
        └───labels
```

Test 7

VisDrone20.yaml

```yaml
path: ../VisDrone20
train: train/images
val: val/images
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone20.zip
├───visdrone20.yaml
├───train
│   ├───images
│   └───labels
└───val
    ├───images
    └───labels
```

Test 8

VisDrone20.yaml

```yaml
path: VisDrone_20
train: train/images
val: val/images
test: null
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
```

VisDrone20.zip structure

```
VisDrone_20.zip
├───visdrone20.yaml
├───train
│   ├───images
│   └───labels
└───val
    ├───images
    └───labels
```

kalenmike commented 6 months ago

@Burhan-Q To confirm, you are no longer seeing any errors?

We have an example of what a dataset should look like, but we also fix datasets with very common and obvious mistakes. The dataset processing happens after it is requested, so sometimes it can fail without any apparent reason or crash due to excess memory usage. We are constantly optimizing this.

Burhan-Q commented 6 months ago

Yeah, I was unable to get an error when testing any of the examples above. I did not document yesterday's attempts as thoroughly, which makes it harder to pin down the issue. I think these tests cover most variations, and all were successful.

@kalenmike is it possible for you to enable verbose logging for my HUB account? Something like "log every action for N hours" so there's a more traceable history for testing? To be clear, I'm asking whether it's possible, not requesting a new feature.

kalenmike commented 6 months ago

@Burhan-Q No, that's not possible.