Issue with provided Dataset and Package Compatibility

DishonestOne commented 1 year ago

I've been trying to use this package, but I've been struggling to get it into a usable state. At the moment, the only way I can see this working is to more or less rewrite the code from the ground up, and I doubt that should be necessary.

Firstly, I tried to verify the dataset by typing in the following:

python3 obb_anns/debugging/verify_dataset.py obb_anns/sample

and here is the result:

Checking file 1 of 1...
loading ann_info...
Traceback (most recent call last):
  File "obb_anns/debugging/verify_dataset.py", line 24, in <module>
    a.load_annotations()
  File "Project/obb_anns/obb_anns/obb_anns.py", line 116, in load_annotations
    with open(self.ann_file, 'r') as ann_file:
FileNotFoundError: [Errno 2] No such file or directory: 'obb_anns/sample/obb_anns/sample/deepscores_v2_sample.json'

Obviously, there's an issue with how the code builds the file path (the root directory appears twice), so I changed the following lines (before is above, after is below):

line 23:

   a = OBBAnns(join(args.ROOT, dataset_ann_fp))

```
   a = OBBAnns((dataset_ann_fp))
```

line 36:
- images_dir = root_dir / 'images_png'
+ images_dir = root_dir / 'images'

and the result was this instead, which I assume means that it does work:

Checking file 1 of 1...
loading ann_info...
done! t=0.00s
100%|██████████| 1/1 [00:00<00:00, 97.97imgs/s]
Checking if every image has its annotation in the dataset...
1imgs [00:00, 12945.38imgs/s]
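The doubled path in the first traceback can be reproduced in isolation. The sketch below assumes that the annotation path handed to `join` already contains the root directory, which is what the error message suggests. `resolve_ann_path` is a hypothetical helper, not part of obb_anns; it only prepends the root when the path does not already start with it.

```python
import os.path


def resolve_ann_path(root, ann_fp):
    """Hypothetical helper: join root and ann_fp unless ann_fp already contains root."""
    root_norm = os.path.normpath(root)
    ann_norm = os.path.normpath(ann_fp)
    already_rooted = (
        os.path.isabs(ann_norm)
        or ann_norm.startswith(root_norm + os.sep)
    )
    if already_rooted:
        return ann_norm
    return os.path.join(root_norm, ann_norm)


# Values taken from the command line and error message in this thread.
root = "obb_anns/sample"
ann_fp = "obb_anns/sample/deepscores_v2_sample.json"

# Plain join repeats the root, as in the FileNotFoundError above.
doubled = os.path.join(root, ann_fp)

# The helper leaves an already-rooted path alone.
fixed = resolve_ann_path(root, ann_fp)

print(doubled)
print(fixed)
```

Dropping the `join` entirely, as in the change above, has the same effect for this invocation. The helper may be more forgiving if the script is also called with paths relative to the root.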

I also ran this script on ds2_dense, and the output below is what I always get. I'm not sure why this happens for that dataset, and I don't know enough to fix it. (I am showing this as an image because the "imagename.png is not in any JSON file" error repeats for so long that I cannot scroll back to the start of the output.)

Next, it appears that the program needs a proposals.json file from somewhere, but none of the datasets ship with a proposals file. Looking at the debugging tools, there is a generate_test_proposals.py, so I tried the following:

python3 obb_anns/debugging/generate_test_proposals.py obb_anns/sample/deepscores_v2_sample.json

and this is the result:

loading ann_info...
done! t=0.01s
Traceback (most recent call last):
  File "obb_anns/debugging/generate_test_proposals.py", line 56, in <module>
    main(args.GT)
  File "obb_anns/debugging/generate_test_proposals.py", line 35, in main
    bboxes = gt.ann_info[['bbox', 'cat_id', 'img_id']]
  File ".local/lib/python3.8/site-packages/pandas/core/frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File ".local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1550, in _get_listlike_indexer
    self._validate_read_indexer(
  File ".local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1644, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['bbox'] not in index"

The problem here lies in how the dataset defines its bounding boxes. The script expects a single bbox field, but in the JSON files it is split into two: a_bbox and o_bbox.

I could replace every instance of bbox with a_bbox and o_bbox, but that affects the rest of the code as well, since obb_anns.py also defines bbox as a single field. Unless there is another compatible dataset that I am unaware of, is there any particular solution that comes to mind?
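One way to avoid renaming bbox throughout the code base might be to pick one of the two box columns at the point where the script indexes the DataFrame and expose it under the old name. This is a minimal sketch, assuming `gt.ann_info` is a pandas DataFrame whose columns include `a_bbox`, `o_bbox`, `cat_id` and `img_id`. `select_boxes` and the toy values are hypothetical and not part of obb_anns.

```python
import pandas as pd


def select_boxes(ann_info, box_col="a_bbox"):
    """Hypothetical helper: return box_col, cat_id and img_id,
    with the chosen box column renamed to 'bbox'."""
    needed = [box_col, "cat_id", "img_id"]
    missing = [col for col in needed if col not in ann_info.columns]
    if missing:
        raise KeyError(
            f"{missing} not in columns; available: {list(ann_info.columns)}"
        )
    return ann_info[needed].rename(columns={box_col: "bbox"})


# Toy stand-in for gt.ann_info; the values are made up for illustration.
ann_info = pd.DataFrame({
    "a_bbox": [[10.0, 20.0, 30.0, 40.0]],
    "o_bbox": [[10.0, 20.0, 30.0, 20.0, 30.0, 40.0, 10.0, 40.0]],
    "cat_id": [["1", "157"]],
    "img_id": ["1"],
})

boxes = select_boxes(ann_info)           # axis-aligned boxes as 'bbox'
oriented = select_boxes(ann_info, "o_bbox")  # oriented boxes as 'bbox'
print(boxes.columns.tolist())
```

Whether the downstream code expects the aligned or the oriented box under the name bbox is not clear from this thread, so the choice of `box_col` is left to the caller.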

fablau commented 1 year ago

Thank you for posting this; I have been struggling with this package myself, trying to figure out how to use it.

By following the official instructions, I encounter errors like this one.

It is my guess that this package is abandoned and no longer maintained.

I am wondering whether you have found a way to use the DeepScores dataset with a modern framework like MMDetection or similar. I am spending a huge amount of time trying to understand how to use this dataset, which is in a rather non-standard format.

Any thoughts on that are very welcome!

Thanks again.

DishonestOne commented 1 year ago

I have stumbled across a change that may help with this issue. It may be that the debugging scripts and the sample dataset are outdated, but the dataset itself can be visualized properly.

This is the exact code I used to render the image.

This is what happens when I try the code above, except that I refer to the sample dataset and change img_idx to 0 (since there is only one image):

[image: 04-24_214921]

This is the resulting rendered image.

[image: 04-24_215029]

This is the resulting rendered image if instances = False. Supposedly there should be a way to see the unique numerical id of each symbol instead of the recognized symbol name (e.g. noteheadBlackOnLine and timesig4).

This does mean that the dataset is usable to some extent, but as far as I have seen there is no way to create proposals (are proposals actually necessary? I am not sure), nor a way to train new models from scratch (or at least a way to take page(s) of sheet music as input and process them into a JSON file). Let me know if there is something I have overlooked, thanks!

fablau commented 1 year ago

This is awesome! Thank you so much!

I am researching a way to train a model from scratch as well, and I have some ideas to try. MMDetection is one, Detectron2 is another. I'll try them both and see which one gives better results.

It is my understanding that the main problem is that this dataset has its own format, so it should probably be converted into another format first (e.g. COCO or YOLO) and then treated like any other "standard" dataset.

I'll post what I find out. Thanks again!
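The conversion idea described above can be sketched roughly as follows. Everything about the input layout here is an assumption that should be checked against the actual JSON file: a top-level `images` list with `id`, `filename`, `width` and `height`; an `annotations` dict keyed by annotation id with `a_bbox` as `[x0, y0, x1, y1]`, a `cat_id` list with one entry per annotation set, and `img_id`; and a `categories` dict keyed by category id with a `name`. `to_coco` is a hypothetical function, not part of obb_anns, and it discards the oriented boxes because plain COCO boxes are axis-aligned `[x, y, width, height]`.

```python
def to_coco(ds, annotation_set_idx=0):
    """Hypothetical converter from a DeepScores-style dict (assumed layout)
    to a COCO-style detection dict."""
    categories = [
        {"id": int(cat_id), "name": cat["name"]}
        for cat_id, cat in ds["categories"].items()
    ]
    images = [
        {
            "id": int(img["id"]),
            "file_name": img["filename"],
            "width": img["width"],
            "height": img["height"],
        }
        for img in ds["images"]
    ]
    annotations = []
    for ann_id, ann in ds["annotations"].items():
        # Assumed corner format: [x0, y0, x1, y1].
        x0, y0, x1, y1 = ann["a_bbox"]
        width, height = x1 - x0, y1 - y0
        annotations.append({
            "id": int(ann_id),
            "image_id": int(ann["img_id"]),
            # Pick the category id belonging to one annotation set.
            "category_id": int(ann["cat_id"][annotation_set_idx]),
            "bbox": [x0, y0, width, height],
            "area": width * height,
            "iscrowd": 0,
        })
    return {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }


# Made-up miniature input that follows the assumed layout.
sample = {
    "images": [
        {"id": "1", "filename": "page_1.png", "width": 100, "height": 200}
    ],
    "categories": {"1": {"name": "brace"}, "157": {"name": "brace"}},
    "annotations": {
        "5": {
            "a_bbox": [10.0, 20.0, 30.0, 40.0],
            "o_bbox": [10.0, 20.0, 30.0, 20.0, 30.0, 40.0, 10.0, 40.0],
            "cat_id": ["1", "157"],
            "img_id": "1",
        }
    },
}

coco = to_coco(sample)
print(coco["annotations"][0]["bbox"])
```

The sketch copies every category regardless of annotation set and does not handle missing category ids, so a real conversion would likely need filtering on top of this.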

tangruinenu commented 1 year ago

Hello, I also want to convert the labels of this dataset to COCO or YOLO format. Have you managed to convert them?

fablau commented 1 year ago

Not yet, unfortunately. I had to take care of more urgent matters, but I plan to do it in the coming weeks. I'll keep you posted ;)

tangruinenu commented 1 year ago

@fablau Thank you. Let me know if you succeed. Thank you so much.

yvan674 / obb_anns

Issue with provided Dataset and Package Compatibility #9