roboflow / roboflow-python

The official Roboflow Python package. Manage your datasets, models, and deployments. Roboflow has everything you need to build a computer vision application.
https://docs.roboflow.com/python
Apache License 2.0
272 stars 71 forks source link

fix: raise UploadErrors for images and annotations #266

Closed caiquejjx closed 1 week ago

caiquejjx commented 2 months ago

Description

Fixes issue #235 by raising UploadError exceptions that happens inside single_upload

Type of change

Please delete options that are not relevant.

How has this change been tested, please provide a testcase or example of how you tested the change?

Same test as the issue

import json
import roboflow

rf = roboflow.Roboflow(api_key="API_KEY")
img_path = "<path_to_img>"

workspace_id = "<ws ID>"
project_id = "<proj ID>"
workspace = rf.workspace(workspace_id)
project = workspace.project(project_id)

annotation = {
    "it is": [
        {
            "very frustrating": 0,
            "when you": "send",
            "an annotation": {
                "file and": 0.5,

                "function": 0.5,
                "call": 0.5,
                "completes": 0.5,
                "successfully,": 0.5,

            },
            "but": "there's",
            "no": "annotations",
            "on": "the website",
            "nor": "any",
            "errors": "raised",
        }
    ]
}
with open("annotation.json", "w") as f:
    json.dump(annotation, f)

image = project.upload(img_path, annotation_path="annotation.json")

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

LinasKo commented 2 months ago

Hey @caiquejjx 👋

Typically, we ask for a Colab to showcase the changes, which significantly speeds up the review process.

I made one for us this time: https://colab.research.google.com/drive/1xNixTC2xljopKhSpatDhYr2pCY0b5HvG#scrollTo=xI1cabXdatjr

I like the change, but wonder what the broader impact would be. Whether there's anyone relying on us failing silently.

caiquejjx commented 2 months ago

Thanks @LinasKo! yeah it makes sense

CLAassistant commented 1 month ago

CLA assistant check
All committers have signed the CLA.

LinasKo commented 1 week ago

Hi @caiquejjx, I'll be looking at this, and if you're around - let's get it merged!

I've tested the solution and it works well, but I'd like a slightly safer approach to parsing the error.

Here's what I have in mind:

    def _parse_upload_error(self, error: rfapi.UploadError) -> str:
        dict_part = str(error).split(": ", 2)[2]
        # Could be done with ast.literal_eval, but this is safer.
        if re.search(r"'\w+':", dict_part):
            temp_str = dict_part.replace(r"\'", "<PLACEHOLDER>")
            temp_str = temp_str.replace("'", '"')
            dict_part = temp_str.replace("<PLACEHOLDER>", "'")
        parsed_dict: dict = json.loads(dict_part)
        message = parsed_dict.get("message")
        return message or str(parsed_dict)

Could you replace the method (and import re instead of ast), and then test it?

Let's also add a correct annotation test, with annotation format such as this:

Annotations Dict ```python annotation = { "info": { "year": "2020", "version": "1", "description": "None", "contributor": "Linas", "url": "https://app.roboflow.ai/datasets/hard-hat-sample/1", "date_created": "2000-01-01T00:00:00+00:00", }, "licenses": [{"id": 1, "url": "https://creativecommons.org/publicdomain/zero/1.0/", "name": "Public Domain"}], "categories": [ {"id": 0, "name": "cat", "supercategory": "animals"}, ], "images": [ { "id": 0, "license": 1, "file_name": img_path, "height": 1024, "width": 1792, "date_captured": "2020-07-20T19:39:26+00:00", } ], "annotations": [ { "id": 0, "image_id": 0, "category_id": 0, "bbox": [45, 2, 85, 85], "area": 7225, "segmentation": [], "iscrowd": 0, }, { "id": 1, "image_id": 0, "category_id": 0, "bbox": [324, 29, 72, 81], "area": 5832, "segmentation": [], "iscrowd": 0, }, ], } ```
caiquejjx commented 1 week ago

Hey @LinasKo sure thing, I'll work on these soon and I guess we should also have automated tests right? (not sure if it was what you meant)

LinasKo commented 1 week ago

Aye, I wasn't too clear. A Colab will be fine this time - no need for unit tests.

caiquejjx commented 1 week ago

@LinasKo done, I had to recreate the colab in order to edit

LinasKo commented 1 week ago

@tonylampada, could you please have a look at this? I believe it is ready to merge.

tonylampada commented 1 week ago

@LinasKo I think uploading with the CLI relies on the previous more lenient handling of annotation errors. Can you test that and see if it still has the same behaviour when importing datasets with the CLI?

Also, I think the backend already provides valid json in case of error, it just seems weird parsing it using regex like that.

LinasKo commented 1 week ago

@tonylampada, here's the repr of the error. It's not valid json - the strings are declared with '!.

UploadError("save annotation for MzqOjxfl22A5rxmFhrUI / bad response: 400: {'message': 'Image was already annotated.', 'type': 'InvalidImageException', 'hint': 'This image was already annotated; to overwrite the annotation, pass overwrite=true as a query parameter.'}")

I very much agree it's weird! The correct solution would be to change wherever the json is produced. Alternatively, let's use the whole string. Also, we can avoid all of this with ast.literal_eval, as per original suggestion by @caiquejjx, but that's a slight risk if the API error ever contains user-defined data.

CLI test results coming soon.

LinasKo commented 1 week ago

@tonylampada, I've tested:

However, I also labelled a keypoints dataset, created a version and attempted to download it, which failed. I think it's related to json, but unrelated to my changes.

Could you test if the following works?

roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp
Error response ❯ roboflow download https://app.roboflow.com/linas-ws/keypooint-test/1 -l dataset-kp loading Roboflow workspace... loading Roboflow project... Traceback (most recent call last): File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 974, in json return complexjson.loads(self.text, **kwargs) File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/Users/linasko/.pyenv/versions/3.8.19/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 258, in export raise RuntimeError(response.json()) File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 978, in json raise RequestsJSONDecodeError(e.msg, e.doc, e.pos) requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/linasko/rf/pr/roboflow-python/.venv/bin/roboflow", line 8, in sys.exit(main()) File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 423, in main args.func(args) File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/roboflowpy.py", line 45, in download version.download(args.format, location=args.location, overwrite=True) File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 205, in download self.export(model_format) File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/roboflow/core/version.py", line 260, in export response.raise_for_status() File "/Users/linasko/rf/pr/roboflow-python/.venv/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.roboflow.com/linas-ws/keypooint-test/1/voc?api_key=
LinasKo commented 1 week ago

@tonylampada, responding to your Loom, I see the bigger picture now. We're both working towards the same goal of not hiding useful error messages.

The problem is that dataset import captures the result of project.single_upload and logs it with _log_img_upload, whereas uploading a single image/annotation completely ignores the result.

The correct solution would keep one of these approaches. Should we implement an equivalent of _log_img_upload in roboflow/roboflowpy.py::upload_image(), and then remove the raise introduced by this PR?

tonylampada commented 1 week ago

Aforementioned loom here just for reference (so we don't exclude @caiquejjx from the conversation :-)) https://www.loom.com/share/5f7a6520e0364b7588fd6130c682e0ef

tonylampada commented 1 week ago

@LinasKo here's what I'm thinking.

project.single_upload has an "identity crisis" because it's not exactly "single" upload, is it? :-) It can upload:

Then there's the question of what do we do when something goes wrong with one of those operations. Ideally we should decouple those in different methods, but that would be a breaking change, probably not worth it. So, scratch that.

Then, when should single_upload throw an error? One possible answer is:

I think this kinda makes sense from the perspective of what error means. But I don't like it. Because it's a little more complex to implement inside (single_upload) and that complexity also leaks by forcing the caller to know when to handle and/or look inside the results.

So I'm tending to think that a better answer is:

Now, duplicating _log_img_upload is something I don't like. DRY, right? Maybe we should move that responsibility into single_upload as well, before returning the result dic. wdyt?

caiquejjx commented 1 week ago

@tonylampada thanks for the loom! one note is that single_upload kinda did that before already but the problem was that the upload method didn't return anything and one of the reasons I went for raising is because people probably don't check the return of the upload since it doesn't returns anything, so besides the change on single_upload we should also start returning the result in upload, right? or maybe logging

tonylampada commented 1 week ago

we should also start returning the result in upload, right?

Yup. That makes sense

or maybe logging

I think it might be better to move the logging responsibility into single_upload.

lrosemberg commented 6 days ago

I'm writing some tests for my PR (that maintains the compatibility with CLI), and this PR breaks the upload parse in some cases, I'll describe it better below.

In the line where the split is done by a colon and you access position 2, not all returns from the rfapi.upload_image have the status code, and so they don't have this second colon.

image

CleanShot 2024-08-18 at 17 57 44@2x

@caiquejjx @LinasKo @tonylampada

This PR should fix that, and another things:

312