Determine representation of import-related annotation metadata

lyzadanger commented 11 months ago

Backend and services: Determine representation of import-related annotation metadata and ensure API services can take and provide import metadata.

marcospri commented 11 months ago

The way H's API handles additional metadata is a bit too flexible, see:

https://hypothes-is.slack.com/archives/C1MA4E9B9/p1688637604519929

Changing that is outside the scope of this project, and tricky to do in any case because of backward-compatibility concerns for clients that rely on that behavior.

Due to that flexibility, we can choose between different approaches with no code changes.

IMO we have to balance two concerns when picking a representation:

Not polluting the global annotation namespace. Although the API supports that, I don't like it as general API design.
Avoiding wasting too much DB space on deeply nested values in a dictionary (i.e., extra -> import -> source -> id).

With that in mind, my preferred approach would be something along these lines:

POST http://localhost:5000/api/annotations
{
  "extra": {
    "original_id": "XXX",
    "source": "import"
  },
  ...
  "text": "...",
  "document": {...},
  "target": [...]
}
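As a rough illustration of the preferred shape, here is a minimal Python sketch that adds the import metadata to an annotation body before it is sent. The helper name `build_import_payload` is hypothetical, and the choice to merge into an existing `extra` rather than overwrite it is an assumption, not something decided in this thread. Only the `extra`, `original_id` and `source` keys come from the proposal.

```python
import json


def build_import_payload(annotation: dict, original_id: str) -> dict:
    """Return a copy of `annotation` with import metadata under "extra".

    Hypothetical helper: only the "extra", "original_id" and "source"
    keys come from the proposal; the merge behavior is an assumption.
    """
    payload = dict(annotation)
    # Merge rather than overwrite, in case the client already uses "extra".
    extra = dict(payload.get("extra", {}))
    extra.update({"original_id": original_id, "source": "import"})
    payload["extra"] = extra
    return payload


original = {"text": "A note", "document": {}, "target": []}
example = build_import_payload(original, original_id="XXX")

# JSON body that a client could POST to /api/annotations.
body = json.dumps(example)
```

The input dictionary is left untouched, so the same annotation can be reused or retried without the metadata being applied twice.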

Some alternatives follow. Note that the representation used when POSTing will be mirrored when GETting (and searching, etc.).

The representation in the DB will also match this structure.
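Because the same structure comes back on reads, a client could recover the import metadata with a small accessor. This is only a sketch: it assumes the response has already been decoded into a Python dict, and `imported_original_id` is a hypothetical name, not part of H or its client libraries.

```python
from typing import Optional


def imported_original_id(annotation: dict) -> Optional[str]:
    """Return the original ID if `annotation` is marked as imported, else None.

    Assumes the preferred shape: {"extra": {"original_id": ..., "source": "import"}}.
    """
    extra = annotation.get("extra")
    # "extra" is free-form, so guard against it being absent or not a dict.
    if not isinstance(extra, dict):
        return None
    if extra.get("source") != "import":
        return None
    return extra.get("original_id")


fetched = {"text": "...", "extra": {"original_id": "XXX", "source": "import"}}
original_id = imported_original_id(fetched)
```

Annotations without `extra`, or with a different `source`, yield `None` rather than raising.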

Nested structure inside a general metadata container

POST http://localhost:5000/api/annotations
{
  "extra": {
    "import": {
      "original_id": "XXX"
    }
  },
  ...
  "text": "...",
  "document": {...},
  "target": [...]
}

Top-level fields

POST http://localhost:5000/api/annotations
{
  "source": "import",
  "original_id": "XXX",
  ...
  "text": "...",
  "document": {...},
  "target": [...]
}

Nested structure at the top level

POST http://localhost:5000/api/annotations
{
  "import": {
    "original_id": "XXX"
  },
  ...
  "text": "...",
  "document": {...},
  "target": [...]
}
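To make the trade-off between the four shapes a little more concrete, the sketch below compares them by nesting depth, number of new top-level keys, and compact JSON length. The serialized length is only a rough proxy and says nothing definite about actual storage cost in the DB. The labels and helper names are made up for this illustration.

```python
import json

# The four candidate shapes from this thread, reduced to the import metadata.
CANDIDATES = {
    "flat inside extra": {"extra": {"original_id": "XXX", "source": "import"}},
    "nested inside extra": {"extra": {"import": {"original_id": "XXX"}}},
    "top-level fields": {"source": "import", "original_id": "XXX"},
    "top-level nested": {"import": {"original_id": "XXX"}},
}


def max_depth(value) -> int:
    """Depth of dict nesting; a flat dict has depth 1, a non-dict has depth 0."""
    if not isinstance(value, dict):
        return 0
    return 1 + max((max_depth(v) for v in value.values()), default=0)


def summarize(candidates: dict) -> dict:
    """Collect rough size and namespace figures for each candidate shape."""
    rows = {}
    for name, shape in candidates.items():
        compact = json.dumps(shape, separators=(",", ":"))
        rows[name] = {
            "bytes": len(compact),
            "depth": max_depth(shape),
            # Keys that would be added to the annotation's top-level namespace.
            "top_level_keys": sorted(shape),
        }
    return rows


SUMMARY = summarize(CANDIDATES)
for name, row in SUMMARY.items():
    print(name, row)
```

On these figures the flat-inside-`extra` shape adds no new top-level keys beyond `extra` and stays one level shallower than the nested-inside-`extra` variant, which matches the two concerns listed earlier.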

acelaya commented 11 months ago

I'm personally OK with using your preferred approach.

The import_id is already a "piece of text" by design for other reasons, so I think it should be OK to put it inside extra for now.

marcospri commented 11 months ago

Going with the proposal above, modulo any naming changes made during implementation, which will be reviewed in PRs.

POST http://localhost:5000/api/annotations
{
  "extra": {
    "original_id": "XXX",
    "source": "import"
  },
  ...
  "text": "...",
  "document": {...},
  "target": [...]
}
