jedahan / collections-api

API scraping from the metmuseum website
http://scrAPI.org

problem in json's structure #48

Open eneim opened 10 years ago

eneim commented 10 years ago

When crawling /random to get the JSON response, I see properties like "timelineList" that contain only one "timeline" item. Those lists are emitted without the "[" and "]" array brackets, which is only valid for a single item, so I am not sure how to parse the response when a list contains more than one "timeline" item. Please help.
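Until the response shape is stable, a client can normalize both shapes itself. The following is a minimal sketch, not part of scrapi: the `as_list` helper is hypothetical, and the two sample responses are made-up values that only mirror the nesting described above ("timelineList" containing "timeline").

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Two shapes the same property can take (sample values are made up).
single = {"timelineList": {"timeline": {"name": "A", "count": 1}}}
multiple = {
    "timelineList": {
        "timeline": [{"name": "A", "count": 1}, {"name": "B", "count": 2}]
    }
}

for response in (single, multiple):
    # After normalization the client can always iterate, whatever the shape was.
    items = as_list(response["timelineList"]["timeline"])
    print(len(items), [item["name"] for item in items])
```

With this in place the parsing code does not need to branch on whether a bare object or an array came back.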

jedahan commented 10 years ago

Hmm, so one of the issues is that if any property has only one element, the converter unwraps it so it is no longer an array.

There are a few ways to fix this:

a) Make every property an array. This requires client code to 0-index every property, like object.CRDID[0] and object.title[0]. This is the fastest fix, with the ugliest client code, and clients still have to know what the elements of the array are.

b) Only strip if there is a single item in the array and the property does not have the word 'list' in it. I think this is a good temporary fix.

c) Have a hand-tuned list of what should always be an array and what should always be a property. This is fragile, but so is the rest of scrapi, since there is no format definition from upstream metmuseum.

d) Define a schema using something like json-schema.org

The main issue is that there is no schema from metmuseum.org, but having a schema to validate against is really useful, so maybe that can be discussed here.
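The rule that only strips a single-item array when the property name lacks the word 'list' could look roughly like the sketch below. It is written in Python for illustration only and is not scrapi's actual converter code: the function name `unwrap_singletons` and the sample input are hypothetical, and it assumes the converter initially emits every property as an array.

```python
from typing import Any


def unwrap_singletons(value: Any, key: str = "") -> Any:
    """Recursively collapse one-element lists, except under keys containing 'list'."""
    if isinstance(value, dict):
        return {k: unwrap_singletons(v, k) for k, v in value.items()}
    if isinstance(value, list):
        items = [unwrap_singletons(v, key) for v in value]
        # Keep the array when the property name marks it as a list.
        if len(items) == 1 and "list" not in key.lower():
            return items[0]
        return items
    return value


# Hypothetical converter output where every property starts out as an array.
raw = {
    "timelineList": [{"name": ["A"]}],
    "title": ["Vase"],
    "tags": ["x", "y"],
}
print(unwrap_singletons(raw))
```

Note that a name-based rule only protects properties that actually have 'list' in their name; anything else with a single element is still unwrapped.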

jedahan commented 10 years ago

Using the jsonschema.net generator on a single object, simplified (omitting ids and required:false):

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "timelineList": {
      "type": "object",
      "properties": {
        "timeline": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "count": {
              "type": "integer"
            },
            "isCurrent": {
              "type": "boolean"
            },
            "url": {
              "type": "string"
            }
          }
        }
      }
    },
    "medium": {
      "type": "string"
    },
    "dimensions": {
      "type": "string"
    },
    "accessionNumber": {
      "type": "string"
    },
    "dateText": {
      "type": "integer"
    },
    "creditLine": {
      "type": "string"
    },
    "classificationList": {
      "type": "object",
      "properties": {
        "classification": {
          "type": "string"
        }
      }
    },
    "imageNo": {
      "type": "integer"
    },
    "primaryArtistNameOnly": {
      "type": "string"
    },
    "primaryArtistSuffix": {
      "type": "string"
    },
    "isLoanObject": {
      "type": "boolean"
    },
    "hasDescription": {
      "type": "boolean"
    },
    "currentImage": {
      "type": "object",
      "properties": {
        "CRDID": {
          "type": "integer"
        },
        "publicAccess": {
          "type": "boolean"
        },
        "imageUrl": {
          "type": "string"
        },
        "width": {
          "type": "integer"
        },
        "height": {
          "type": "integer"
        },
        "webWidth": {
          "type": "integer"
        },
        "webHeight": {
          "type": "integer"
        },
        "rank": {
          "type": "integer"
        },
        "primaryDisplay": {
          "type": "boolean"
        },
        "isZoomable": {
          "type": "boolean"
        }
      }
    },
    "noImageAvailable": {
      "type": "boolean"
    },
    "isThumbnailOnly": {
      "type": "boolean"
    },
    "audioCount": {
      "type": "integer"
    },
    "videoCount": {
      "type": "integer"
    },
    "numRelatedPublications": {
      "type": "integer"
    },
    "whoList": {
      "type": "object",
      "properties": {
        "who": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "count": {
              "type": "integer"
            },
            "isCurrent": {
              "type": "boolean"
            },
            "url": {
              "type": "string"
            }
          }
        }
      }
    },
    "whatList": {
      "type": "object",
      "properties": {
        "what": {
          "type": "array",
          "items": {}
        }
      }
    },
    "whereList": {
      "type": "object",
      "properties": {
        "where": {
          "type": "array",
          "items": {}
        }
      }
    },
    "whenList": {
      "type": "object",
      "properties": {
        "when": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "count": {
              "type": "integer"
            },
            "isCurrent": {
              "type": "boolean"
            },
            "url": {
              "type": "string"
            }
          }
        }
      }
    },
    "inTheMuseumList": {
      "type": "object",
      "properties": {
        "inTheMuseum": {
          "type": "object",
          "properties": {
            "id": {
              "type": "string"
            },
            "name": {
              "type": "string"
            },
            "count": {
              "type": "integer"
            },
            "isCurrent": {
              "type": "boolean"
            },
            "url": {
              "type": "string"
            }
          }
        }
      }
    },
    "isExhibitionArtWork": {
      "type": "boolean"
    },
    "addedToMyMet": {
      "type": "boolean"
    },
    "CRDID": {
      "type": "integer"
    },
    "title": {
      "type": "string"
    },
    "primaryArtist": {
      "type": "object",
      "properties": {
        "role": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "nationality": {
          "type": "string"
        }
      }
    },
    "galleryLink": {
      "type": "string"
    },
    "primaryImageUrl": {
      "type": "string"
    },
    "primaryImageWidth": {
      "type": "integer"
    },
    "primaryImageHeight": {
      "type": "integer"
    },
    "url": {
      "type": "string"
    },
    "xmlUrl": {
      "type": "string"
    },
    "informationBoxes": {
      "type": "object",
      "properties": {
        "informationBox": {
          "type": "array",
          "items": {}
        }
      }
    },
    "enlarge": {
      "type": "boolean"
    },
    "searchPageUrl": {
      "type": "string"
    },
    "hasSearchSet": {
      "type": "boolean"
    },
    "searchBackText": {
      "type": "string"
    },
    "searchBackUrl": {
      "type": "string"
    },
    "searchItemNo": {
      "type": "integer"
    },
    "searchTotalItems": {
      "type": "integer"
    },
    "hasRelatedContent": {
      "type": "boolean"
    },
    "relatedArtworkLinkCount": {
      "type": "integer"
    },
    "relatedItemLinkCount": {
      "type": "integer"
    },
    "relatedToahLinkCount": {
      "type": "integer"
    },
    "relatedTabs": {
      "type": "object",
      "properties": {
        "string": {
          "type": "array",
          "items": {}
        }
      }
    },
    "relatedItemList": {
      "type": "object",
      "properties": {
        "relatedItem": {
          "type": "array",
          "items": {}
        }
      }
    },
    "relatedToahLinkList": {
      "type": "object",
      "properties": {
        "relatedToahLink": {
          "type": "object",
          "properties": {
            "id": {
              "type": "string"
            },
            "title": {
              "type": "string"
            },
            "url": {
              "type": "string"
            }
          }
        }
      }
    },
    "relatedArtworkList": {
      "type": "object",
      "properties": {
        "relatedArtwork": {
          "type": "array",
          "items": {}
        }
      }
    },
    "showEmbeddedVideo": {
      "type": "boolean"
    },
    "showEmbeddedAudio": {
      "type": "boolean"
    },
    "hasMedia": {
      "type": "boolean"
    }
  }
}
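Because the schema above was generated from a single object, it records whichever shape that one sample happened to have: "timeline" is declared as an object while "what" is declared as an array. The sketch below walks a schema and reports the declared type under each property whose name ends in "List". The helper name `list_member_types` is hypothetical, and the `schema` dict is only a two-property excerpt of the schema above.

```python
from typing import Dict, Iterator, Tuple

# Excerpt of the simplified schema (two of the *List properties).
schema = {
    "type": "object",
    "properties": {
        "timelineList": {
            "type": "object",
            "properties": {"timeline": {"type": "object"}},
        },
        "whatList": {
            "type": "object",
            "properties": {"what": {"type": "array", "items": {}}},
        },
    },
}


def list_member_types(schema: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (path, declared type) for each child of a property named '...List'."""
    for name, sub in schema.get("properties", {}).items():
        if name.endswith("List"):
            for child, child_schema in sub.get("properties", {}).items():
                yield f"{name}.{child}", child_schema.get("type", "any")


for path, declared in list_member_types(schema):
    print(path, "->", declared)
# timelineList.timeline -> object
# whatList.what -> array
```

A mix of "object" and "array" in that report is a sign that the schema reflects the single-item unwrapping rather than the intended structure.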

full schema generated as reference:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://scrapi.org/object#",
  "type": "object",
  "required": false,
  "properties": {
    "timelineList": {
      "id": "#timelineList",
      "type": "object",
      "required": false,
      "properties": {
        "timeline": {
          "id": "#timeline",
          "type": "object",
          "required": false,
          "properties": {
            "name": {
              "id": "#name",
              "type": "string",
              "required": false
            },
            "count": {
              "id": "#count",
              "type": "integer",
              "required": false
            },
            "isCurrent": {
              "id": "#isCurrent",
              "type": "boolean",
              "required": false
            },
            "url": {
              "id": "#url",
              "type": "string",
              "required": false
            }
          }
        }
      }
    },
    "medium": {
      "id": "#medium",
      "type": "string",
      "required": false
    },
    "dimensions": {
      "id": "#dimensions",
      "type": "string",
      "required": false
    },
    "accessionNumber": {
      "id": "#accessionNumber",
      "type": "string",
      "required": false
    },
    "dateText": {
      "id": "#dateText",
      "type": "integer",
      "required": false
    },
    "creditLine": {
      "id": "#creditLine",
      "type": "string",
      "required": false
    },
    "classificationList": {
      "id": "#classificationList",
      "type": "object",
      "required": false,
      "properties": {
        "classification": {
          "id": "#classification",
          "type": "string",
          "required": false
        }
      }
    },
    "imageNo": {
      "id": "#imageNo",
      "type": "integer",
      "required": false
    },
    "primaryArtistNameOnly": {
      "id": "#primaryArtistNameOnly",
      "type": "string",
      "required": false
    },
    "primaryArtistSuffix": {
      "id": "#primaryArtistSuffix",
      "type": "string",
      "required": false
    },
    "isLoanObject": {
      "id": "#isLoanObject",
      "type": "boolean",
      "required": false
    },
    "hasDescription": {
      "id": "#hasDescription",
      "type": "boolean",
      "required": false
    },
    "currentImage": {
      "id": "#currentImage",
      "type": "object",
      "required": false,
      "properties": {
        "CRDID": {
          "id": "#CRDID",
          "type": "integer",
          "required": false
        },
        "publicAccess": {
          "id": "#publicAccess",
          "type": "boolean",
          "required": false
        },
        "imageUrl": {
          "id": "#imageUrl",
          "type": "string",
          "required": false
        },
        "width": {
          "id": "#width",
          "type": "integer",
          "required": false
        },
        "height": {
          "id": "#height",
          "type": "integer",
          "required": false
        },
        "webWidth": {
          "id": "#webWidth",
          "type": "integer",
          "required": false
        },
        "webHeight": {
          "id": "#webHeight",
          "type": "integer",
          "required": false
        },
        "rank": {
          "id": "#rank",
          "type": "integer",
          "required": false
        },
        "primaryDisplay": {
          "id": "#primaryDisplay",
          "type": "boolean",
          "required": false
        },
        "isZoomable": {
          "id": "#isZoomable",
          "type": "boolean",
          "required": false
        }
      }
    },
    "noImageAvailable": {
      "id": "#noImageAvailable",
      "type": "boolean",
      "required": false
    },
    "isThumbnailOnly": {
      "id": "#isThumbnailOnly",
      "type": "boolean",
      "required": false
    },
    "audioCount": {
      "id": "#audioCount",
      "type": "integer",
      "required": false
    },
    "videoCount": {
      "id": "#videoCount",
      "type": "integer",
      "required": false
    },
    "numRelatedPublications": {
      "id": "#numRelatedPublications",
      "type": "integer",
      "required": false
    },
    "whoList": {
      "id": "#whoList",
      "type": "object",
      "required": false,
      "properties": {
        "who": {
          "id": "#who",
          "type": "object",
          "required": false,
          "properties": {
            "name": {
              "id": "#name",
              "type": "string",
              "required": false
            },
            "count": {
              "id": "#count",
              "type": "integer",
              "required": false
            },
            "isCurrent": {
              "id": "#isCurrent",
              "type": "boolean",
              "required": false
            },
            "url": {
              "id": "#url",
              "type": "string",
              "required": false
            }
          }
        }
      }
    },
    "whatList": {
      "id": "#whatList",
      "type": "object",
      "required": false,
      "properties": {
        "what": {
          "id": "#what",
          "type": "array",
          "required": false,
          "items": {}
        }
      }
    },
    "whereList": {
      "id": "#whereList",
      "type": "object",
      "required": false,
      "properties": {
        "where": {
          "id": "#where",
          "type": "array",
          "required": false,
          "items": {}
        }
      }
    },
    "whenList": {
      "id": "#whenList",
      "type": "object",
      "required": false,
      "properties": {
        "when": {
          "id": "#when",
          "type": "object",
          "required": false,
          "properties": {
            "name": {
              "id": "#name",
              "type": "string",
              "required": false
            },
            "count": {
              "id": "#count",
              "type": "integer",
              "required": false
            },
            "isCurrent": {
              "id": "#isCurrent",
              "type": "boolean",
              "required": false
            },
            "url": {
              "id": "#url",
              "type": "string",
              "required": false
            }
          }
        }
      }
    },
    "inTheMuseumList": {
      "id": "#inTheMuseumList",
      "type": "object",
      "required": false,
      "properties": {
        "inTheMuseum": {
          "id": "#inTheMuseum",
          "type": "object",
          "required": false,
          "properties": {
            "id": {
              "id": "#id",
              "type": "string",
              "required": false
            },
            "name": {
              "id": "#name",
              "type": "string",
              "required": false
            },
            "count": {
              "id": "#count",
              "type": "integer",
              "required": false
            },
            "isCurrent": {
              "id": "#isCurrent",
              "type": "boolean",
              "required": false
            },
            "url": {
              "id": "#url",
              "type": "string",
              "required": false
            }
          }
        }
      }
    },
    "isExhibitionArtWork": {
      "id": "#isExhibitionArtWork",
      "type": "boolean",
      "required": false
    },
    "addedToMyMet": {
      "id": "#addedToMyMet",
      "type": "boolean",
      "required": false
    },
    "CRDID": {
      "id": "#CRDID",
      "type": "integer",
      "required": false
    },
    "title": {
      "id": "#title",
      "type": "string",
      "required": false
    },
    "primaryArtist": {
      "id": "#primaryArtist",
      "type": "object",
      "required": false,
      "properties": {
        "role": {
          "id": "#role",
          "type": "string",
          "required": false
        },
        "name": {
          "id": "#name",
          "type": "string",
          "required": false
        },
        "nationality": {
          "id": "#nationality",
          "type": "string",
          "required": false
        }
      }
    },
    "galleryLink": {
      "id": "#galleryLink",
      "type": "string",
      "required": false
    },
    "primaryImageUrl": {
      "id": "#primaryImageUrl",
      "type": "string",
      "required": false
    },
    "primaryImageWidth": {
      "id": "#primaryImageWidth",
      "type": "integer",
      "required": false
    },
    "primaryImageHeight": {
      "id": "#primaryImageHeight",
      "type": "integer",
      "required": false
    },
    "url": {
      "id": "#url",
      "type": "string",
      "required": false
    },
    "xmlUrl": {
      "id": "#xmlUrl",
      "type": "string",
      "required": false
    },
    "informationBoxes": {
      "id": "#informationBoxes",
      "type": "object",
      "required": false,
      "properties": {
        "informationBox": {
          "id": "#informationBox",
          "type": "array",
          "required": false,
          "items": {}
        }
      }
    },
    "enlarge": {
      "id": "#enlarge",
      "type": "boolean",
      "required": false
    },
    "searchPageUrl": {
      "id": "#searchPageUrl",
      "type": "string",
      "required": false
    },
    "hasSearchSet": {
      "id": "#hasSearchSet",
      "type": "boolean",
      "required": false
    },
    "searchBackText": {
      "id": "#searchBackText",
      "type": "string",
      "required": false
    },
    "searchBackUrl": {
      "id": "#searchBackUrl",
      "type": "string",
      "required": false
    },
    "searchItemNo": {
      "id": "#searchItemNo",
      "type": "integer",
      "required": false
    },
    "searchTotalItems": {
      "id": "#searchTotalItems",
      "type": "integer",
      "required": false
    },
    "hasRelatedContent": {
      "id": "#hasRelatedContent",
      "type": "boolean",
      "required": false
    },
    "relatedArtworkLinkCount": {
      "id": "#relatedArtworkLinkCount",
      "type": "integer",
      "required": false
    },
    "relatedItemLinkCount": {
      "id": "#relatedItemLinkCount",
      "type": "integer",
      "required": false
    },
    "relatedToahLinkCount": {
      "id": "#relatedToahLinkCount",
      "type": "integer",
      "required": false
    },
    "relatedTabs": {
      "id": "#relatedTabs",
      "type": "object",
      "required": false,
      "properties": {
        "string": {
          "id": "#string",
          "type": "array",
          "required": false,
          "items": {}
        }
      }
    },
    "relatedItemList": {
      "id": "#relatedItemList",
      "type": "object",
      "required": false,
      "properties": {
        "relatedItem": {
          "id": "#relatedItem",
          "type": "array",
          "required": false,
          "items": {}
        }
      }
    },
    "relatedToahLinkList": {
      "id": "#relatedToahLinkList",
      "type": "object",
      "required": false,
      "properties": {
        "relatedToahLink": {
          "id": "#relatedToahLink",
          "type": "object",
          "required": false,
          "properties": {
            "id": {
              "id": "#id",
              "type": "string",
              "required": false
            },
            "title": {
              "id": "#title",
              "type": "string",
              "required": false
            },
            "url": {
              "id": "#url",
              "type": "string",
              "required": false
            }
          }
        }
      }
    },
    "relatedArtworkList": {
      "id": "#relatedArtworkList",
      "type": "object",
      "required": false,
      "properties": {
        "relatedArtwork": {
          "id": "#relatedArtwork",
          "type": "array",
          "required": false,
          "items": {}
        }
      }
    },
    "showEmbeddedVideo": {
      "id": "#showEmbeddedVideo",
      "type": "boolean",
      "required": false
    },
    "showEmbeddedAudio": {
      "id": "#showEmbeddedAudio",
      "type": "boolean",
      "required": false
    },
    "hasMedia": {
      "id": "#hasMedia",
      "type": "boolean",
      "required": false
    }
  }
}

jedahan commented 10 years ago

I went with solution b (only strip single-item arrays when the property name does not contain 'list'); the schema work may come later as a separate issue. Let me know if this fixes things. Any List property should now always be an array of objects.

eneim commented 10 years ago

Thanks. I tested with /random and it works like a charm.

eneim commented 10 years ago

Hi, sorry to bring this issue up again. I tested again with an Android project, and at least one property is still not well shaped. Please check "informationBoxes" and its related property.

Thanks in advance.

jedahan commented 10 years ago

could you find an example where informationBoxes is an array of objects so I can test against that? Thanks!

eneim commented 10 years ago

I have two different /random responses, available at the following Dropbox links:

https://dl.dropboxusercontent.com/u/13792394/Documents/test_1.json

https://dl.dropboxusercontent.com/u/13792394/Documents/test_2.json

I tried to capture two cases in which the "informationBoxes" property behaves differently. Please check them out.

I also tried using your JSON schema with http://www.jsonschema2pojo.org/ to generate Java classes, but that seems to work even worse (maybe I'm doing it wrong).

jedahan commented 10 years ago

how are things now?

eneim commented 10 years ago

Thanks for your response, and sorry for the late reply. I can see some changes in the structure. The only property that still behaves differently between the single-item and multi-item cases is "informationBox": in the single-item case it is a bare object (opening with "{"), while the other properties use the expected array bracket ("[") even for a single item. I tested on the /random page.

best regards.
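One possible explanation is that neither "informationBoxes" nor "informationBox" contains the word 'list', so a name-based rule would not cover it. As a client-side workaround, the hand-tuned approach can be sketched as follows. This is an illustration only: `ALWAYS_LIST`, `force_lists`, and the sample payload (including the "label" field) are hypothetical and not taken from scrapi or from the linked test files.

```python
from typing import Any, FrozenSet

# Hypothetical hand-tuned set of keys that should always hold a list.
ALWAYS_LIST: FrozenSet[str] = frozenset({"informationBox"})


def force_lists(value: Any, always_list: FrozenSet[str] = ALWAYS_LIST) -> Any:
    """Recursively wrap the values of the named keys in a list if they are not one."""
    if isinstance(value, list):
        return [force_lists(v, always_list) for v in value]
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            child = force_lists(child, always_list)
            if key in always_list and not isinstance(child, list):
                child = [child]
            out[key] = child
        return out
    return value


# Made-up single-item response: "informationBox" arrives as a bare object.
single = {"informationBoxes": {"informationBox": {"label": "Provenance"}}}
print(force_lists(single))
```

Values that are already lists pass through unchanged, so the same call is safe for the multi-item case.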