magda-io / magda

A federated, open-source data catalog for all your big data and small data
https://magda.io
Apache License 2.0

registry/records API return maximum 1000 results #1432

Open jevy-wangfei opened 6 years ago

jevy-wangfei commented 6 years ago

Problem description

After harvesting more than 1000 organisations (publishers), the API https://knowledgenet.co/api/v0/registry/records?aspect=organization-details&limit=20000 returns at most 1000 organisation records, no matter how the limit param is set.

Problem reproduction steps

Copy and paste the URL into a browser: https://knowledgenet.co/api/v0/registry/records?aspect=organization-details&limit=20000

AlexGilleran commented 6 years ago

The 1000 limit is intentional, but we should be returning an error rather than silently trimming the result set to 1000.
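
A minimal sketch of the "error rather than silent trim" behaviour, written in Python purely for illustration. This is not Magda's actual code: `resolve_limit`, `LimitTooLargeError` and `DEFAULT_LIMIT` are hypothetical names, and only the 1000 cap comes from this thread.

```python
from typing import Optional

MAX_LIMIT = 1000     # cap mentioned in this thread
DEFAULT_LIMIT = 100  # hypothetical default, not taken from Magda


class LimitTooLargeError(ValueError):
    """Raised when a caller asks for more records than one page may hold."""


def resolve_limit(requested: Optional[int]) -> int:
    """Return the page size to use, rejecting values above MAX_LIMIT."""
    if requested is None:
        return DEFAULT_LIMIT
    if requested < 1:
        raise ValueError(f"limit must be at least 1, got {requested}")
    if requested > MAX_LIMIT:
        # Fail loudly instead of silently trimming the result set.
        raise LimitTooLargeError(
            f"limit {requested} exceeds the maximum of {MAX_LIMIT}; "
            "use pagination to retrieve more records"
        )
    return requested
```

With a check like this, a request for `limit=20000` would be rejected with a clear message instead of quietly returning 1000 records.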

jevy-wangfei commented 6 years ago

We have a feature for exploring publishers by data source: https://knowledgenet.co/organisation. It is currently implemented by loading all publishers with their data source information and aggregating the data on the front end. Does Magda have any APIs that would help build this feature?

t83714 commented 6 years ago

Probably this new search API will help:

api/v0/search/organisations?query=*&start=0&limit=20

Sample response:

{
    "query": "*",
    "hitCount": 661,
    "organisations": [
        {
            "acronym": "DSSAUDFSISS",
            "name": ": DFSI Spatial Services, a unit of Department Finance, Services and Innovation(DFSI), Spatial Services",
            "email": "sds.services@lpi.nsw.gov.au",
            "identifier": "org-sdinsw-baf7bbbcdfeabacb0a17ab0eb7ff63f784f08e3d1839680f64ca16e7785bd3c6",
            "addrState": "Bathurst",
            "datasetCount": 1,
            "addrStreet": "DFSI, Spatial Services PO BOX 143",
            "addrPostCode": "2795",
            "phone": "02 6332 8200"
        },
        {
            "acronym": "ASD",
            "website": "http://www.abs.gov.au/",
            "name": "ABS (SA Data)",
            "email": "https://www4.abs.gov.au/web/survey.nsf/inquiryform/",
            "identifier": "org-sa-774dc75c-cfce-4040-bd52-d3893dc71090",
            "description": "Australian Bureau of Statistics - SA Data Released\r\n\r\n",
            "datasetCount": 19,
            "imageUrl": "https://data.sa.gov.au/data/uploads/group/2017-01-11-044253.549195abslogowh2.gif",
            "phone": "1300 135 070 "
        },
        {
            "acronym": "AGS",
            "website": "http://www.abs.gov.au/geography",
            "name": "ABS Geospatial Solutions",
            "email": "geography@abs.gov.au",
            "identifier": "org-dga-760c24b1-3c3d-4ccb-8196-41530fcdebd5",
            "datasetCount": 34,
            "imageUrl": "http://www.abs.gov.au/ausstats/wmdata.nsf/activeotherresource/ABS_Logo_333/$File/ABS_Logo_333.svg",
            "phone": "1300 135 070"
        },
        {
            "acronym": "A",
            "website": "https://web.acma.gov.au/pls/radcom/",
            "name": "ACMA",
            "email": "info@acma.gov.au",
            "identifier": "org-listtas-ACMA",
            "addrState": "ACT",
            "datasetCount": 1,
            "addrSuburb": "Belconnen",
            "addrStreet": "Red Building",
            "addrPostCode": "2617",
            "phone": "02 6219 5555"
        },
        {
            "acronym": "AA",
            "name": "AGSO-Geoscience Australia",
            "identifier": "org-ga-AGSO-Geoscience Australia",
            "datasetCount": 19,
            "addrSuburb": "Canberra"
        },
        {
            "acronym": "AA",
            "name": "AGSO-Geoscience Australia",
            "identifier": "org-aodn-AGSO-Geoscience Australia",
            "datasetCount": 3,
            "addrSuburb": "Canberra"
        },
        {
            "acronym": "AE",
            "name": "ANU E-Press",
            "identifier": "org-ga-ANU E-Press",
            "datasetCount": 1,
            "addrSuburb": "Canberra"
        },
        {
            "identifier": "org-aodn-ANZLIC the Spatial Information Council",
            "name": "ANZLIC the Spatial Information Council",
            "acronym": "ASIC",
            "datasetCount": 24
        },
        {
            "identifier": "org-sdinsw-ANZLIC the Spatial Information Council",
            "name": "ANZLIC the Spatial Information Council",
            "acronym": "ASIC",
            "datasetCount": 688
        },
        {
            "identifier": "org-marlin-ANZLIC the Spatial Information Council",
            "name": "ANZLIC the Spatial Information Council",
            "acronym": "ASIC",
            "datasetCount": 2
        },
        {
            "identifier": "org-dga-53a24fb3-52e7-4155-8c70-20e68cc1dde7",
            "name": "ARC Centre of Excellence for Coral Reef Studies, James Cook University",
            "acronym": "ACEFCRSJCU",
            "datasetCount": 8
        },
        {
            "acronym": "AACC",
            "name": "Aboriginal Affairs Coordinating Committee",
            "identifier": "org-wa-7611a523-62c3-4c13-b3e1-499ef0bcf0dc",
            "description": "This Aboriginal Affairs Coordinating Committee (AACC) Data Warehouse is for the purpose of data collection, sharing and analysis in relation to Aboriginal affairs, including the Council of Australian Governments' Closing the Gap indicators.\r\n\r\nThe AACC Data Warehouse is administered by the Department of the Premier and Cabinet.\r\n\r\nAbout the Aboriginal Affairs Coordinating Committee (AACC) \r\n----------------------------------------------------------\r\nThe AACC is legislated for under section 19 of the Aboriginal Affairs Planning Authority Act 1972 (the Act).  Under the Act \"the function of the Committee is to coordinate effectively the activities of all persons and bodies, corporate or otherwise, providing or proposing to provide service and assistance in relation to persons of Aboriginal descent\".\r\n\r\nThe AACC is the main coordinating body for Aboriginal affairs, programs, services and policies. The AACC and its member agencies form a key support role to the Community Safety and Family Support Cabinet Sub-Committee through the provision of high-level strategic policy development and advice. \r\n",
            "datasetCount": 5,
            "imageUrl": "https://catalogue.data.wa.gov.au/uploads/group/2016-02-05-003111.561019govOfWATextBlack562x541.jpg"
        },
        {
            "acronym": "AAV",
            "name": "Aboriginal Affairs Victoria",
            "identifier": "org-vic-ee15ed9f-c17e-443c-ad9e-6ff35dfd1e2d",
            "description": "The Office of Aboriginal Affairs Victoria (OAAV) provides advice to the Victorian Government on Aboriginal policy and planning, and delivers key programs. OAAV works in partnership with Aboriginal communities, and government departments and agencies to promote knowledge, leadership and understanding about Victoria's Aboriginal people.",
            "datasetCount": 7
        },
        {
            "identifier": "org-qld-51750d16-3828-4e13-a23c-6ef0c8a5eb13",
            "name": "Aboriginal and Torres Strait Islander Partnerships",
            "acronym": "ATSIP",
            "datasetCount": 30
        },
        {
            "acronym": "AFF",
            "website": "adelaidefilmfestival.org",
            "name": "Adelaide Film Festival",
            "identifier": "org-sa-5daabb20-e379-44e2-a913-cdf8821232ec",
            "description": "Adelaide Film Festival (ADL Film Fest) is an eleven-day celebration and exploration of Australian and international screen culture with a unique program of screenings, forums and special events. The event has rapidly established itself as one of the boldest and most innovative in the country and has made a name for itself internationally as a platform for exciting new talent in the Australian industry. Originally presented biennially in March, since 2013 the Adelaide Film Festival has been presented in October.",
            "datasetCount": 1,
            "imageUrl": "https://data.sa.gov.au/data/uploads/group/2017-09-27-041703.206319CUsersjacksm01DesktopLOGO---ADL-Film-Fest.jpg"
        },
        {
            "identifier": "org-qld-93f12747-9561-4980-bcfd-46b9e5cc51c7",
            "name": "Agriculture and Fisheries",
            "acronym": "AF",
            "datasetCount": 139
        },
        {
            "acronym": "ASC",
            "name": "Alpine Shire Council",
            "identifier": "org-dga-23f904cc-9cd7-40d0-a494-c64a062bf0f7",
            "description": "Alpine Shire is a local government area in Victoria, Australia in the north-east of the state. It includes the towns of Bright, Mount Beauty and Myrtleford. It has an area of 4,885 square kilometres.",
            "datasetCount": 4,
            "imageUrl": "http://maps.alpineshire.vic.gov.au/logos/5124-Alpine-logo-(web_180px).png"
        },
        {
            "acronym": "AV",
            "name": "Ambulance Victoria",
            "identifier": "org-vic-8467d326-740f-4366-859b-5e354749403f",
            "description": "A little information about my organization...",
            "datasetCount": 11
        },
        {
            "acronym": "AGU",
            "name": "American Geophysical Union",
            "identifier": "org-ga-American Geophysical Union",
            "datasetCount": 3,
            "addrSuburb": "$metadataRecord.getPublisherCity()"
        },
        {
            "acronym": "ACECRC",
            "name": "Antarctic Climate and Ecosystems Cooperative Research Centre",
            "email": "enquiries@acecrc.org.au",
            "identifier": "org-listtas-Antarctic Climate and Ecosystems Cooperative Research Centre",
            "addrState": "TAS",
            "datasetCount": 1,
            "addrSuburb": "HOBART",
            "addrStreet": "20 Castray Esplanade",
            "addrPostCode": "7000",
            "phone": "+61 (0)3 6226 7888"
        }
    ]
}

jevy-wangfei commented 6 years ago

Thanks @t83714, this API is good for querying organisations and their dataset counts. Does Magda have an API for querying data sources along with the number of organisations harvested from each?

t83714 commented 6 years ago

@jevy-wangfei We currently don't have an API for querying data sources (We have a registry API that returns a list of sources but it won't return any organization details). Do you think the following changes will provide the functionality you need?

jevy-wangfei commented 6 years ago

@t83714 That would be very useful.

Could you please make this API allow users to collect all of the records (without the 1000-record limit) in a single API call? Alternatively, could you add a page param so that users can paginate: api/v0/registry/records?aspect=organization-details&limit=100&page=100 ?

As discussed above, this issue came from the limit on the number of records returned by the registry records API. I hope this API could also provide a way of fetching all records, similar to the new organisation search API.

Thank you very much, Jacky.

t83714 commented 6 years ago

@jevy-wangfei No problem~ I will create a new issue for the changes to the organization API and let you know once I get a chance to look at it. Regarding collecting all data in one request: it probably isn't a good idea in terms of reliability, nor is it practical. A better way to retrieve all records is page by page. Both APIs support start & limit parameters for pagination (for the registry API it is better to use pageToken; pass the nextPageToken returned in each response as the pageToken of the next request), e.g. https://knowledgenet.co/api/v0/registry/records?aspect=organization-details&pageToken=29860&limit=2
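
For anyone landing here later, the pageToken loop described above could be sketched roughly as follows in Python. The URL, the `aspect`, `limit` and `pageToken` parameters and the `nextPageToken` response field are taken from this thread. The `records` field name and the helper names `fetch_page` and `iter_records` are assumptions for illustration, so check them against a real response before relying on this.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

BASE_URL = "https://knowledgenet.co/api/v0/registry/records"  # from the thread


def fetch_page(page_token: Optional[str], limit: int) -> dict:
    """Fetch one page of organisation records over HTTP."""
    params = {"aspect": "organization-details", "limit": str(limit)}
    if page_token is not None:
        params["pageToken"] = page_token
    url = f"{BASE_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_records(
    get_page: Callable[[Optional[str], int], dict] = fetch_page,
    limit: int = 100,
) -> Iterator[dict]:
    """Yield every record, following nextPageToken until it runs out."""
    page_token: Optional[str] = None
    while True:
        page = get_page(page_token, limit)
        records = page.get("records", [])  # assumed field name
        yield from records
        next_token = page.get("nextPageToken")
        # Stop on an empty page or when no further token is offered.
        if not records or next_token is None:
            break
        page_token = str(next_token)
```

Calling `list(iter_records(limit=100))` would then collect all organisation records in pages of 100. The page fetcher is passed in as a parameter so the loop can be exercised with a stub instead of a live server.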