tsdataclinic / scout

Scout is a data discovery tool to explore open data portals worldwide.
https://scout.tsdataclinic.com
Apache License 2.0
33 stars 12 forks

How to add custom data portals to Scout #326

akshat-crypto closed 1 year ago

akshat-crypto commented 1 year ago

I am currently using the dev version to test the Scout application. There are a few more open data portals available that I would like to add to my custom Scout deployment. Is there a guide, article, or other procedure I can follow to do this? It would be a great help if proper documentation for this were added to the official repository.

Thanks

jps327 commented 1 year ago

Hi @akshat-crypto, thanks for reaching out!

What open data portals are you curious about adding to Scout?

Currently we only support portals that are available via the Socrata Discovery API. Are you trying to add a portal available through this API? You can see the full list of portals they have available here:

If the open data portal you're interested in is supported by their API, then just add it to the Portal Details Lookup so that it is treated as a valid portal when you pull in new data for Scout. Also, if you're using the yarn seed-database-dev command to seed your databases, you will only pull in a limited set of portals (so that it runs faster on a local computer), so remember to override the PORTAL_OVERRIDE_LIST argument to include the portal you're interested in.
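To illustrate the two conditions described above (the portal must be in the lookup, and it must survive the override list when seeding), here is a minimal TypeScript sketch. It is not Scout's actual code: the `PortalDetails` fields, the lookup entries, the `selectPortalsToSync` helper, and the idea that the override list is a comma-separated string are all assumptions made for illustration.

```typescript
// Hypothetical shape of one lookup entry; Scout's real Portal Details
// Lookup may use different field names.
interface PortalDetails {
  name: string;
  abbreviation: string;
}

// Illustrative lookup keyed by portal domain (example entries only).
const portalDetailsLookup: Record<string, PortalDetails> = {
  "data.cityofnewyork.us": { name: "New York City", abbreviation: "NYC" },
  "data.cityofchicago.org": { name: "Chicago", abbreviation: "CHI" },
};

// Assumption: the override list is passed as a comma-separated string.
function parseOverrideList(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Keep only domains that are in the lookup and, when an override list
// is given, are also named in that list.
function selectPortalsToSync(
  availableDomains: string[],
  lookup: Record<string, PortalDetails>,
  overrideRaw?: string
): string[] {
  const override = parseOverrideList(overrideRaw);
  return availableDomains.filter((domain) => {
    const known = Object.prototype.hasOwnProperty.call(lookup, domain);
    const allowed = override.length === 0 || override.indexOf(domain) !== -1;
    return known && allowed;
  });
}
```

In this sketch, a domain that is missing from the lookup is dropped even if it appears in the override list, which mirrors the advice above: add the portal to the lookup first, then include it in the override list when seeding locally.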

If you're trying to integrate some other open data portal that's not supported by the Socrata API, then it's not so easy anymore. It is on our roadmap to add open data from CKAN, but this is still in the works and not supported yet. It's also a long-term goal to add a plugin framework to make it very easy for people to connect to other data sources.

In the meantime, if you want to add your own open data portal, you'll need to heavily change the Portal Sync service. The entry point is the onceAtStartup function. From there we call refreshPortalList to pull datasets from the portals and update the Postgres db, and then we call searchService.refreshIndex to take the data from Postgres and update Elasticsearch. The code is currently pretty tightly coupled to Socrata's data model though, so it won't be straightforward to modify this to support non-Socrata APIs.
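The three-step flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual Portal Sync service: only the names onceAtStartup, refreshPortalList, and refreshIndex come from the description, while `PortalSource`, `DatasetRecord`, `PortalSyncSketch`, and the in-memory maps standing in for Postgres and Elasticsearch are hypothetical.

```typescript
// Hypothetical, source-agnostic dataset record.
interface DatasetRecord {
  id: string;
  portal: string;
  name: string;
}

// Hypothetical abstraction: anything that can list datasets for a portal.
// A Socrata-backed or CKAN-backed implementation could satisfy this.
interface PortalSource {
  fetchDatasets(portal: string): Promise<DatasetRecord[]>;
}

class PortalSyncSketch {
  // In-memory stand-ins for the Postgres db and the Elasticsearch index.
  readonly db = new Map<string, DatasetRecord>();
  readonly searchIndex = new Map<string, DatasetRecord>();

  private readonly source: PortalSource;
  private readonly portals: string[];

  constructor(source: PortalSource, portals: string[]) {
    this.source = source;
    this.portals = portals;
  }

  // Entry point: pull datasets first, then rebuild the search index.
  async onceAtStartup(): Promise<void> {
    await this.refreshPortalList();
    this.refreshIndex();
  }

  // Step 1: pull datasets from every portal and upsert them into the db.
  async refreshPortalList(): Promise<void> {
    for (const portal of this.portals) {
      const datasets = await this.source.fetchDatasets(portal);
      datasets.forEach((dataset) => this.db.set(dataset.id, dataset));
    }
  }

  // Step 2: rebuild the search index from what is stored in the db.
  refreshIndex(): void {
    this.searchIndex.clear();
    this.db.forEach((dataset, id) => this.searchIndex.set(id, dataset));
  }
}
```

The point of the hypothetical `PortalSource` interface is to show where the coupling sits: everything after `fetchDatasets` only depends on the record shape, so supporting a non-Socrata portal would mostly mean mapping that portal's data model into the fields the rest of the pipeline expects.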