
openva / richmondsunlight.com

The Richmond Sunlight website.

https://www.richmondsunlight.com/

MIT License


Move search to ElasticSearch #37

Open waldoj opened 10 years ago

waldoj commented 10 years ago

Now that Elasticsearch is installed, index legislation with it rather than with Sphinx.

Done right, indexing legislation means exporting it as JSON. This should be as simple as generating a list of new legislation, requesting each bill from the API, and submitting the resulting JSON to Elasticsearch.

[x] get a list of all bills
[x] connect to the API for each one
[x] output the resulting JSON to the filesystem
[x] set up an Elasticsearch index
[x] have Elasticsearch index the JSON files
[x] set up a script to automatically index the files
[x] move Elasticsearch to a different server
[ ] only export the files every 24 hours
[ ] set up an API endpoint for search
[ ] create a new search page that queries against that endpoint
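The first three checklist items can be sketched roughly as follows. This is a minimal illustration, not the project's actual exporter: it assumes the list of bills is already available as (year, number) pairs, and `API_URL_TEMPLATE`, `export_bills`, and the `YEAR-NUMBER.json` file naming are hypothetical placeholders for the real Richmond Sunlight API and file layout.

```python
import json
from pathlib import Path
from typing import Callable, Iterable
from urllib.request import urlopen

# Hypothetical URL: substitute the real Richmond Sunlight API endpoint.
API_URL_TEMPLATE = "https://api.example.org/bill/{year}/{number}.json"


def export_bills(
    bills: Iterable[tuple[str, str]],
    fetch_json: Callable[[str], dict],
    out_dir: Path,
) -> list[Path]:
    """Fetch each bill from the API and write its JSON to out_dir.

    `bills` yields (year, number) pairs, e.g. ("2019", "hb123").
    `fetch_json` takes a URL and returns the decoded JSON document.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for year, number in bills:
        url = API_URL_TEMPLATE.format(year=year, number=number)
        record = fetch_json(url)
        # One file per bill, so a later indexing step can pick them all up.
        path = out_dir / f"{year}-{number}.json"
        path.write_text(json.dumps(record), encoding="utf-8")
        written.append(path)
    return written


def fetch_json_over_http(url: str) -> dict:
    """Default fetcher using only the standard library."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Passing `fetch_json` in as a parameter keeps the loop testable without network access and leaves room to add retries or rate limiting when requesting every bill.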

waldoj commented 7 years ago

Use Elasticsearch's Bulk API. jq may be a good tool for reformatting the JSON into the format it expects.
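The Bulk API reads newline-delimited JSON: an action line naming the target index and document ID, then the document itself, with the whole payload ending in a newline. As a sketch of that reshaping step, here it is in Python rather than jq; the index name and the `id_field` that supplies each document's `_id` are assumptions, not taken from the project.

```python
import json
from pathlib import Path
from typing import Iterable


def bulk_body(docs: Iterable[tuple[str, dict]], index: str) -> str:
    """Build an NDJSON payload for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an `index` action line carrying the
    target index and document ID, then the document source itself.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The Bulk API requires a trailing newline; send nothing if there are no docs.
    return "\n".join(lines) + "\n" if lines else ""


def docs_from_dir(directory: Path, id_field: str) -> list[tuple[str, dict]]:
    """Load every *.json file in `directory`, using `id_field` as the doc ID."""
    docs = []
    for path in sorted(directory.glob("*.json")):
        source = json.loads(path.read_text(encoding="utf-8"))
        docs.append((str(source[id_field]), source))
    return docs
```

The resulting string can be POSTed to the `_bulk` endpoint with the header `Content-Type: application/x-ndjson`. When sending a file with curl, use `--data-binary` rather than `-d`, since `-d` strips the newlines the format depends on.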

waldoj commented 7 years ago

It doesn't look like Elasticsearch is using the document IDs provided in the JSON header material. Instead, it's assigning random identifiers (e.g., IQSkQ3btTI6L8KEvdnnlnA). That will make updates impossible: we'll just have an ever-growing index full of duplicates.
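Why random IDs lead to duplicates can be shown with a toy model. `ToyIndex` below is a hypothetical in-memory stand-in, not Elasticsearch; it mirrors a single rule of the `index` action: a document sent with an existing `_id` replaces the stored copy, while one sent without an `_id` is stored under a freshly generated ID.

```python
import uuid
from typing import Optional


class ToyIndex:
    """Illustrative stand-in for an index; not a real Elasticsearch client."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def index(self, source: dict, doc_id: Optional[str] = None) -> str:
        if doc_id is None:
            # Auto-generated ID: never matches a previously stored document.
            doc_id = uuid.uuid4().hex
        # Same ID overwrites; a new ID adds another document.
        self.docs[doc_id] = source
        return doc_id


def reindex_twice(use_ids: bool) -> int:
    """Index the same two bills on two runs; return the resulting doc count."""
    bills = {"101": {"number": "hb1"}, "102": {"number": "hb2"}}
    idx = ToyIndex()
    for _run in range(2):
        for bill_id, source in bills.items():
            idx.index(source, bill_id if use_ids else None)
    return len(idx.docs)
```

With explicit IDs, rerunning the export leaves the count at two; without them, every run adds another copy of each bill.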

waldoj commented 7 years ago

I moved to using the bills' database IDs as the document IDs, and that fixed things. I'm not sure what was problematic about using e.g. 2019-hb123 as the document ID, but it wasn't working.

waldoj commented 7 years ago

I've got Elasticsearch configured on another server (to avoid taxing this one), with a rule opening up the firewall, but Elasticsearch is rejecting the connection. Figure out why, and make sure the new firewall rule is loaded on reboot.
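A useful first diagnostic is to tell a refused connection from a silent timeout, since they point to different causes. One plausible culprit: Elasticsearch binds only to the loopback interface unless `network.host` is set in `elasticsearch.yml`, so opening the firewall alone still leaves remote clients refused. A timeout instead suggests the firewall is still dropping packets. The `probe` helper below is a hypothetical sketch; 9200 is Elasticsearch's default HTTP port.

```python
import socket


def probe(host: str, port: int = 9200, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port.

    "open"    - something accepted the connection.
    "refused" - the host replied with a reset: nothing is listening on that
                address and port, or a firewall rule actively REJECTs.
    "timeout" - no reply at all, typically packets silently DROPped.
    Other errors (e.g. DNS failure, unreachable network) are re-raised.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
```

Running `probe("es-host")` from the web server, with the real hostname in place of the placeholder `es-host`, narrows down whether to look at Elasticsearch's bind address or at the firewall.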

waldoj commented 7 years ago

Server moved successfully.