mediacloud / news-search-api

Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
https://mediacloud.org
GNU Affero General Public License v3.0

news-search-api development/test environment? #27

Closed philbudne closed 3 months ago

philbudne commented 9 months ago

news-search-api containers are deployed alongside story-indexer code, but development and test of news-search-api may best be done against the production ES servers (which do not run under docker).

Allow debug/test/demo container deployment of news-search-api code separate from the full story-indexer stack (pointing at production ES)?

Make it possible to deploy an "api-demo" story-indexer stack (with just news-search-api containers??) with an arbitrary news-search-api image at a "well known" host/port location so that development/test versions of the website can point to it?

rahulbot commented 9 months ago

That sounds great for integration testing. What do you think about unit testing priority in comparison? For instance, it should be simple to use Docker files to create an Elasticsearch image with well-known test data and run unit tests against the news-search-api connected to that. On another project we do that against Postgres via a GitHub automation for CI. It would let us know that the results coming out of this API server are what we expect them to be.

philbudne commented 9 months ago

> What do you think about unit testing priority in comparison? For instance, it should be simple to use Docker files to create an Elasticsearch image with well-known test data and run unit tests against the news-search-api connected to that.

Yes, it should be possible to have a "canned" ES database on a Docker volume for running search API unit tests against.
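
One way to prepare such "canned" data is to generate a body for Elasticsearch's `_bulk` endpoint from a small fixture list and load it into the test container once. The sketch below is only an illustration of that idea: the index name `test_stories` and the story fields are hypothetical and are not taken from the real news-search-api mapping. It only builds the NDJSON payload (an action line followed by a source line per document) and does not talk to a server.

```python
import json

# Hypothetical fixture stories; field names are assumptions, not the real index mapping.
FIXTURE_STORIES = [
    {"id": "1", "title": "Example story one", "language": "en", "publication_date": "2023-01-01"},
    {"id": "2", "title": "Example story two", "language": "es", "publication_date": "2023-01-02"},
]


def build_bulk_body(index_name, stories):
    """Return an NDJSON body for the Elasticsearch _bulk endpoint.

    Each story becomes two lines: an "index" action line naming the target
    index and document id, then the document source itself.
    """
    lines = []
    for story in stories:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": story["id"]}}))
        # The id travels in the action line, so it is left out of the source.
        source = {key: value for key, value in story.items() if key != "id"}
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("test_stories", FIXTURE_STORIES), end="")
```

The printed body could then be posted to the test Elasticsearch container's `_bulk` endpoint before the unit tests run, so every run starts from the same known documents.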

philbudne commented 9 months ago

The concern that I opened the ticket for is: How do we make new versions of the API code/container available for:

  1. development of the website django back end
  2. testing with the mcweb staging environment

rahulbot commented 9 months ago

Yes - a good point. In the short term it sounds like a new news-search-api means a new staging story-indexer release... is that right? Or is the better short-term solution to deploy news-search-api release candidates to the same server as web-staging and have it connect to that?

philbudne commented 9 months ago

> Or is the better short-term solution to deploy news-search-api release candidates to the same server as web-staging and have it connect to that?

I think that's closer to what we want for development/test of news-search-api:

news-search-api containers that:

  1. connect to the production DB
  2. are always running
  3. are pointed to by a user-visible mcweb deployment

We might need multiple shades/flavors of this so that mcweb and news-search-api developers can test against each other's code (while looking at production DB data).

Some different shades/flavors might be:

  1. news-search-api containers for testing/development of new features by n-s-a and mcweb developers
  2. almost-ready for production release testing
  3. "preview" testing by researchers

Merging something into the story-indexer staging branch carries the notions that:

  1. the code has been tested in a developer environment and works (otherwise it should not have been merged to main)

  2. it might move to production at any moment (barring a red flag being raised that the story-indexer staging environment is broken and must on no account be merged to production), and that doing so won't cause problems.

rahulbot commented 9 months ago

Ok, well the current build has some unit tests that show basic functionality against test data. How do we automate the integration testing of that to inspire release confidence? I think the (overly-complicated) call stack here is:

(a) API user -> (b) web-search-django -> (c) mc_providers -> (d) wayback-news-search -> (e) news-search-api -> (f) test Elasticsearch

If that is right, then we can work backwards from the right to the left with integration tests. For instance, we now have a unit test that verifies (e)->(f) (news-search-api -> Elasticsearch). We could add one via the wayback-news-search GH workflows that goes one step further, using a news-search-api:latest docker image and a test against the same static Elasticsearch there to do (d)->(e)->(f) (wayback-news-search -> news-search-api -> test Elasticsearch). That would inspire more confidence because there is a (c)->(d) test (mc_providers -> wayback-news-search) that verifies production data searching works (for both wayback-machine's archive and the new story-indexer one).
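
To make the "work backwards from the right" idea concrete, here is a small sketch that lists the chains such integration tests would cover, starting from the rightmost pair and adding one upstream layer at a time. The layer names come from the call stack above; the function name `integration_chains` is made up for illustration, and nothing here runs an actual test.

```python
# Layers (a) through (f) from the call stack described above.
CALL_STACK = [
    "API user",
    "web-search-django",
    "mc_providers",
    "wayback-news-search",
    "news-search-api",
    "test Elasticsearch",
]


def integration_chains(stack):
    """Yield test chains that grow leftward from the rightmost pair.

    The first chain is the last two layers; each later chain adds the
    next upstream layer until the whole stack is covered.
    """
    for start in range(len(stack) - 2, -1, -1):
        yield stack[start:]


if __name__ == "__main__":
    for chain in integration_chains(CALL_STACK):
        print(" -> ".join(chain))
```

Each printed line corresponds to one candidate integration test, ordered from the existing (e)->(f) check up to the full (a)->(f) path.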

Did that make any sense? Or is it too much integration testing against a test ES database? I guess the real concern is about working against real data and how to do that. Anyway I'll push a tag of the current news-search-api to support trying any of the ideas you mentioned.

philbudne commented 9 months ago

Thinking about it, development & test of news-search-api need not ever use anything but the production ES database (available without docker networking), so perhaps we should go back to deploying news-search-api with its own docker-compose.yml file?

mcweb-staging (running on steinam) could point to a staging deployment of news-search-api also running on steinam.
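
A standalone deployment like this mostly comes down to each news-search-api instance being told which Elasticsearch servers to query and where to listen. The sketch below shows one way to read that from environment variables that a docker-compose.yml could set. The variable names `ES_HOSTS`, `DEPLOYMENT_TYPE`, and `API_PORT`, their defaults, and the example host names are all assumptions for illustration, not the project's actual settings.

```python
import os


def resolve_settings(env=None):
    """Build deployment settings from environment variables.

    All variable names and defaults here are hypothetical.
    """
    env = os.environ if env is None else env
    # Comma-separated list of Elasticsearch URLs, e.g. the production servers.
    raw_hosts = env.get("ES_HOSTS", "http://localhost:9200")
    hosts = [host.strip() for host in raw_hosts.split(",") if host.strip()]
    return {
        "deployment": env.get("DEPLOYMENT_TYPE", "dev"),
        "es_hosts": hosts,
        "api_port": int(env.get("API_PORT", "8000")),
    }


if __name__ == "__main__":
    # Example: a staging instance pointed at two (made-up) ES hosts.
    staging_env = {
        "DEPLOYMENT_TYPE": "staging",
        "ES_HOSTS": "http://es1.example:9200, http://es2.example:9200",
        "API_PORT": "8010",
    }
    print(resolve_settings(staging_env))
```

With this shape, a staging and a production instance could run from the same image on the same host and differ only in the environment passed to the container.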

rahulbot commented 9 months ago

Certainly it doesn't have a lot of coupling touchpoints to the system as is. However, it is code that is unlikely to change much after another week or two.

I think that would make our system diagram look more like this (see the new area highlighted in red).

[attachment: MC_system_diagram_2023__1__pdf]

rahulbot commented 3 months ago

The staging/production refactoring of this out of story-indexer is done. Closing even though this also includes other ideas, since some relate to a separate testing and monitoring set of tasks.