mrhappyasthma / IsThisStockGood

A tool for evaluating companies using the Rule #1 investing principles.
http://www.isthisstockgood.com

Add simple daily CI tests to verify data scraping dependencies #59

Open mrhappyasthma opened 1 year ago

mrhappyasthma commented 1 year ago

This project relies on a bunch of data sources: some combination of stockrow, MSN Money, Yahoo Finance, etc.

If any of these dependencies changes its data format or removes a page that our scraping logic relies on, things may fail silently.

To protect against this, it would be a good idea to set up some (albeit fragile) tests to verify that the basic fetching logic for each data source is working. These tests can run periodically (e.g. cron job).
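A minimal sketch of what one such check could look like. The fetcher callables and the field names (`roic`, `eps`) are placeholders, not the project's actual API; the idea is only that each source gets one cheap "did we get the fields we expect" probe that a scheduled job can run.

```python
from typing import Any, Callable, List, Mapping, Optional, Sequence


def missing_fields(payload: Mapping[str, Any], required: Sequence[str]) -> List[str]:
    """Return the required keys that are absent (or None) in a parsed payload."""
    return [key for key in required if payload.get(key) is None]


def check_source(
    name: str,
    fetch: Callable[[], Mapping[str, Any]],
    required: Sequence[str],
) -> Optional[str]:
    """Run one fetcher; return a problem description, or None if it looks healthy."""
    try:
        payload = fetch()
    except Exception as exc:  # any failure means the source needs attention
        return f"{name}: fetch raised {exc!r}"
    missing = missing_fields(payload, required)
    if missing:
        return f"{name}: missing fields {missing}"
    return None
```

A periodic job would call `check_source` once per data source with a well-known ticker and fail loudly if any call returns a message.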

While this typically isn't good practice for software testing, since we rely on scraping these (potentially unstable) data sources, I believe this will be the best early-warning system we can make.

kocielnik commented 2 months ago

I wonder if reporting an issue here on GitHub (API: Create an issue) every time a data source fails during a query would be sufficient.

This way we would avoid having to deploy a separate service just for monitoring.

The workflow variants are described below.

Variant 1 - no issue exists for the given provider:

  1. User runs a typical query.
  2. The query fails - response matches neither a correct one, nor one for an invalid ticker.
  3. Application shows a spinner with a comment "Reporting the error".
  4. A new issue is created under this project, with title: "<provider>: Incorrect response format".
    • For today, only one source is functional, hence the title would be "MSN Money: Incorrect response format".
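Step 2 needs a three-way decision: correct response, invalid ticker, or something unexpected. A rough sketch, assuming the provider response has already been parsed into a dict; the `"error": "unknown ticker"` marker and the required keys are made up for illustration, since each provider signals an invalid ticker differently.

```python
from collections.abc import Mapping
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    OK = "ok"
    INVALID_TICKER = "invalid ticker"
    UNEXPECTED = "unexpected format"


def classify(payload: object, required: Sequence[str]) -> Outcome:
    """Classify a parsed provider response into one of three outcomes."""
    if not isinstance(payload, Mapping):
        return Outcome.UNEXPECTED
    # Hypothetical marker for "the provider does not know this ticker".
    if payload.get("error") == "unknown ticker":
        return Outcome.INVALID_TICKER
    if all(key in payload for key in required):
        return Outcome.OK
    return Outcome.UNEXPECTED
```

Only `Outcome.UNEXPECTED` would trigger the reporting steps; an invalid ticker is a user error and should not open an issue.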

Variant 2 - an issue already exists for the given provider:

  1. User runs a typical query.
  2. The query fails - response matches neither a correct one, nor one for an invalid ticker.
  3. Application shows a spinner with a comment "Reporting the error".
  4. Application finds an issue for that provider already exists.
  5. Application shows a message: "Issue already reported".
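Variants 1 and 2 could share one code path: list the open issues, look for the provider's title, and create the issue only if it is missing. A sketch using the GitHub REST API (`GET` and `POST` on `/repos/{owner}/{repo}/issues`) with only the standard library. How the token is supplied is an assumption, the "Issue created" message and issue body are placeholders, and only the first 100 open issues are checked.

```python
import json
import urllib.request

ISSUES_API = "https://api.github.com/repos/mrhappyasthma/IsThisStockGood/issues"


def github_request(method, url, token, payload=None):
    """Send one authenticated request to the GitHub REST API and parse the JSON reply."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def report_failure(provider, token, request=github_request):
    """Create the provider's issue unless an open one with the same title exists."""
    title = f"{provider}: Incorrect response format"
    open_issues = request("GET", f"{ISSUES_API}?state=open&per_page=100", token)
    if any(issue["title"] == title for issue in open_issues):
        return "Issue already reported"  # Variant 2
    body = "Reported automatically after an unexpected response."
    request("POST", ISSUES_API, token, {"title": title, "body": body})
    return "Issue created"  # Variant 1
```

The `request` parameter exists so the logic can be exercised with a fake in tests, without touching the network.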

Variant 3 (optional) - an issue already exists for the given provider, and this time the response is CORRECT:

  1. User runs a typical query.
  2. The query succeeds.
  3. Application finds an issue for that provider already exists.
  4. Application marks the issue as "Resolved" with a note: "Got a correct response."
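For Variant 3, GitHub has no "Resolved" state as such, so this sketch assumes "Resolved" means adding the note as a comment and then closing the issue (`POST .../issues/{number}/comments` and `PATCH .../issues/{number}`). The function below only plans the calls from an already-fetched list of open issues; actually sending them, and the token handling, are left to the caller.

```python
REPO_API = "https://api.github.com/repos/mrhappyasthma/IsThisStockGood"


def resolution_calls(open_issues, provider):
    """Return the (method, url, payload) calls that resolve the provider's open issues."""
    title = f"{provider}: Incorrect response format"
    calls = []
    for issue in open_issues:
        if issue["title"] != title:
            continue
        base = f"{REPO_API}/issues/{issue['number']}"
        # Leave the note first, then close the issue.
        calls.append(("POST", f"{base}/comments", {"body": "Got a correct response."}))
        calls.append(("PATCH", base, {"state": "closed"}))
    return calls
```

Keeping the planning separate from the HTTP layer makes it easy to check that a successful query for one provider never closes another provider's issue.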

Cost/benefit analysis

Pros:

  1. Everyone interested in the project sees the issue, and interested parties can subscribe to email notifications for these too.

Cons/costs:

  1. Possible time overhead of checking for existing tickets on GitHub.