sjdonado / openchargemap-sync

GraphQL + MongoDB + RabbitMQ + Docker - no frameworks :)

[ADR-001]: Services setup #4

Open sjdonado opened 1 year ago

sjdonado commented 1 year ago

Context

There are two use cases:

  1. Pulling and storing the data (fetch core reference data + messages published to the queue + messages processed)
  2. Fetching data, /GraphQl endpoint (Apollo Server)

Two services are proposed, to avoid a single point of failure. In addition, the nature of the use cases allows isolated, concurrent operations.

sequenceDiagram
    Note over Scraper: Scraper pulls countries and operators from openchargemap.org
    Scraper->>Scraper: 
    Scraper->>RabbitMQ: Publish RabbitMQ messages
    loop for each message with one operatorid
        RabbitMQ->>Scraper: Consume RabbitMQ message
        Note over Scraper: Fetch operatorInfo, statusType, addressInfo, Connections from openchargemap.org
        Scraper->>Scraper: 
        Scraper->>MongoDB: Store data in database
    end
    Note over GraphQLService: GraphQL queries
    GraphQLService->>MongoDB: Fetch data from database

Approach

Folder structure: MVC design pattern, with responsibilities separated by layers, plus a tasks directory for the publishers/consumers.

Code style: ESLint + prettier.

Database Models: Given the scope of the requirements, using an ORM is ruled out. There will be a single collection, where each document is a "Snapshot" of the data extracted from OpenChargeMap.

Message queue: The main idea is to execute sequential requests that each fetch a small chunk of data from OpenChargeMap. Each published message contains an operatorid; when a message is consumed, its results are pushed to an array in the latest poiListSnapshot document.
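The publish/consume flow described above can be sketched in memory, without a broker or database client. This is only an illustration, not the repository's code: `OperatorMessage`, `PoiListSnapshot`, `buildMessages`, and `appendResults` are hypothetical names, and the `poiList` field is an assumed shape for the array mentioned above.

```typescript
// Hypothetical message shape: one message per operator to fetch.
interface OperatorMessage {
  operatorId: number;
}

// Assumed shape of a snapshot document; field names are illustrative.
interface PoiListSnapshot {
  completed: boolean;
  poiList: unknown[];
}

// Build one message per operator id so each request stays small.
function buildMessages(operatorIds: number[]): OperatorMessage[] {
  return operatorIds.map((operatorId) => ({ operatorId }));
}

// Append the POIs fetched for one message to the most recent snapshot.
function appendResults(snapshots: PoiListSnapshot[], pois: unknown[]): void {
  const last = snapshots[snapshots.length - 1];
  if (last === undefined) {
    throw new Error("no snapshot to append to");
  }
  last.poiList.push(...pois);
}
```

Against a real MongoDB collection, the append step would more likely be an update using `$push` with `$each` on the latest document, rather than an in-memory `push`.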

Fault tolerance: Messages that fail with errors are sent to a dead letter queue. A document in the poiListSnapshots collection with completed: false is not publicly available.
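The dead letter idea can be shown with a small in-memory sketch. It is not how the service is implemented: `QueueMessage`, `ProcessResult`, and `consumeAll` are hypothetical names, and plain arrays stand in for the main queue and the dead letter queue.

```typescript
// Hypothetical message type; arrays stand in for the real queues.
interface QueueMessage {
  id: number;
}

interface ProcessResult {
  processed: QueueMessage[];
  deadLetters: QueueMessage[];
}

// Run the handler on every message in order. A message whose handler
// throws is moved to the dead letter list instead of stopping the run.
function consumeAll(
  messages: QueueMessage[],
  handler: (message: QueueMessage) => void,
): ProcessResult {
  const result: ProcessResult = { processed: [], deadLetters: [] };
  for (const message of messages) {
    try {
      handler(message);
      result.processed.push(message);
    } catch (error) {
      result.deadLetters.push(message);
    }
  }
  return result;
}
```

With RabbitMQ itself, the routing is usually delegated to the broker: a queue configured with a dead-letter exchange receives messages that a consumer rejects without requeueing.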

POI list data deleted: If a site is removed from OpenChargeMap, our system will reflect the change when the latest snapshot becomes public. This means that, at any time, there will be at most two documents in the database: one with the last snapshot (public) and another with the in-progress data obtained by the extraction service.
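One way to read the public/in-progress split is that readers only ever see the newest completed snapshot. The sketch below assumes that reading; `Snapshot`, `publicSnapshot`, and the `createdAt` ordering field are hypothetical and not taken from the repository.

```typescript
// Assumed snapshot shape; `createdAt` is a hypothetical ordering field.
interface Snapshot {
  id: string;
  completed: boolean;
  createdAt: number; // epoch milliseconds
}

// Return the newest completed snapshot, or undefined while none has finished.
function publicSnapshot(snapshots: Snapshot[]): Snapshot | undefined {
  return snapshots
    .filter((s) => s.completed)
    .sort((a, b) => b.createdAt - a.createdAt)[0];
}
```

Because the in-progress document is filtered out, a site removed upstream stays visible until the next snapshot is marked completed.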

Consequences

References

sjdonado commented 1 year ago

The approach of sending operatorId filtered by country is not viable. The documentation shows it as a valid property:

image

However, it is null for almost all of them:

image

I'll try with countryid + dataproviderid.
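As a rough sketch of what that alternative could look like, each message would carry a country and a data provider instead of an operator. The names `CountryProviderMessage` and `buildPairMessages` are hypothetical, and generating every pair is an assumption about how the two ids would be combined.

```typescript
// Hypothetical message shape for the countryid + dataproviderid approach.
interface CountryProviderMessage {
  countryId: number;
  dataProviderId: number;
}

// Build one message per (country, data provider) pair.
function buildPairMessages(
  countryIds: number[],
  dataProviderIds: number[],
): CountryProviderMessage[] {
  const messages: CountryProviderMessage[] = [];
  for (const countryId of countryIds) {
    for (const dataProviderId of dataProviderIds) {
      messages.push({ countryId, dataProviderId });
    }
  }
  return messages;
}
```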

sjdonado commented 1 year ago

Future work

  1. Database cleanup: remove old poiListSnapshots + pois documents.
  2. Configure retry/recovery logic -> consume DLQ.
  3. CI pipelines setup -> run tests + static analysis.
  4. Increase the MongoDB transaction cache size, to allow fine-tuning of:
    export const POI_LIST_CHUNK_SIZE = 1000;
    export const POI_LIST_MAX_RESULTS = 5000;
  5. Database optimizations: Indexing.
  6. Apollo Server query caching (Redis configuration done).
  7. Monitoring: New Relic or Datadog setup + logger levels.
  8. Performance: replace tsc with swc.
  9. Decouple publisher from consumer in scraper-service. Publisher can be a separate lambda function, and the consumer a worker service.
  10. Test improvements: scraper-service e2e test cases, refactor mocks in openChargeMapConsumer.spec.ts (readability), missing unit tests for applyPagination and getPageInfo, graphql-subgraph warning "Too many nested callbacks (5)".
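For item 4, one plausible reading of the two constants is that a fetched POI list is capped at `POI_LIST_MAX_RESULTS` and then written in batches of `POI_LIST_CHUNK_SIZE`. The sketch below only illustrates that interpretation; `toChunks` is a hypothetical helper, and how the repository actually uses the constants is not stated in this thread.

```typescript
const POI_LIST_CHUNK_SIZE = 1000;
const POI_LIST_MAX_RESULTS = 5000;

// Cap the list at maxResults, then split it into chunks of at most chunkSize.
function toChunks<T>(
  items: T[],
  chunkSize: number = POI_LIST_CHUNK_SIZE,
  maxResults: number = POI_LIST_MAX_RESULTS,
): T[][] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const capped = items.slice(0, maxResults);
  const chunks: T[][] = [];
  for (let i = 0; i < capped.length; i += chunkSize) {
    chunks.push(capped.slice(i, i + chunkSize));
  }
  return chunks;
}
```

Smaller chunks keep each write small at the cost of more round trips, which is the trade-off the fine-tuning would explore.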