thegraphnetwork-literev / es-journals

BSD 3-Clause "New" or "Revised" License

fix: Implement Unique Document ID for Elasticsearch Indexing to Prevent Duplication #2

Closed: esloch closed this pull request 7 months ago

esloch commented 7 months ago

Pull Request Description

This pull request introduces a unique document ID generation mechanism for the Elasticsearch indexing process used by the LiteRev platform. Its purpose is to prevent duplicate documents when new data from the medRxiv and bioRxiv servers is indexed each day. When a document is indexed with an explicit `_id` that already exists, Elasticsearch replaces the stored document instead of adding a second copy, so giving each record a unique ID keeps repeated runs from accumulating duplicates. This maintains data integrity and improves search results by removing duplicate hits.
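The description does not spell out how the IDs are derived. A minimal sketch, assuming each medRxiv/bioRxiv record is a dict with `doi` and `version` fields (both field names are assumptions), is to hash those identifying fields into a stable `_id` and emit actions in the dict shape that the Python client's `elasticsearch.helpers.bulk` accepts. `make_doc_id` and `to_bulk_actions` are hypothetical helper names, not code from this PR.

```python
import hashlib
from typing import Any, Iterable, Iterator


def make_doc_id(record: dict[str, Any]) -> str:
    """Derive a stable Elasticsearch _id from fields that identify a preprint.

    Assumption (hypothetical schema): each record has a "doi" and a "version".
    The same (doi, version) pair always hashes to the same ID, so re-indexing
    it overwrites the existing document instead of creating a new one.
    """
    doi = record["doi"].strip().lower()  # DOIs are case-insensitive; normalize first
    key = f"{doi}|v{record.get('version', '1')}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def to_bulk_actions(
    records: Iterable[dict[str, Any]], index: str
) -> Iterator[dict[str, Any]]:
    """Yield actions in the dict shape accepted by elasticsearch.helpers.bulk."""
    for record in records:
        yield {
            "_op_type": "index",  # "index" creates the doc or replaces it by _id
            "_index": index,
            "_id": make_doc_id(record),
            "_source": record,
        }
```

With a configured client, the actions could be sent with `helpers.bulk(client, to_bulk_actions(records, "preprints"))`, where the index name `preprints` is hypothetical. The key design choice is determinism: a random ID such as `uuid.uuid4()` would also be unique, but it would differ on every run, so the daily job would still insert a fresh copy of each preprint. Whether `version` belongs in the key depends on whether each preprint version should be stored as its own document.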

Additional Implementation

1. Secure Password Management for Elasticsearch

Introduced a script that automatically resets and updates the password of the built-in Elasticsearch `elastic` user, improving security by removing manual credential handling. The script runs as part of the container startup process, so the Elasticsearch credentials are set and updated every time the container starts.
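The script itself is not included in this description. As a hedged sketch of the idea, the snippet below calls Elasticsearch's change-password API (`POST /_security/user/<username>/_password`) using only the Python standard library; the actual script may instead use the `elasticsearch-reset-password` tool bundled with Elasticsearch 8.x. The environment variable names `ELASTIC_URL`, `ELASTIC_CURRENT_PASSWORD`, and `ELASTIC_NEW_PASSWORD` are hypothetical, and the sketch assumes plain HTTP; a TLS-enabled cluster would also need an SSL context that trusts the cluster's CA.

```python
import base64
import json
import os
import urllib.request


def build_change_password_request(
    base_url: str, user: str, current_password: str, new_password: str
) -> urllib.request.Request:
    """Build a POST to the change-password API: /_security/user/<user>/_password."""
    url = f"{base_url.rstrip('/')}/_security/user/{user}/_password"
    body = json.dumps({"password": new_password}).encode("utf-8")
    # Authenticate with the credentials that are valid right now (HTTP Basic auth).
    token = base64.b64encode(f"{user}:{current_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def update_elastic_password() -> None:
    """Intended to run at container startup; env var names are hypothetical."""
    req = build_change_password_request(
        base_url=os.environ.get("ELASTIC_URL", "http://localhost:9200"),
        user="elastic",
        current_password=os.environ["ELASTIC_CURRENT_PASSWORD"],
        new_password=os.environ["ELASTIC_NEW_PASSWORD"],
    )
    # urlopen raises urllib.error.HTTPError on a 4xx/5xx response.
    with urllib.request.urlopen(req, timeout=30):
        pass


if __name__ == "__main__":
    update_elastic_password()
```

Running this as a startup step (for example from the container entrypoint, after Elasticsearch reports that it is ready) applies the new password on every start. Because `urlopen` raises on an error response, a wrong current password makes the startup step fail loudly instead of silently leaving the old credentials in place.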

Reviewer's Checklist

Please use the following checklist for reviewing this PR:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch.
- [ ] I managed to test the new changes locally.
- [ ] I confirm that the issues mentioned were fixed/resolved.