openeduhub / oeh-search-etl

The Backend includes all data for the ETL process (Scrapy, Postgres, Elasticsearch)

feat: support edu-sharing v9.x API (+ dependency updates) #109

Closed Criamos closed 2 months ago

Criamos commented 2 months ago

This PR includes the following changes:

Attention: If you encounter pydantic ValidationErrors while crawling

The Python Generator of openapi-generator-cli uses pydantic for its data models and validation of API calls, which allows us to catch errors before they end up in the edu-sharing back-end.

When running crawlers you might encounter pydantic-related Validation Errors that you haven't seen before. These new error messages are super helpful when debugging and allow you to catch small oversights (like missing type-casts) before the data gets saved to the back-end.
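As a rough illustration of the kind of error described above, the following minimal sketch shows how a strictly typed pydantic model rejects a value that is missing a type-cast. The model `NodeProperties`, its fields, and the helper `validate_item` are hypothetical stand-ins and are not part of the generated edu-sharing client; only `BaseModel`, `StrictStr` and `ValidationError` come from pydantic itself.

```python
from typing import List, Optional, Tuple

from pydantic import BaseModel, StrictStr, ValidationError


class NodeProperties(BaseModel):
    """Hypothetical model for illustration; NOT a class of the real API client."""

    title: StrictStr
    keywords: List[StrictStr]


def validate_item(raw: dict) -> Tuple[Optional[NodeProperties], List[str]]:
    """Return (model, []) on success or (None, [failing field paths]) on failure."""
    try:
        return NodeProperties(**raw), []
    except ValidationError as exc:
        # Each error entry carries a "loc" tuple that points at the offending field.
        failing = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
        return None, failing


# A missing str() cast on the title is caught before anything is sent anywhere.
model, problems = validate_item({"title": 12345, "keywords": ["math"]})
print(problems)
```

Casting the value first (for example `str(12345)`) lets the same item pass validation, which is the sort of small oversight these messages help to surface.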

Compared to the previous API client, the crawlers are expected to boot more slowly because of the additional validation steps during warm-up. You will notice a delay when starting an individual scrapy.Spider before the actual crawl process begins. (We hope that future pydantic versions will reduce this boot-up time with additional optimizations.)

Documentation

Since these changes required extensive research beforehand, the oeh-search-etl GitHub Wiki received two additional chapters to make future project maintenance a little easier to handle:

gitguardian[bot] commented 2 months ago

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
| -------------- | ------------------ | ------ | ------ | -------- | --- |
| [4017360](https://dashboard.gitguardian.com/workspace/271601/incidents/4017360?occurrence=163141852) | Triggered | Generic Password | 32534e24ff927fec020f18b6eb79a1b8be1a3ca2 | edu_sharing_client/api/sharing_v1_api.py | [View secret](https://github.com/openeduhub/oeh-search-etl/commit/32534e24ff927fec020f18b6eb79a1b8be1a3ca2#diff-b4c42334c46f77b195ef4b1bf5b4d546d45b69615cbb051813a94b4bbcc0d6b5L87) |

🛠 Guidelines to remediate hardcoded secrets

1. Understand the implications of revoking this secret by investigating where it is used in your code.
2. Replace and store your secret safely. [Learn here](https://blog.gitguardian.com/secrets-api-management?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) the best practices.
3. Revoke and [rotate this secret](https://docs.gitguardian.com/secrets-detection/secrets-detection-engine/detectors/generics/generic_password#revoke-the-secret?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment).
4. If possible, [rewrite git history](https://blog.gitguardian.com/rewriting-git-history-cheatsheet?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment). Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider

- following these [best practices](https://blog.gitguardian.com/secrets-api-management/?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) for managing and storing secrets including API keys and other credentials
- install [secret detection on pre-commit](https://docs.gitguardian.com/ggshield-docs/integrations/git-hooks/pre-commit?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) to catch secret before it leaves your machine and ease remediation.
