skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0

feat(.corpora/.id): working on the same corpus with different machines and databases #42

Closed skyl closed 1 week ago

skyl commented 1 week ago

PR Type

enhancement, configuration changes


Description


Changes walkthrough 📝

Relevant files

Enhancement

corpus.py
Improve corpus ID handling and error management

py/packages/corpora_cli/commands/corpus.py (+9/-3)
  • Added import for os.
  • Ensured .corpora directory exists before saving ID.
  • Changed corpus ID saving mechanism to .corpora/.id.
  • Added error handling for ApiException.

config.py
Load corpus ID from file and adjust config loading

py/packages/corpora_cli/config.py (+9/-2)
  • Added check for existence of .corpora/.id.
  • Loaded corpus ID from .corpora/.id if it exists.
  • Commented out environment variable substitution.

Configuration changes

settings.py
Update allowed hosts in Django settings

py/packages/corpora_proj/settings.py (+1/-0)
  • Added `localhost` to `ALLOWED_HOSTS`.

.corpora.yaml
Simplify and clean up corpora configuration

.corpora.yaml (+3/-56)
  • Removed id and auth sections.
  • Simplified configuration by removing subcorpora and CLI options.

Documentation

devcontainer.json
Add note on autopep8 uninstallation in devcontainer

.devcontainer/devcontainer.json (+2/-1)
  • Added comment about uninstalling `autopep8` in devcontainer.


github-actions[bot] commented 1 week ago

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The `SECRET_KEY` in `py/packages/corpora_proj/settings.py` is hardcoded with a default value. This should be managed through environment variables to prevent accidental exposure in production environments.
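
One way to act on the `SECRET_KEY` concern is sketched below. The variable name `DJANGO_SECRET_KEY`, the helper `load_secret_key`, and the development fallback value are all assumptions for illustration; nothing here is taken from the project's settings.

```python
import os


def load_secret_key(environ=os.environ, debug=False):
    """Return the secret key from the environment.

    DJANGO_SECRET_KEY is an assumed variable name, not one used by the project.
    """
    key = environ.get("DJANGO_SECRET_KEY")
    if key:
        return key
    if debug:
        # A throwaway key is tolerable for local development only.
        return "insecure-dev-only-key"
    raise RuntimeError("DJANGO_SECRET_KEY must be set when DEBUG is off")
```

In settings.py this could be used as `SECRET_KEY = load_secret_key(debug=DEBUG)`, so a missing key fails loudly outside development instead of silently falling back to a committed default.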
⚡ Recommended focus areas for review

Code Smell
The use of hardcoded file paths such as `.corpora/.id` can lead to issues if the directory structure changes. Consider using a configuration or environment variable to manage file paths.

Code Smell
The `exists` function is imported from `genericpath`, which is not necessary as `os.path.exists` can be used directly. This can lead to confusion and should be streamlined.

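
Both code smells could be addressed together along these lines. This is only a sketch: the environment variable `CORPORA_ID_FILE` and the function names are hypothetical, and the project may prefer to put the path in its config file instead.

```python
import os

# Default location from the PR; CORPORA_ID_FILE is a hypothetical override.
DEFAULT_ID_FILE = os.path.join(".corpora", ".id")


def id_file_path(environ=os.environ):
    """Resolve the ID file location, falling back to the default path."""
    return environ.get("CORPORA_ID_FILE", DEFAULT_ID_FILE)


def read_corpus_id(environ=os.environ):
    """Return the stored corpus ID, or None when no ID file exists."""
    path = id_file_path(environ)
    # os.path.exists is the public API; genericpath is an internal module.
    if not os.path.exists(path):
        return None
    with open(path, "r") as f:
        return f.read().strip()
```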
github-actions[bot] commented 1 week ago

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Security
Score: 9

Set DEBUG to False in production to prevent exposure of sensitive information

**Set DEBUG to False for production environments to avoid exposing sensitive information.**

[py/packages/corpora_proj/settings.py [25]](https://github.com/skyl/corpora/pull/42/files#diff-bb6dcc14e8988871411f9b97731b6de4cc309ba557ae8a664a5e0ff5b64c564cR25-R25)

```diff
-DEBUG = True
+DEBUG = False
```

Suggestion importance[1-10]: 9
Why: Setting `DEBUG` to `False` in production is crucial for security, as it prevents the exposure of sensitive information and reduces the risk of security vulnerabilities.

Category: Possible issue
Score: 8

Add error handling for potential I/O errors during file write operations

**Add error handling for the file write operation to handle potential I/O errors when saving the corpus ID.**

[py/packages/corpora_cli/commands/corpus.py [48-49]](https://github.com/skyl/corpora/pull/42/files#diff-c04094e69a69881444f8b97d9fee7e0d29683f5e062cb4d2cf4530bc08daedf7R48-R49)

```diff
-with open(".corpora/.id", "w") as f:
-    f.write(res.id)
+try:
+    with open(".corpora/.id", "w") as f:
+        f.write(res.id)
+except IOError as e:
+    c.console.print(f"Failed to write corpus ID: {e}", style="red")
```

Suggestion importance[1-10]: 8
Why: Adding error handling for I/O operations is a good practice to prevent the application from crashing due to unforeseen file system issues, enhancing the robustness of the code.

Category: Best practice
Score: 7

Use os.path.exists for file existence checks to maintain consistency with other os module usages

**Consider using os.path.exists instead of importing exists from genericpath for checking file existence to maintain consistency with other os module usages.**

[py/packages/corpora_cli/config.py [26-28]](https://github.com/skyl/corpora/pull/42/files#diff-d5d4ce34a5030133a9b563f560c601dd7d497a186be5bbf0c12a248ea466c948R26-R28)

```diff
-if exists(ID_FILE_PATH):
+if os.path.exists(ID_FILE_PATH):
     with open(ID_FILE_PATH, "r") as file:
         config["id"] = file.read().strip()
```

Suggestion importance[1-10]: 7
Why: Using `os.path.exists` instead of importing `exists` from `genericpath` improves code consistency and readability, aligning with the usage of the `os` module elsewhere in the code.
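
The DEBUG suggestion hardcodes `False`, which would also turn off debug mode for local development. An alternative is to read the flag from the environment and default to off. The sketch below assumes a variable named `DJANGO_DEBUG` and a hypothetical helper `env_flag`; neither comes from the project.

```python
import os

# Strings treated as "on"; anything else counts as off.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean flag."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


# DJANGO_DEBUG is an assumed variable name; it defaults to off so that
# a deployment without the variable set does not run in debug mode.
DEBUG = env_flag("DJANGO_DEBUG", default=False)
```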