skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0
2 stars 0 forks source link

feat(api): first bit of ninja API, change to UUIDs, ninja async test pattern #6

Closed skyl closed 3 weeks ago

skyl commented 3 weeks ago

PR Type

Enhancement, Tests


Description


Changes walkthrough ๐Ÿ“

Relevant files
Enhancement
7 files
api.py
Add asynchronous API endpoints for Corpus and File management

py/packages/corpora/api.py
  • Introduced new API endpoints for creating and retrieving Corpus and
    File objects.
  • Utilized NinjaAPI for asynchronous API operations.
  • Implemented UUID-based identification for resources.
  • +43/-0   
    admin.py
    Update admin fields to use ID instead of UUID                       

    py/packages/corpora/admin.py
  • Replaced uuid with id for Corpus model fields in admin display and
    search.
  • +4/-4     
    decorators.py
    Implement async decorator for handling not found exceptions

    py/packages/corpora/lib/dj/decorators.py
  • Created a decorator to handle ObjectDoesNotExist exceptions and raise
    Http404 in async views.
  • +16/-0   
    files.py
    Add utility for calculating MD5 checksums                               

    py/packages/corpora/lib/files.py - Implemented a utility function to calculate MD5 checksums.
    +6/-0     
    0002_initial.py
    Update migration to use ID for primary keys                           

    py/packages/corpora/migrations/0002_initial.py - Updated migration to use `id` instead of `uuid` for primary keys.
    +8/-10   
    models.py
    Modify models to use ID as primary key                                     

    py/packages/corpora/models.py
  • Changed primary key fields from uuid to id for Corpus, File, and Split
    models.
  • +5/-10   
    schema.py
    Define schemas for API request and response                           

    py/packages/corpora/schema.py
  • Defined schemas for Corpus and File API requests and responses.
  • +34/-0   
    Tests
    4 files
    conftest.py
    Add pytest fixture for async NinjaAPI client                         

    py/packages/corpora/conftest.py
  • Added a pytest fixture for setting up an async test client for
    NinjaAPI.
  • +10/-0   
    test_decorators.py
    Test async decorator for exception handling                           

    py/packages/corpora/lib/dj/test_decorators.py
  • Added tests for the async_raise_not_found decorator.
  • Verified correct handling of ObjectDoesNotExist exceptions.
  • +35/-0   
    test_files.py
    Test checksum calculation utility                                               

    py/packages/corpora/lib/test_files.py
  • Added tests for the calculate_checksum function.
  • Validated MD5 checksum generation for various inputs.
  • +18/-0   
    test_api.py
    Add async tests for API endpoints                                               

    py/packages/corpora/test_api.py
  • Added async tests for API endpoints to create and retrieve Corpus and
    File.
  • Included tests for handling not found scenarios.
  • +69/-0   
    Configuration changes
    2 files
    settings.py
    Include Ninja in installed apps                                                   

    py/packages/testproj/settings.py - Added `ninja` to the list of installed apps.
    +1/-0     
    pytest.ini
    Configure pytest for async testing and warnings                   

    py/pytest.ini - Configured pytest to handle async tests and filter warnings.
    +19/-2   
    Documentation
    2 files
    TODO.md
    Update TODO with API authentication task                                 

    TODO.md - Added a note about authentication for API endpoints.
    +1/-0     
    README.md
    Document utilities in corpora.lib                                               

    py/packages/corpora/lib/README.md - Added README for `corpora.lib` utilities.
    +3/-0     
    Dependencies
    2 files
    requirements.txt
    Update requirements with django-ninja                                       

    py/packages/corpora/requirements.txt - Added `django-ninja` to the requirements.
    +1/-0     
    requirements-dev.txt
    Add pytest-asyncio to development requirements                     

    py/requirements-dev.txt - Added `pytest-asyncio` to development requirements.
    +1/-0     

    ๐Ÿ’ก PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    github-actions[bot] commented 3 weeks ago

    PR Reviewer Guide ๐Ÿ”

    Here are some key observations to aid the review process:

    โฑ๏ธ Estimated effort to review: 4 ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ตโšช
    ๐Ÿงช PR contains tests
    ๐Ÿ”’ No security concerns identified
    โšก Recommended focus areas for review

    Code Smell
    The `calculate_checksum` function is used directly in the `create_file` endpoint. Consider handling potential exceptions or errors that might arise during checksum calculation. Code Smell
    The `async_raise_not_found` decorator is catching `ObjectDoesNotExist` and raising `Http404`. Ensure that this behavior is consistent with the rest of the application and that it doesn't mask other potential issues. Possible Bug
    The `id` field for models `Corpus`, `File`, and `Split` is defined as a `UUIDField`. Ensure that all parts of the application correctly handle UUIDs, especially if there were previous assumptions about integer IDs.
    github-actions[bot] commented 3 weeks ago

    PR Code Suggestions โœจ

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Security
    Use a more secure hashing algorithm for checksum calculations ___ **Consider using a more secure hashing algorithm like SHA-256 instead of MD5 for
    calculating checksums, as MD5 is considered cryptographically broken and unsuitable
    for further use.** [py/packages/corpora/lib/files.py [6]](https://github.com/skyl/corpora/pull/6/files#diff-43435a16f3377c0d2bfcf1c95a2c2c32d4a15009da949dd89c547e6838db3720R6-R6) ```diff -return hashlib.md5(content.encode("utf-8")).hexdigest() +return hashlib.sha256(content.encode("utf-8")).hexdigest() ```
    Suggestion importance[1-10]: 9 Why: Replacing MD5 with SHA-256 significantly improves security, as MD5 is outdated and vulnerable to collision attacks. This suggestion addresses a critical security concern.
    9
    Enhancement
    Add a test case for handling missing required fields in API requests ___ **Add a test case to verify the behavior of the API when attempting to create a corpus
    with missing required fields, ensuring that the API handles such cases gracefully.** [py/packages/corpora/test_api.py [14-20]](https://github.com/skyl/corpora/pull/6/files#diff-32541e7e6828c01b89b5dced0e66a9d02b5c2fc9d29f4fffc8eb462cf0ea07abR14-R20) ```diff -async def test_create_corpus(self): - payload = {"name": "Test Corpus", "url": "https://example.com/repo"} +async def test_create_corpus_missing_fields(self): + payload = {"url": "https://example.com/repo"} # Missing 'name' response = await client.post("/corpus", json=payload) - assert response.status_code == 201 + assert response.status_code == 400 ```
    Suggestion importance[1-10]: 8 Why: Adding a test case for missing required fields enhances test coverage and ensures the API handles such scenarios correctly, improving reliability and robustness.
    8
    Possible issue
    Ensure the checksum calculation handles different data types for content ___ **Ensure that the calculate_checksum function is called with the correct data type, as
    it currently expects a string. If payload.content is not guaranteed to be a string,
    consider converting it or handling other data types appropriately.** [py/packages/corpora/api.py [31]](https://github.com/skyl/corpora/pull/6/files#diff-a3bfbda31f4958bd5b403792dd85648527f5192b25bae26a5eb57ac7771e1be6R31-R31) ```diff -checksum = calculate_checksum(payload.content) +checksum = calculate_checksum(str(payload.content)) ```
    Suggestion importance[1-10]: 7 Why: The suggestion to ensure that `calculate_checksum` handles different data types is valid, as it prevents potential runtime errors if `payload.content` is not a string. This enhances the robustness of the code.
    7
    Best practice
    Add logging for exceptions in the async decorator to improve debugging ___ **Consider logging the exception details in the async_raise_not_found decorator before
    raising the Http404 error to aid in debugging and monitoring.** [py/packages/corpora/lib/dj/decorators.py [13-14]](https://github.com/skyl/corpora/pull/6/files#diff-452f5e0334ec58ce28772c03d7e54074d13d7a18469e2788b2c122b2ed4f0ddcR13-R14) ```diff -except ObjectDoesNotExist: +except ObjectDoesNotExist as e: + logger.error(f"Object not found: {e}") raise Http404("Object not found") ```
    Suggestion importance[1-10]: 6 Why: Logging exceptions before raising errors can aid in debugging and monitoring, providing valuable context for developers. This is a good practice for maintaining code quality. However, it requires a logger setup which is not shown in the suggestion.
    6