skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0

feat(core): Upload corpus from CLI to API #10

Closed skyl closed 2 weeks ago

skyl commented 2 weeks ago

PR Type

Enhancement, Tests


Description


Changes walkthrough 📝

Relevant files

Enhancement

api.py
Add tarball upload functionality to corpus creation API
py/packages/corpora/api.py
  • Added support for uploading a tarball when creating a new corpus.
  • Modified the create_corpus function to handle file uploads.
  • Updated import statements to include Form and File.
  +30/-9

corpus.py
Simplify CLI command for corpus initialization
py/packages/corpora_cli/commands/corpus.py
  • Simplified the init command to upload a tarball directly.
  • Removed detailed comments and streamlined the upload process.
  +9/-33

corpora_api.py
Update API client for tarball upload support
py/packages/corpora_client/api/corpora_api.py
  • Updated API client to support tarball uploads.
  • Changed method signatures to accept name, tarball, and url.
  +51/-22

Tests

test_api.py
Add test for corpus creation with tarball upload
py/packages/corpora/test_api.py
  • Added test for creating a corpus with a tarball upload.
  • Utilized SimpleUploadedFile for simulating file uploads in tests.
  +27/-5

Documentation

.corpora.yaml
Comment out existing corpus configurations and add TODOs
.corpora.yaml
  • Commented out existing corpus configurations.
  • Added TODOs for future configuration considerations.
  +45/-41

TODO.md
Update TODOs for corpus collection and upload tasks
TODO.md
  • Added tasks related to corpus collection and tarball upload.
  • Included considerations for unique naming and configuration.
  +5/-0

README.md
Update README example for new corpus creation method
py/packages/corpora_client/README.md
  • Updated example to reflect changes in API method signature.
  • Included parameters for name, tarball, and url.
  +4/-2

CorporaApi.md
Update API documentation for corpus creation with tarball
py/packages/corpora_client/docs/CorporaApi.md
  • Updated documentation to reflect new API method signature.
  • Changed content type to multipart/form-data.
  +10/-7
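
The walkthrough says the `init` command now uploads a tarball directly, but the page does not show how that tarball is produced. One way such a payload could be built in memory with the standard library is sketched below, assuming the files are already available as a mapping of relative path to bytes. `build_tarball`, `list_members`, and the sample paths are hypothetical names for illustration, not taken from the PR.

```python
import io
import tarfile


def build_tarball(files: dict[str, bytes]) -> bytes:
    """Pack {relative_path: content} into a gzip-compressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        # Sort paths so the member order is deterministic.
        for path, content in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_members(tarball: bytes) -> list[str]:
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        return tar.getnames()
```

The resulting bytes are one of the shapes the updated client signature accepts for `tarball`, so they could be passed to the create-corpus call without touching the filesystem.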


github-actions[bot] commented 2 weeks ago

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Code Smell
The `create_corpus` function contains commented-out code and TODO comments that should be addressed or removed for clarity and maintainability.

Code Smell
The `init` function in the CLI command has commented-out code that should be cleaned up or clarified to improve readability.

github-actions[bot] commented 2 weeks ago

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Score

Category: Robustness

Suggestion: Add exception handling for reading tarball content to improve robustness

Consider handling potential exceptions when reading the tarball content to prevent the application from crashing if the file is corrupted or unreadable.

[py/packages/corpora/api.py [26]](https://github.com/skyl/corpora/pull/10/files#diff-a3bfbda31f4958bd5b403792dd85648527f5192b25bae26a5eb57ac7771e1be6R26-R26)

```diff
-tarball_content = await sync_to_async(tarball.read)()
+try:
+    tarball_content = await sync_to_async(tarball.read)()
+except Exception as e:
+    # Handle exception, e.g., log error and return a response
```

Suggestion importance[1-10]: 8
Why: Adding exception handling when reading the tarball content is a significant improvement for robustness, as it prevents the application from crashing due to unreadable or corrupted files. This suggestion is directly applicable and enhances the reliability of the code.

Score: 8
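
The suggested diff for the tarball read leaves the `except` body as a comment only, which Python would reject until a statement is added. A minimal sketch of one way to fill it in follows, assuming the upload is a tar archive (optionally compressed) and that callers want a single error type to translate into an API response. `TarballError` and `read_tarball_members` are hypothetical names, not part of the PR.

```python
import io
import tarfile


class TarballError(ValueError):
    """Raised when uploaded bytes cannot be read as a tar archive."""


def read_tarball_members(content: bytes) -> list[str]:
    """Return member names, or raise TarballError for unreadable input."""
    try:
        # "r:*" lets tarfile detect the compression transparently.
        with tarfile.open(fileobj=io.BytesIO(content), mode="r:*") as tar:
            return tar.getnames()
    except (tarfile.TarError, EOFError, OSError) as exc:
        # Narrower than a bare `except Exception`, so unrelated bugs still surface.
        raise TarballError(f"could not read tarball: {exc}") from exc
```

An endpoint could catch `TarballError` and return a client-error response instead of letting the request fail with an unhandled exception.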

Suggestion: Implement error handling for the API call to manage potential failures

Add error handling for the API call to corpora_api_create_corpus to manage potential failures gracefully.

[py/packages/corpora_cli/commands/corpus.py [28-31]](https://github.com/skyl/corpora/pull/10/files#diff-c04094e69a69881444f8b97d9fee7e0d29683f5e062cb4d2cf4530bc08daedf7R28-R31)

```diff
-res = c.api_client.corpora_api_create_corpus(
-    name="corpora",
-    url=None,
-    tarball=tarball,
-)
+try:
+    res = c.api_client.corpora_api_create_corpus(
+        name="corpora",
+        url=None,
+        tarball=tarball,
+    )
+except Exception as e:
+    c.console.print(f"Failed to create corpus: {e}")
```

Suggestion importance[1-10]: 8
Why: Adding error handling for the API call is a valuable enhancement, as it allows the application to handle failures gracefully and provide informative feedback to the user. This improves the robustness and user experience of the CLI tool.

Score: 8
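
One thing to watch in the suggested diff for the API call: if the call raises, `res` is never bound, so any later use of `res` would fail with a `NameError`. A small sketch of a wrapper that always yields a value is shown below. `call_or_report` is a hypothetical helper, and the broad `except Exception` is kept only because the client's exception types are not shown on this page.

```python
from typing import Any, Callable, Optional


def call_or_report(
    action: Callable[[], Any],
    report: Callable[[str], None] = print,
) -> Optional[Any]:
    """Run action(); on failure, report the error and return None."""
    try:
        return action()
    except Exception as exc:  # assumption: the client's exception classes are unknown here
        report(f"Failed to create corpus: {exc}")
        return None
```

In the CLI this could wrap the call from the diff, for example `call_or_report(lambda: c.api_client.corpora_api_create_corpus(name="corpora", url=None, tarball=tarball), c.console.print)`, with the caller checking for `None` before using the result.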

Category: Possible bug

Suggestion: Verify that the client variable is properly defined or imported in the test

Ensure the client variable is defined or imported in the test to avoid runtime errors.

[py/packages/corpora/test_api.py [67-73]](https://github.com/skyl/corpora/pull/10/files#diff-32541e7e6828c01b89b5dced0e66a9d02b5c2fc9d29f4fffc8eb462cf0ea07abR67-R73)

```diff
+# Ensure 'client' is defined or imported
 response = await client.post(
     "/corpus",
     data=data,
-    # Some hint I found
-    # https://github.com/vitalik/django-ninja/issues/765
     FILES={"tarball": file},
     headers=headers,
 )
```

Suggestion importance[1-10]: 7
Why: Ensuring the `client` variable is defined or imported is crucial to avoid runtime errors during testing. This suggestion addresses a potential bug, making it important for the correctness of the test suite.

Score: 7

Category: Best practice

Suggestion: Validate the tarball parameter to ensure correct data types are passed

Ensure that the tarball parameter is correctly validated to prevent invalid data types from being passed.

[py/packages/corpora_client/api/corpora_api.py [46]](https://github.com/skyl/corpora/pull/10/files#diff-5ab02211229033fa8269352abaafaa78d85884adc1e58a9eb099e09daea6aa01R46-R46)

```diff
-tarball: Union[StrictBytes, StrictStr, Tuple[StrictStr, StrictBytes]],
+tarball: Annotated[Union[StrictBytes, StrictStr, Tuple[StrictStr, StrictBytes]], Field(..., description="Tarball content must be bytes or a tuple of filename and bytes")],
```

Suggestion importance[1-10]: 6
Why: While the suggestion to validate the `tarball` parameter is a good practice, it offers a moderate improvement. The existing use of type hints already provides some level of validation, so this suggestion enhances clarity and documentation rather than functionality.

Score: 6
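
A `Field` description documents the parameter rather than checking it, and the description in the tarball-parameter suggestion mentions only bytes and a tuple even though the type also admits a string. If explicit checking were wanted, a plain-Python sketch could look like the following. It assumes a string is a filesystem path, which this page does not confirm. `normalize_tarball`, `TarballInput`, and the default filename are hypothetical.

```python
import os
from typing import Tuple, Union

TarballInput = Union[bytes, str, Tuple[str, bytes]]


def normalize_tarball(
    tarball: TarballInput,
    default_name: str = "corpus.tar.gz",
) -> Tuple[str, bytes]:
    """Coerce the accepted input shapes into a (filename, content) pair."""
    if isinstance(tarball, bytes):
        return default_name, tarball
    if isinstance(tarball, str):
        # Assumption: a string is a path on disk, not inline content.
        with open(tarball, "rb") as handle:
            return os.path.basename(tarball), handle.read()
    if (
        isinstance(tarball, tuple)
        and len(tarball) == 2
        and isinstance(tarball[0], str)
        and isinstance(tarball[1], bytes)
    ):
        return tarball
    raise TypeError(
        "tarball must be bytes, a path string, or a (filename, bytes) tuple; "
        f"got {type(tarball).__name__}"
    )
```

Normalizing once at the boundary means the rest of the upload code only has to handle a single shape.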