matchish / laravel-scout-elasticsearch

Search among multiple models with ElasticSearch and Laravel Scout
MIT License
702 stars 113 forks source link

Parallel import with trackable jobs. #275

Closed DavisPuciriuss closed 3 months ago

DavisPuciriuss commented 4 months ago

Continuation of #78.

Added FromScope, that uses forPageAfterId instead of forPage.

Added a new option within ImportCommand - "--parallel". This option requires user to set-up async job handler within their configuration. Main thread continues to pull data from the database, while workers handle making the data searchable.

For the parallel import to work, added Trackable-Jobs package, that looks to be actively maintained. Updated the existing tests to pass, and added few tests to handle synced parallel import. Started to make changes for larastan.

Performance testing (~250k records, mysql):

Will appreciate any feedback on what should be improved for this PR to be merged.

matchish commented 4 months ago

Thank you for your great work on this pull request! I really appreciate your effort to integrate the parallel import feature. Here are a few areas where I believe we could improve the implementation:

  1. Tests: Currently, the tests are failing. Could you take a look to ensure that the new changes don't break existing functionalities?
  2. Optional Dependency: It would be beneficial if the trackable-jobs package remains optional for users who are not interested in the parallel import feature.
  3. Backward Compatibility: Please ensure that for users not utilizing the parallel import feature, there are no changes in behavior. This will help us maintain backward compatibility and potentially release this feature faster.
  4. Progress Visibility: During the parallel import, being able to see progress would enhance the user experience.
  5. Cancel Overlapping Imports: Finally, it would be ideal if a new import(sync or parallel) could cancel the current parallel import to prevent overlaps.

Thank you once again for your contribution! Looking forward to your thoughts and further improvements.

matchish commented 4 months ago

mateusjunges/laravel-trackable-jobs should be installed into test env https://github.com/matchish/laravel-scout-elasticsearch/blob/master/.github/workflows/test-application.yaml

matchish commented 4 months ago

Ensure that the master branch of the fork is synchronized with this repository, or add this line to your branch. https://github.com/matchish/laravel-scout-elasticsearch/blob/7a457e2e956bfdcef2158f98291fa8576181c933/.github/workflows/test-application.yaml#L82

DavisPuciriuss commented 4 months ago

Synchronized this PR with master branch.

DavisPuciriuss commented 4 months ago

The master branch on the fork is now synced aswell. Missed previously that you mentioned that it needs to be synced on the master branch.

matchish commented 4 months ago

could you push a new commit. for example readme update with section about parallel feature. let's see if it will run new workflow

matchish commented 4 months ago

or close this pr and open new one

codecov-commenter commented 4 months ago

Codecov Report

Attention: Patch coverage is 95.25862% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 98.11%. Comparing base (2a83b2b) to head (9842153).

Files Patch % Lines
src/Jobs/Stages/PullFromSourceParallel.php 91.07% 5 Missing :warning:
...rc/ElasticSearch/EloquentHitsIteratorAggregate.php 75.00% 2 Missing :warning:
src/Engines/ElasticSearchEngine.php 83.33% 2 Missing :warning:
src/Jobs/ProcessSearchable.php 92.30% 1 Missing :warning:
src/Jobs/Stages/StopTrackedJobs.php 93.75% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## 8.x #275 +/- ## ============================================ + Coverage 95.99% 98.11% +2.12% - Complexity 194 255 +61 ============================================ Files 36 40 +4 Lines 624 797 +173 ============================================ + Hits 599 782 +183 + Misses 25 15 -10 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

matchish commented 4 months ago

Tests: Currently, the tests are failing. Could you take a look to ensure that the new changes don't break existing functionalities? Optional Dependency: It would be beneficial if the trackable-jobs package remains optional for users who are not interested in the parallel import feature. Backward Compatibility: Please ensure that for users not utilizing the parallel import feature, there are no changes in behavior. This will help us maintain backward compatibility and potentially release this feature faster. Progress Visibility: During the parallel import, being able to see progress would enhance the user experience. Cancel Overlapping Imports: Finally, it would be ideal if a new import(sync or parallel) could cancel the current parallel import to prevent overlaps.

First two points for sure done. What about rest? Do you think we are safe?

Also plz apply diff from styleci diff file Use the git apply command followed by the path to your diff file, like this git apply O375ML.diff

DavisPuciriuss commented 4 months ago

Hey, applied the diff from styleci.

Progress Visibility: If this is meant as the progress bar within console, when importing. Then this is already done and there is a test for this specifically here: https://github.com/DavisPuciriuss/laravel-scout-elasticsearch/blob/a1ae08089a8f563fa12e778f6d1ec3fcb57a7d6d/tests/Integration/Jobs/ImportTest.php#L47

Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?

Backward Compatibility: The major changes are within StageInterface and ImportSource interfaces. Both of these interfaces have new functions. In the StageInterface case, the problem was providing the next PullFromSource with the lastId for usage in forPageAfterId(). Solved this by adding a completed() method, allowing some stages to become iterable. This could also be changed into creating another StageInterfaceIterable, but then it would require re-creating the existing stages and would create a lot of duplicated code. Maybe you have a better suggestion for this ?

matchish commented 4 months ago

Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?

I believe proper way is to check if there are started/pending import jobs and cancelling them before to start a new import. otherwise could be overlapping with jobs that not cancelled yet from previous import.

DavisPuciriuss commented 4 months ago

Okay, will make some changes tomorrow with in-progress or pending jobs, when starting another import job.

What about the other two points, are those fine or should I make some changes there aswell ?

matchish commented 4 months ago

What about the other two points, are those fine or should I make some changes there aswell ?

I have to dive deeper. Can't answer now

DavisPuciriuss commented 3 months ago

Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?

I believe proper way is to check if there are started/pending import jobs and cancelling them before to start a new import. otherwise could be overlapping with jobs that not cancelled yet from previous import.

This has been added, if you start a new import, the previous one will cancel.

matchish commented 3 months ago

I’m not sure when I can dive deep into the code. There are some things to polish, but overall, it looks fine.

Since the code is not backward compatible and to avoid blocking the release, I propose releasing it as 8.0.0-alpha.1 to highlight that the API might change. We can merge it into the 8x branch, which will live on top of the master branch. This way, we can improve it based on feedback from developers and continue contributing to stable versions.

matchish commented 3 months ago

If agree plz update Changelog and we can release

DavisPuciriuss commented 3 months ago

Hey, great to hear back from you. The alpha release would be great, to both test the code out more and receive more feedback as you mentioned. Updated the changelog.

matchish commented 3 months ago

Releasing. Huge thanks to @DavisPuciriuss! Your contribution is a significant step forward for the project.