Closed DavisPuciriuss closed 3 months ago
Thank you for your great work on this pull request! I really appreciate your effort to integrate the parallel import feature. Here are a few areas where I believe we could improve the implementation:
trackable-jobs
package remains optional for users who are not interested in the parallel import feature.Thank you once again for your contribution! Looking forward to your thoughts and further improvements.
mateusjunges/laravel-trackable-jobs should be installed into test env https://github.com/matchish/laravel-scout-elasticsearch/blob/master/.github/workflows/test-application.yaml
Ensure that the master branch of the fork is synchronized with this repository, or add this line to your branch. https://github.com/matchish/laravel-scout-elasticsearch/blob/7a457e2e956bfdcef2158f98291fa8576181c933/.github/workflows/test-application.yaml#L82
Synchronized this PR with master branch.
The master branch on the fork is now synced aswell. Missed previously that you mentioned that it needs to be synced on the master branch.
could you push a new commit. for example readme update with section about parallel feature. let's see if it will run new workflow
or close this pr and open new one
Attention: Patch coverage is 95.25862%
with 11 lines
in your changes are missing coverage. Please review.
Project coverage is 98.11%. Comparing base (
2a83b2b
) to head (9842153
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Tests: Currently, the tests are failing. Could you take a look to ensure that the new changes don't break existing functionalities?
Optional Dependency: It would be beneficial if the trackable-jobs package remains optional for users who are not interested in the parallel import feature.
Backward Compatibility: Please ensure that for users not utilizing the parallel import feature, there are no changes in behavior. This will help us maintain backward compatibility and potentially release this feature faster.
Progress Visibility: During the parallel import, being able to see progress would enhance the user experience.
Cancel Overlapping Imports: Finally, it would be ideal if a new import(sync or parallel) could cancel the current parallel import to prevent overlaps.
First two points for sure done. What about rest? Do you think we are safe?
Also plz apply diff from styleci diff file
Use the git apply command followed by the path to your diff file, like this
git apply O375ML.diff
Hey, applied the diff from styleci.
Progress Visibility: If this is meant as the progress bar within console, when importing. Then this is already done and there is a test for this specifically here: https://github.com/DavisPuciriuss/laravel-scout-elasticsearch/blob/a1ae08089a8f563fa12e778f6d1ec3fcb57a7d6d/tests/Integration/Jobs/ImportTest.php#L47
Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?
Backward Compatibility: The major changes are within StageInterface and ImportSource interfaces. Both of these interfaces have new functions. In the StageInterface case, the problem was providing the next PullFromSource with the lastId for usage in forPageAfterId(). Solved this by adding a completed() method, allowing some stages to become iterable. This could also be changed into creating another StageInterfaceIterable, but then it would require re-creating the existing stages and would create a lot of duplicated code. Maybe you have a better suggestion for this ?
Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?
I believe proper way is to check if there are started/pending import jobs and cancelling them before to start a new import. otherwise could be overlapping with jobs that not cancelled yet from previous import.
Okay, will make some changes tomorrow with in-progress or pending jobs, when starting another import job.
What about the other two points, are those fine or should I make some changes there aswell ?
What about the other two points, are those fine or should I make some changes there aswell ?
I have to dive deeper. Can't answer now
Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?
I believe proper way is to check if there are started/pending import jobs and cancelling them before to start a new import. otherwise could be overlapping with jobs that not cancelled yet from previous import.
This has been added, if you start a new import, the previous one will cancel.
I’m not sure when I can dive deep into the code. There are some things to polish, but overall, it looks fine.
Since the code is not backward compatible and to avoid blocking the release, I propose releasing it as 8.0.0-alpha
.1 to highlight that the API might change. We can merge it into the 8x branch, which will live on top of the master branch. This way, we can improve it based on feedback from developers and continue contributing to stable versions.
If agree plz update Changelog and we can release
Hey, great to hear back from you. The alpha release would be great, to both test the code out more and receive more feedback as you mentioned. Updated the changelog.
Releasing. Huge thanks to @DavisPuciriuss! Your contribution is a significant step forward for the project.
Continuation of #78.
Added FromScope, that uses forPageAfterId instead of forPage.
Added a new option within ImportCommand - "--parallel". This option requires user to set-up async job handler within their configuration. Main thread continues to pull data from the database, while workers handle making the data searchable.
For the parallel import to work, added Trackable-Jobs package, that looks to be actively maintained. Updated the existing tests to pass, and added few tests to handle synced parallel import. Started to make changes for larastan.
Performance testing (~250k records, mysql):
Will appreciate any feedback on what should be improved for this PR to be merged.