quince-science / QuinCe

QuinCe is an online tool for processing and quality control of data from scientific instruments, with a primary focus on oceanic data.
https://quince.science
GNU General Public License v3.0

Multi-thread jobs? #1173

Open squaregoldfish opened 5 years ago

squaregoldfish commented 5 years ago

Most of the jobs we run should be easy to parallelise, at least in theory - they run on individual records that don't depend on each other. Explore this to see if it's a good way to speed up jobs.

Parallelising within a job is probably better than having a multi-threaded job pool, since the odds of having overlapping jobs from different sources are minimal (except perhaps when processing NRT datasets on a cron schedule).

This may not be effective if the database activity is the major part of the job time. That's a whole different optimisation problem.
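To put a rough number on that concern, Amdahl's law gives the best-case speedup when only part of a job can be parallelised. The sketch below is illustrative only: the class and method names are made up for this example, and the fraction of job time spent outside the database would have to come from profiling.

```java
public class AmdahlSketch {

    // Best-case speedup by Amdahl's law.
    // parallelFraction: share of the job's run time that can be parallelised (0..1).
    // threads: number of worker threads.
    static double speedup(double parallelFraction, int threads) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 20% of the job is computation, 80% is database time.
        System.out.println(speedup(0.2, 8));
    }
}
```

Under those assumed figures the job gets only about 1.2 times faster on 8 threads, so a database-bound job gains little from threading alone.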

squaregoldfish commented 5 years ago

Maybe make use of the parallel streaming features of Java 8?
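A minimal sketch of what that could look like, assuming each record can be processed without reference to any other. `ParallelRecordSketch` and `processRecord` are hypothetical stand-ins, not QuinCe classes or methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelRecordSketch {

    // Hypothetical per-record calculation; stands in for whatever a job does to one record.
    static double processRecord(double rawValue) {
        return rawValue * 2.0 + 1.0;
    }

    // Processes every record independently using a parallel stream.
    // The source list is ordered, so the collected results stay in input order.
    static List<Double> processAll(List<Double> rawValues) {
        return rawValues.parallelStream()
            .map(ParallelRecordSketch::processRecord)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList(1.0, 2.0, 3.0)));
    }
}
```

Parallel streams run on the shared common fork/join pool by default, so blocking work such as database calls inside the mapped function would tie up that pool. That is one reason this approach suits the in-memory part of a job best.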

squaregoldfish commented 1 year ago

Take special care with routines that compare between multiple values.
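As an illustration of the kind of routine meant here, consider a check that compares each value with the one before it. A lambda that remembers the "previous" value in a shared variable would break under a parallel stream. One safe pattern, sketched below with hypothetical names, is to stream over indices and read neighbours from the unmodified input list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NeighbourCompareSketch {

    // Flags index i when the absolute change from value i-1 exceeds maxDelta.
    // Each index only reads from the input list, so no state is shared between threads.
    static List<Boolean> flagJumps(List<Double> values, double maxDelta) {
        return IntStream.range(0, values.size())
            .parallel()
            .mapToObj(i -> i > 0 && Math.abs(values.get(i) - values.get(i - 1)) > maxDelta)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(flagJumps(Arrays.asList(1.0, 1.5, 9.0, 9.2), 2.0));
    }
}
```

This only works if the input list is not modified while the stream runs. It should also be a random-access list such as an `ArrayList`, otherwise each `get(i)` call becomes slow.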

squaregoldfish commented 1 year ago

Will include general performance improvements of data processing jobs in this issue.

squaregoldfish commented 1 year ago

ExtractDatasetJob

Database-bound

Screenshot 2023-04-26 at 15 37 11

AutoQCJob

Database-bound

Screenshot 2023-04-26 at 15 37 58

LocateMeasurementsJob

Mostly database-bound, but there's a small improvement we can make in the code

Screenshot 2023-04-26 at 15 38 50

DataReductionJob

Looks like there are some improvements to make here, mostly in the calibration algorithm.

Screenshot 2023-04-26 at 15 39 40

DataReductionQCJob

Possibly some work we can do here.

Screenshot 2023-04-26 at 15 44 27

squaregoldfish commented 1 year ago

Check the DataReductionJob for other sensors too, but after the calibration is sorted out.