icgc-argo / argo-clinical

Clinical data submission for ARGO programs.
GNU Affero General Public License v3.0

Multi-threaded application implementation - sample registration #421

Closed rosibaj closed 4 years ago

rosibaj commented 4 years ago
blabadi commented 4 years ago

current problems that were observed:

1 - schema validation (~15-19 sec, mostly the scripts)

solution: can be parallelized with a pool of worker threads

2 - there were three n^2 loops (3 x n^2) in the in-file cross-record validation (~17-20 sec)

solution: do a single loop and build indexes to look up submitted records by sample, specimen, and donor submitter ID, then use those indexes to find where each ID is referenced in the same file

3- a lot of queries to do things like:

solution: load all program donors in advance and build hash indexes to look up any donor by specimen / sample ID without a trip to the db
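The single-loop idea from item 2 could look roughly like the sketch below. It is only an illustration, not the code from the PR: the `RegistrationRecord` shape, the function names, and the example rule (a specimen ID may appear under only one donor within a file) are all assumptions made for the example.

```typescript
// Hypothetical shape of one row in a sample registration file.
interface RegistrationRecord {
  row: number;
  donorSubmitterId: string;
  specimenSubmitterId: string;
  sampleSubmitterId: string;
}

type RecordIndex = Map<string, RegistrationRecord[]>;

// Append a record to the bucket stored under the given key.
function addToIndex(index: RecordIndex, key: string, record: RegistrationRecord): void {
  const bucket = index.get(key);
  if (bucket) {
    bucket.push(record);
  } else {
    index.set(key, [record]);
  }
}

// One pass over the file builds all three indexes, O(n) instead of O(n^2).
function buildIndexes(records: RegistrationRecord[]) {
  const byDonor: RecordIndex = new Map();
  const bySpecimen: RecordIndex = new Map();
  const bySample: RecordIndex = new Map();
  for (const record of records) {
    addToIndex(byDonor, record.donorSubmitterId, record);
    addToIndex(bySpecimen, record.specimenSubmitterId, record);
    addToIndex(bySample, record.sampleSubmitterId, record);
  }
  return { byDonor, bySpecimen, bySample };
}

// Illustrative rule (an assumption): within one file, a specimen ID
// may only be listed under a single donor.
function findSpecimenDonorConflicts(bySpecimen: RecordIndex): RegistrationRecord[] {
  const conflicts: RegistrationRecord[] = [];
  bySpecimen.forEach((bucket) => {
    const firstDonor = bucket[0].donorSubmitterId;
    if (bucket.some((r) => r.donorSubmitterId !== firstDonor)) {
      conflicts.push(...bucket);
    }
  });
  return conflicts;
}
```

Each cross-record check then becomes a lookup in a map bucket rather than another scan of the whole file.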

other notes: iterating over 3k records takes around 0.005 sec without any db queries.
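For item 3, the preloaded donor lookup might be sketched as below. It assumes the program's donors have already been fetched in one query and that each donor nests specimens, which nest samples. The `Donor`, `Specimen`, and `Sample` shapes and `buildDonorLookup` are hypothetical names for this example, not the repository's actual model.

```typescript
// Hypothetical, simplified donor document (assumption for this sketch).
interface Sample {
  submitterId: string;
}
interface Specimen {
  submitterId: string;
  samples: Sample[];
}
interface Donor {
  submitterId: string;
  specimens: Specimen[];
}

interface DonorLookup {
  bySpecimenId: Map<string, Donor>;
  bySampleId: Map<string, Donor>;
}

// Build in-memory hash indexes once, after loading all donors of a program,
// so later lookups by specimen or sample ID need no db round trip.
function buildDonorLookup(donors: Donor[]): DonorLookup {
  const bySpecimenId = new Map<string, Donor>();
  const bySampleId = new Map<string, Donor>();
  for (const donor of donors) {
    for (const specimen of donor.specimens) {
      bySpecimenId.set(specimen.submitterId, donor);
      for (const sample of specimen.samples) {
        bySampleId.set(sample.submitterId, donor);
      }
    }
  }
  return { bySpecimenId, bySampleId };
}
```

During validation, a call such as `lookup.bySampleId.get(id)` would then replace a per-record query, which fits the observation that pure in-memory iteration over 3k records is very cheap.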

blabadi commented 4 years ago

related PR: https://github.com/icgc-argo/argo-clinical/pull/439/files

blabadi commented 4 years ago

Testing:

rosibaj commented 4 years ago

@blabadi do you have a large file already formatted that i can use for testing?

blabadi commented 4 years ago

@rosibaj yeah here: https://github.com/icgc-argo/argo-clinical/tree/master/test/performance-test check the 3k / 1k / 300 tsvs

rosibaj commented 4 years ago

I tested 3K with the UI - it has a lot of issues from the frontend. The Sample Registration page crashed several times in the process. @blabadi did you have a time for how long 3K samples took thru the api only for both upload and registration that I can compare to?

blabadi commented 4 years ago

the browser can have a hard time rendering the 3k rows; the UI needs to do pagination, which I mentioned before. I'll give it a shot on QA and see how long it takes there.

blabadi commented 4 years ago

through the UI I got this time (7 secs) for 3k records on QA, not including the rendering delay

blabadi commented 4 years ago

note that the response is around 13 MB, which is too big and can affect response time depending on the speed of the user's connection to our servers. I believe we can use compression there, but that's another war to fight

rosibaj commented 4 years ago

yup @blabadi - I just wanted to compare some times in various scenarios.

Did you have a benchmark time through the api for both the upload and commit steps that you were seeing that I can compare to in my testing?

blabadi commented 4 years ago

from the API - around 6.8 secs on QA

blabadi commented 4 years ago

the commit API update is not on QA or Dev yet; it currently takes around 27 seconds, give or take