Closed: rosibaj closed this issue 4 years ago
Current problems that were observed:
1- Schema validation (~15-19 sec, mostly in scripts). Solution: it can be parallelized with a pool of worker threads.
2- There were 3 × n² loops in the in-file cross-record validation (~17-20 sec).
Solution: do a single loop and build indexes to look up submitted records by sample, specimen, and donor submitter ID, then use them to find where each record is referenced in the same file.
3- A lot of queries to do things like:
Solution: load all of the program's donors in advance and build hash indexes to look up any donor by specimen or sample ID without a round trip to the DB.
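For item 1, a minimal sketch of fanning schema validation out over threads with Node's built-in `worker_threads` module. The record shape, the `submitter_donor_id` rule standing in for the real schema checks, and the names `chunk` and `validateInParallel` are hypothetical. It starts one worker per chunk instead of reusing long-lived workers, which a real pool would do to avoid paying thread startup on every request.

```typescript
import { Worker } from "node:worker_threads";
import * as os from "node:os";

// Hypothetical record shape: one parsed TSV row, column name -> value.
type SubmittedRecord = Record<string, string>;

// Split items into at most `parts` contiguous chunks of roughly equal size.
function chunk<T>(items: T[], parts: number): T[][] {
  const size = Math.ceil(items.length / Math.max(1, parts));
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Worker body, evaluated as a CommonJS script inside each thread.
// Hypothetical rule standing in for schema validation: submitter_donor_id must be non-empty.
const WORKER_SOURCE = `
const { parentPort, workerData } = require("node:worker_threads");
const invalid = [];
workerData.records.forEach((rec, i) => {
  if (!rec.submitter_donor_id) invalid.push(workerData.offset + i);
});
parentPort.postMessage(invalid);
`;

// Validate records on up to `poolSize` threads; resolves to invalid row indexes, ascending.
async function validateInParallel(
  records: SubmittedRecord[],
  poolSize: number = os.cpus().length,
): Promise<number[]> {
  let offset = 0;
  const jobs = chunk(records, poolSize).map((part) => {
    const start = offset; // index of this chunk's first row in the whole file
    offset += part.length;
    return new Promise<number[]>((resolve, reject) => {
      const worker = new Worker(WORKER_SOURCE, {
        eval: true,
        workerData: { records: part, offset: start },
      });
      worker.once("message", resolve);
      worker.once("error", reject);
      worker.once("exit", (code) => {
        if (code !== 0) reject(new Error(`worker exited with code ${code}`));
      });
    });
  });
  const results = await Promise.all(jobs);
  return ([] as number[]).concat(...results).sort((a, b) => a - b);
}
```

Copying `workerData` into each thread and starting the threads both cost time, so this only pays off when per-row validation is expensive, as the 15-19 sec figure suggests.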
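For item 2: with n = 3000 rows, three n² loops come to 3 × 3000² = 27 million inner iterations, while a single pass is 3000. A minimal sketch of the single-pass approach, assuming registration rows with `submitter_donor_id`, `submitter_specimen_id` and `submitter_sample_id` columns; the two consistency rules are illustrative stand-ins for the actual cross-record checks, and all function names are hypothetical.

```typescript
// Assumed row shape, modelled on a sample registration TSV.
interface RegistrationRow {
  submitter_donor_id: string;
  submitter_specimen_id: string;
  submitter_sample_id: string;
}

// Append a row index to the bucket for `key`.
function addTo(index: Map<string, number[]>, key: string, row: number): void {
  const bucket = index.get(key);
  if (bucket) bucket.push(row);
  else index.set(key, [row]);
}

// One pass over the file builds all three lookups (ID -> row indexes).
// byDonor would serve donor-level rules, which are not shown here.
function buildIndexes(rows: RegistrationRow[]) {
  const byDonor = new Map<string, number[]>();
  const bySpecimen = new Map<string, number[]>();
  const bySample = new Map<string, number[]>();
  rows.forEach((row, i) => {
    addTo(byDonor, row.submitter_donor_id, i);
    addTo(bySpecimen, row.submitter_specimen_id, i);
    addTo(bySample, row.submitter_sample_id, i);
  });
  return { byDonor, bySpecimen, bySample };
}

// Flag rows whose `field` differs from the first row that shares the same indexed ID.
function conflictsIn(
  rows: RegistrationRow[],
  index: Map<string, number[]>,
  field: keyof RegistrationRow,
): number[] {
  const flagged: number[] = [];
  index.forEach((rowIndexes) => {
    const expected = rows[rowIndexes[0]][field];
    for (const i of rowIndexes) {
      if (rows[i][field] !== expected) flagged.push(i);
    }
  });
  return flagged;
}

// Illustrative rules: a specimen belongs to one donor; a sample belongs to one specimen.
function findCrossRecordConflicts(rows: RegistrationRow[]): number[] {
  const { bySpecimen, bySample } = buildIndexes(rows);
  const flagged = conflictsIn(rows, bySpecimen, "submitter_donor_id")
    .concat(conflictsIn(rows, bySample, "submitter_specimen_id"));
  return Array.from(new Set(flagged)).sort((a, b) => a - b);
}
```

Each row is touched a constant number of times, so the whole check stays linear in the file size; the flagged indexes can be mapped back to every row that references the same ID through the same maps.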
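For item 3, a minimal sketch of the in-memory lookup, assuming the program's donors were already fetched in one query (not shown) and that each donor nests specimens and samples carrying a `submitterId` field. Those shapes and the name `buildDonorLookup` are hypothetical.

```typescript
// Assumed document shapes; a real donor model would carry many more fields.
interface Sample { submitterId: string }
interface Specimen { submitterId: string; samples: Sample[] }
interface Donor { submitterId: string; specimens: Specimen[] }

interface DonorLookup {
  bySpecimenId: Map<string, Donor>;
  bySampleId: Map<string, Donor>;
}

// Walk every donor once and index it under each of its specimen and sample IDs.
function buildDonorLookup(donors: Donor[]): DonorLookup {
  const bySpecimenId = new Map<string, Donor>();
  const bySampleId = new Map<string, Donor>();
  for (const donor of donors) {
    for (const specimen of donor.specimens) {
      bySpecimenId.set(specimen.submitterId, donor);
      for (const sample of specimen.samples) {
        bySampleId.set(sample.submitterId, donor);
      }
    }
  }
  return { bySpecimenId, bySampleId };
}
```

Each submitted row then costs a `Map.get` instead of a DB round trip, and `undefined` means the specimen or sample is not registered in the loaded program. The trade-off is holding the whole program's donors in memory for the duration of the request.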
Other notes: iterating over 3k records takes around 0.005 sec without any DB queries.
Testing:
@blabadi do you have a large file already formatted that I can use for testing?
@rosibaj yeah, here: https://github.com/icgc-argo/argo-clinical/tree/master/test/performance-test (check the 3k / 1k / 300 TSVs)
I tested 3K with the UI, and it has a lot of frontend issues. The Sample Registration page crashed several times in the process. @blabadi, do you have a time for how long 3K samples took through the API only, for both upload and registration, that I can compare to?
The browser can have a hard time rendering 3k rows; the UI needs pagination, as I mentioned before. I'll give it a shot on QA and see how long it takes there.
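A minimal sketch of client-side pagination so the table renders one page instead of all 3k rows. The `paginate` helper, the `Page` shape and the zero-based page convention are hypothetical, not taken from the existing UI code.

```typescript
interface Page<T> {
  rows: T[];
  page: number; // zero-based, clamped to a valid page
  totalPages: number;
}

// Return only the rows for one page; assumes pageSize >= 1.
function paginate<T>(rows: T[], page: number, pageSize: number): Page<T> {
  const totalPages = Math.max(1, Math.ceil(rows.length / pageSize));
  const current = Math.min(Math.max(0, page), totalPages - 1);
  const start = current * pageSize;
  return { rows: rows.slice(start, start + pageSize), page: current, totalPages };
}
```

This only limits rendering; the full result still has to be downloaded. Paging on the server would also shrink the response itself.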
Through the UI I got 7 secs, not including the rendering delay for the 3k records.
Note that the response is around 13 MB, which is too big and can affect response time depending on the user's internet connection to our servers. I believe we can use compression there, but that's another battle to fight.
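One way to estimate what compression would buy is to gzip a representative payload with Node's built-in `zlib` module. The payload here is a hypothetical stand-in for the registration response, so its ratio says nothing definite about the real 13 MB body. In an Express app the `compression` middleware package does this per response; sending gzip bytes by hand also requires a `Content-Encoding: gzip` header and a client that listed gzip in `Accept-Encoding`.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical stand-in for the response body: many rows with repeated keys.
function buildPayload(rowCount: number): string {
  const rows: Array<Record<string, string>> = [];
  for (let i = 0; i < rowCount; i++) {
    rows.push({
      submitter_donor_id: `DO${i}`,
      submitter_specimen_id: `SP${i}`,
      submitter_sample_id: `SA${i}`,
      sample_type: "Total DNA",
    });
  }
  return JSON.stringify({ rows });
}

// Gzip a JSON string; the result is what would go over the wire.
function compressJson(json: string): Buffer {
  return gzipSync(Buffer.from(json, "utf8"));
}

const payload = buildPayload(3000);
const compressed = compressJson(payload);
console.log(`raw: ${Buffer.byteLength(payload)} bytes, gzip: ${compressed.length} bytes`);
// The client gets the original text back after gunzip.
console.log(gunzipSync(compressed).toString("utf8") === payload);
```

JSON with repeated keys usually compresses well, but the actual saving should be measured on a real 3k response.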
yup @blabadi - I just wanted to compare some times in various scenarios.
Did you have a benchmark time through the API for both the upload and commit steps that I can compare to in my testing?
From the API, around 6.8 secs on QA:
The commit API update is not on QA or Dev yet; the commit currently takes around 27 seconds, give or take.