fepegar opened this issue 4 years ago
This is a table (GitHub doesn't accept CSV attachments) with the benchmark results. Related to #2 and #7.
Are these numbers in seconds? I'll have to look into this, but I think once I do the manual checks and pickle the DataFrame (instead of preprocessing the Excel file every time a query is made), this issue should be resolved. I can't see it being related to e.g. long lists in the semiology dictionary, as others such as epigastric and autonomous-vegetative also have long lists but don't take as long.
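To make sure we mean the same thing, here is a minimal sketch of the pickling idea (the file paths and the preprocessing step are placeholders, not the actual module code):

```python
from pathlib import Path

import pandas as pd

# Paths are assumptions for illustration, not the actual repository layout.
EXCEL_PATH = Path("resources/Semio2Brain_Database.xlsx")
PICKLE_PATH = EXCEL_PATH.with_suffix(".pkl")


def load_dataframe() -> pd.DataFrame:
    """Return the cached DataFrame, rebuilding it from the Excel file only if needed."""
    if PICKLE_PATH.is_file():
        return pd.read_pickle(PICKLE_PATH)
    df = pd.read_excel(EXCEL_PATH)  # slow: done once, then cached on disk
    # ...manual checks / preprocessing would go here...
    df.to_pickle(PICKLE_PATH)
    return df
```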
As I said in #2, ideally the Excel file should be read only once and the resulting data frame cached. Then the information can be extracted from that data frame without reading and parsing the Excel file many times per query.
Also, I think it would be best to have a CSV, not an Excel file. Reading the Excel file with pandas takes 140 ms, which is a lot. When I save that DataFrame as a CSV, loading it takes 23 ms. Still quite a lot, hence the need to cache.
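Something along these lines would be enough for the in-memory caching, I think (a rough sketch; the CSV path is an assumption):

```python
from functools import lru_cache

import pandas as pd


@lru_cache(maxsize=1)
def get_dataframe(csv_path: str = "resources/semio2brain.csv") -> pd.DataFrame:
    # Parsed once per process (~23 ms); every later call returns the cached object.
    return pd.read_csv(csv_path)
```

One caveat: the cached DataFrame is shared, so callers should treat it as read-only (or work on a `.copy()`) to avoid polluting later queries.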
Have you seen the DataFrames? Check the branch, under resources.
Which data frames? What branch do you mean?
Here, in SVT: there is only one other branch now besides master.
Why is there a second branch anyway? Are you planning to merge it into master?
The other branch has been merged already. I'm now keeping it for verbose printouts, to let me manually inspect the outputs and keep my notebooks working for now.
Reading the data frame once didn't help. I think the Excel file is being read many other times in the code. Actually, things are slower now for some reason. "Hypermotor" takes more than 5 minutes!
I think the problem is that the Excel file is read many times by big_map and gifs_lat in QUERY_LATERALISATION.
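One fix (a sketch only; the real signatures of QUERY_LATERALISATION, big_map and gifs_lat may differ) would be to read the file once at the top level and pass the DataFrame down, instead of letting each helper open the Excel file again:

```python
import pandas as pd


def query_lateralisation(df: pd.DataFrame, semiology: str):
    # big_map / gifs_lat would take `df` as an argument instead of re-reading the file
    ...


df = pd.read_excel("resources/Semio2Brain_Database.xlsx")  # read once
for semiology in ["Hypermotor", "Epigastric"]:
    query_lateralisation(df, semiology)
```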
I see you updated these; let me know what has happened to the execution times.
So I don't see those crazy times like 5 minutes any more, but they're back to 0-15 seconds. It feels like the code loops for longer on some semiologies than on others. We need to find the bottleneck by debugging and/or profiling. I've never used a profiler, but I think it would be very useful, especially with a GUI.
I'd have to look up profiling too.
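For reference, the standard-library profiler is enough to get started; `run_query` below is a placeholder for whichever call we want to time, and snakeviz (pip-installable) can render the resulting .prof file as an interactive GUI:

```python
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
run_query("Hypermotor")  # placeholder for the actual query call
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(20)  # top 20 functions by cumulative time
stats.dump_stats("hypermotor.prof")  # open with `snakeviz hypermotor.prof` for a GUI
```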
New benchmark.
There doesn't seem to be a relationship between the length of the semiology and the query time, so we will have to profile later.
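For the next round, a small timing loop like this (the query function and the semiology list are placeholders) would make the benchmark easy to rerun and compare:

```python
import time


def benchmark(query_fn, semiologies):
    """Time each query and return a {semiology: seconds} mapping."""
    timings = {}
    for semiology in semiologies:
        start = time.perf_counter()
        query_fn(semiology)
        timings[semiology] = round(time.perf_counter() - start, 3)
    return timings
```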
What are your thoughts on running the entire database once, now that the database is complete, and adding the resulting GIF structures and values as a dictionary under resources, with a version number? Then, when we actually use the 3D Slicer SVT, tick a semiology and click on Update visualisation, instead of running query_lateralisation or query_semiology etc., it simply reads off the dictionary.
Concretely: whenever the SemioDict (YAML file), mega_analysis (module) and/or the Semio2Brain Database are updated, we run the entire thing again and update the GIF outputs (and bump the version). To do this we can use the following to obtain the GIF structures (a minimal sketch of the precomputation step follows the checklist):
[x] debugging_multi_postictals_neutral_also.py
[x] debugging_multi_postictals_neutral_only.py
[x] debugging_multi_terms_neutral_also.py
[x] debugging_multi_terms_neutral_only.py
[ ] an equivalent script to above reading off the remaining semiologies in semiologies_lateralised_only_default_list.txt
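As a sketch of what the precomputation step could look like (the output file name, result structure and helper names are assumptions, not the current code):

```python
import json
from pathlib import Path

VERSION = "1.0.0"  # bump whenever the SemioDict, mega_analysis or Semio2Brain change


def precompute(semiologies, query_fn):
    """Run every semiology once and keep the resulting GIF structures/values."""
    results = {"version": VERSION}
    for semiology in semiologies:
        results[semiology] = query_fn(semiology)  # e.g. {GIF label: value}
    return results


def save(results, path=Path("resources/precomputed_gif_results.json")):
    path.write_text(json.dumps(results, indent=2))
```

The Slicer module would then just load this file when the user ticks a semiology and clicks Update visualisation, instead of re-running the queries.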
Profiling in #208
Related to #2.