Closed goldmanm closed 4 years ago
Hey Mark
Had my eye off the ball for a few days. I'll take a look and get back asap.
Phil Last
I gave up after 15 minutes. With 1000 rows the project opened in less than 1 second. 10000 rows took 5 seconds. I gave up after 5 minutes with 100000.
We are working with text files so that items can be edited either inside the workspace or in the user's favourite external text editor. In the notation any row of an array could represent a deeply nested vector. There is no optimisation for simple numeric arrays. There is however an attempt at memoisation so that we don't have to re-evaluate the same thing multiple times when an array has many duplicate cells.
In evaluating a million rows each is checked against a growing list of previous rows to retrieve a ready-made result if available. A newly found row is then parsed and evaluated.
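To illustrate why this lookup might dominate at large row counts, here is a rough sketch in Python (not Acre's actual APL code). The function names and the `evaluate` callback are hypothetical. It assumes the "growing list" is searched linearly, which would make the total cost grow roughly quadratically when most rows are distinct, and contrasts that with a hash-keyed cache.

```python
def evaluate_rows_list(rows, evaluate):
    """Memoise via a growing list that is scanned linearly (assumed behaviour)."""
    seen, results, out = [], [], []
    for row in rows:
        try:
            i = seen.index(row)          # linear scan over all previous distinct rows
        except ValueError:
            seen.append(row)             # newly found row: parse and evaluate once
            results.append(evaluate(row))
            i = len(seen) - 1
        out.append(results[i])
    return out


def evaluate_rows_dict(rows, evaluate):
    """Same memoisation, but keyed by a hash table for constant-time lookups."""
    cache = {}
    out = []
    for row in rows:
        if row not in cache:
            cache[row] = evaluate(row)
        out.append(cache[row])
    return out
```

Both versions evaluate each distinct row once. Only the cost of finding a previously seen row differs, and that difference matters most for random data like the example in this issue, where almost every row is new.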
No doubt the process could be sped up. The memoisation might prove to take longer overall more often than not. Ensuring that the entire contents are contained in ' ¯eE.0123456789' or markup for simple arrays ('[ ]') would allow us to optimise simple numerics.
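The character-set check suggested here could look something like the following. This is a Python sketch, not Acre code, and the helper names are made up for illustration. It handles a single row of text, ignores the '[ ]' markup, and returns floats for simplicity. It also assumes that passing the check only makes a row a candidate: text such as `1.2.3` passes the character test but is not a number, so the sketch falls back by returning `None`.

```python
SIMPLE_NUMERIC_CHARS = frozenset(" ¯eE.0123456789")


def is_simple_numeric(text):
    """True if the row is non-blank and uses only simple numeric characters."""
    return bool(text.strip()) and set(text) <= SIMPLE_NUMERIC_CHARS


def try_parse_simple_numeric(text):
    """Fast path: return a list of floats, or None to signal the general parser."""
    if not is_simple_numeric(text):
        return None
    try:
        # APL writes negative numbers with a high minus (¯), Python with '-'
        return [float(tok.replace("¯", "-")) for tok in text.split()]
    except ValueError:
        return None  # e.g. '1.2.3' or a lone 'e'
```

A row that comes back as `None` would simply go through the existing general-purpose parser, so the fast path should not change results for anything else.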
But the main problem as I see it is that the expectation for arrays saved in the source code of projects is that they are static data such as lookup tables and the like.
Million row, or even 100,000 row numeric arrays are more suited to storage in a database.
Even so I'll look to see if my ad-hoc suggestions above might lead to some improvement without slowing down everything else.
Hi Mark. It's interesting, we didn't even really want to have arrays stored in the beginning. The APL array notation is not built into Dyalog APL, though someday it will be no doubt, but I'm not sure even then it would be efficient to store large arrays in text files. Exceptions are made for simple text matrices and for vectors of simple text vectors (same thing in different form). These should have no problem with size. However, I would still probably not use them to store database style data, as opposed to, say, MarkDown documents, which is pretty much what I use them for.
In general, while an Acre project is meant to be as convenient as (more so than!) a saved workspace, application data should be stored separately from source code, in a database or in component files. Small arrays that control an app are fine.
When trying to save and load a project with a moderately sized variable, the OpenProject method seems to take an unreasonably long time. I am not sure what the cause of the slowness is, but I was able to produce a minimal working example:
large_var←?1000000 5⍴100
(to make a ~9 MB variable)
]acre.CreateProject '[file path here]' # -variables=On
(took me around 5 seconds)
acre.OpenProject '[file path here]' # -track=On
(took over 10 minutes before I interrupted the process)
This was done with ACRE 14.0 using Dyalog 17.1 32 bit with a workspace size approaching 2 GB.
To conclude, I am not sure why it seems to take almost forever to load a project when creating it is much quicker. Maybe there is some sort of loop running indefinitely and/or there is an error message being suppressed.