OpenProject doesn't finish when reading large variables

goldmanm commented 4 years ago

When trying to save and load a project with a moderately sized variable, the OpenProject method seems to take an unreasonably long time. I am not sure what causes the slowness, but I was able to produce a minimal working example:

Open a clean Dyalog workspace.
Type large_var←?1000000 5⍴100 to make a ~9 MB variable
Save the project: ]acre.CreateProject '[file path here]' # -variables=On (took me around 5 seconds)
Close Dyalog and open up a new workspace
Try to load the project with: acre.OpenProject '[file path here]' # -track=On (took over 10 minutes before I interrupted the process)

This was done with ACRE 14.0 using Dyalog 17.1 32 bit with a workspace size approaching 2 GB.

To conclude, I am not sure why it seems to take almost forever to load a project when creating it is much quicker. Maybe there is some sort of loop running indefinitely and/or there is an error message being suppressed.

PhilLast commented 4 years ago

Hey Mark

Had my eye off the ball for a few days. I'll take a look and get back asap.

Phil Last

PhilLast commented 4 years ago

With the million-row example I gave up after 15 minutes. With 1000 rows the project opened in less than 1 second. 10000 rows took 5 seconds. With 100000 rows I gave up after 5 minutes.

We are working with text files so that items can be edited either inside the workspace or in the user's favourite external text editor. In the notation any row of an array could represent a deeply nested vector. There is no optimisation for simple numeric arrays. There is however an attempt at memoisation so that we don't have to re-evaluate the same thing multiple times when an array has many duplicate cells.

In evaluating a million rows each is checked against a growing list of previous rows to retrieve a ready-made result if available. A newly found row is then parsed and evaluated.
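The cost of that lookup can be illustrated with a small sketch. It is written in Python rather than APL and is not Acre's code; the names (`parse_row`, `load_linear`, `load_hashed`) are hypothetical, and it assumes only that the cache is searched row by row as described above. When most rows are distinct, the linear search does work proportional to the square of the row count, whereas a hash-keyed cache stays roughly linear.

```python
def parse_row(text):
    """Stand-in for the real parser: turn one line of text into a list of ints."""
    return [int(tok) for tok in text.split()]


def load_linear(lines):
    """Memoise with a growing list that is scanned for every row."""
    seen_text, seen_value, out = [], [], []
    comparisons = 0
    for line in lines:
        found = None
        for i, prev in enumerate(seen_text):
            comparisons += 1
            if prev == line:
                found = i
                break
        if found is None:
            # New row: parse it once and remember the result.
            seen_text.append(line)
            seen_value.append(parse_row(line))
            found = len(seen_text) - 1
        out.append(seen_value[found])
    return out, comparisons


def load_hashed(lines):
    """Same memoisation, keyed by a dict so each lookup is about constant time."""
    cache, out = {}, []
    for line in lines:
        if line not in cache:
            cache[line] = parse_row(line)
        out.append(cache[line])
    return out


lines = [f"{i} {i + 1} {i + 2}" for i in range(1000)]  # 1000 distinct rows
rows, comparisons = load_linear(lines)
print(comparisons)  # 499500 comparisons for 1000 distinct rows
```

In this model, ten times as many distinct rows means about a hundred times as many comparisons, which would be consistent with the load time growing much faster than the row count.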

No doubt the process could be sped up. The memoisation might prove to take longer overall more often than not. Ensuring that the entire contents are contained in ' ¯eE.0123456789' or markup for simple arrays ('[ ]') would allow us to optimise simple numerics.
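A minimal sketch of that character-set check, again in Python and not taken from Acre: `is_simple_numeric`, `parse_simple_row` and the `slow_parse` callback are hypothetical names, and the '[ ]' markup is left out for brevity. A row made up only of the characters listed above is converted directly; anything else, or anything that turns out not to be a valid number, falls back to the general parser.

```python
SIMPLE_NUMERIC = set(" ¯eE.0123456789")


def is_simple_numeric(text):
    """True if the row is non-blank and uses only simple-numeric characters."""
    return bool(text.strip()) and set(text) <= SIMPLE_NUMERIC


def parse_simple_row(text):
    """Convert APL-style numbers; the high minus (¯) becomes an ASCII minus."""
    return [float(tok.replace("¯", "-")) for tok in text.split()]


def parse_row(text, slow_parse):
    """Try the fast numeric path first, otherwise use the general parser."""
    if is_simple_numeric(text):
        try:
            return parse_simple_row(text)
        except ValueError:
            # Characters were all allowed but did not form valid numbers.
            pass
    return slow_parse(text)


def general(text):
    """Placeholder for the full array-notation parser (assumed, not real)."""
    return ("general", text)


print(parse_row("12 ¯3.5 1E2", general))  # [12.0, -3.5, 100.0]
print(parse_row("(1 2)(3 4)", general))   # ('general', '(1 2)(3 4)')
```

The set test is a single pass over the row, so it adds little cost for rows that still need the general parser.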

But the main problem as I see it is that the expectation for arrays saved in the source code of projects is that they are static data such as lookup tables and the like.

Million row, or even 100,000 row numeric arrays are more suited to storage in a database.

Even so I'll look to see if my ad-hoc suggestions above might lead to some improvement without slowing down everything else.

PaulMansour commented 4 years ago

Hi Mark. It's interesting, we didn't even really want to have arrays stored in the beginning. The APL array notation is not built into Dyalog APL, though no doubt it will be someday. Even then, I'm not sure it would be efficient to store large arrays in text files. Exceptions are made for simple text matrices and for vectors of simple text vectors (the same thing in a different form). These should have no problem with size. However, I would still probably not use them to store database-style data, as opposed to, say, Markdown documents, which is pretty much what I use them for.

In general, while an Acre project is meant to be as convenient as a saved workspace (more so!), application data should be stored separately from source code, in a database or in component files. Small arrays that control an app are fine.

the-carlisle-group / Acre-Desktop

OpenProject doesn't finish when reading large variables #221