vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
982 stars, 118 forks

Define pre-proc vars for FORMTMP and FORMTMPSORT. #433

Open jodavies opened 1 year ago

jodavies commented 1 year ago

Sometimes these paths can be useful in a form script, for use with #system, for example.

jodavies commented 1 year ago

This causes a Valgrind read error in the vorm tests. So far I cannot tell whether that is my fault or whether it simply uncovers something else. If I understand correctly, adding new pre-proc vars really is this simple, right?

tueda commented 1 year ago

I think the paths AM.TempDir and AM.TempSortDir are not yet initialized at the point where you inserted the code (they are modified further below).

Edit: Never mind. I was wrong.

jodavies commented 1 year ago

I can reproduce this on my machine if I copy the code of the failing test. However, if I modify it by, say, defining two more pre-proc variables in the FORM script that don't do anything, Valgrind is happy again.

tueda commented 1 year ago

OK, this is not your fault. Actually, the following code gives a Valgrind error with the master branch:

#do i=1,5
  #define x`i'
#enddo
#define var1(a) "(`~a')"
#message `var1(1)'
.end

Something seems to go wrong in FromList when defining a macro with deferred substitution...

Edit: deferred substitution is not needed to reproduce the Valgrind error:

#do i=1,5
  #define x`i'
#enddo
#define var(a) "(b)"
#message `var(1)'
.end

tueda commented 1 year ago

The memory error was fixed in 741861aef8c7fa81bfddf4eab67f3f2cf4ccf53b.

I am not sure about the use case of these variables with #system or other scripting. Basically, the temporary files in TempDir/TempSortDir should not be touched, right? Or do you just need the directory paths, say, to get a temporary directory? (But in that case you can use FORMPATH or TMPDIR, which are probably defined in your environment.)

jodavies commented 1 year ago

I would like to do something like

#system gunzip < /network/path/large.sav.gz > `FORMTMP_'/large.sav
Load `FORMTMP_'/large.sav;
#system rm `FORMTMP_'/large.sav
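The three #system/Load lines above follow a decompress, use, delete pattern. As an illustration outside FORM, here is a minimal Python sketch of the same pattern with the cleanup guaranteed by try/finally. The helper name decompressed_copy is hypothetical and not part of FORM. One design point it highlights: a trailing #system rm in the script would presumably be skipped if the FORM run stopped before reaching it, whereas the finally clause always runs.

```python
import gzip
import os
import shutil
from contextlib import contextmanager


@contextmanager
def decompressed_copy(gz_path, temp_dir):
    """Hypothetical helper: gunzip gz_path into temp_dir, yield the path,
    and always delete the decompressed file afterwards."""
    base = os.path.basename(gz_path)
    # "large.sav.gz" -> "large.sav"; anything else gets an ".out" suffix.
    name = base[:-3] if base.endswith(".gz") else base + ".out"
    out_path = os.path.join(temp_dir, name)
    try:
        # Equivalent of: gunzip < gz_path > temp_dir/name
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        yield out_path  # the caller "loads" the file here
    finally:
        # Equivalent of: rm temp_dir/name, even if an error occurred above.
        if os.path.exists(out_path):
            os.remove(out_path)
```

Usage: `with decompressed_copy("large.sav.gz", some_temp_dir) as path: ...`, where some_temp_dir is whatever directory the caller considers appropriate; which directory that should be is exactly the question discussed below.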

Personally I have FORMTMP defined in my environment, but this doesn't work if a user has defined their temp dir with the -t argument instead.
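To make the mismatch concrete, here is a minimal Python model of the problem. It assumes, as an unverified simplification, that a -t argument takes precedence over the FORMTMP environment variable and that the fallback is the current directory; other possible sources (for example setup-file settings) are ignored. Both function names are hypothetical. A script that only inspects the environment can disagree with the directory FORM actually uses, which is why exposing the effective path as a pre-proc variable helps.

```python
from typing import Mapping, Optional


def effective_temp_dir(t_option: Optional[str], env: Mapping[str, str]) -> str:
    """Hypothetical model of how FORM picks its temporary directory."""
    if t_option:                # assumption: form -t DIR wins over the environment
        return t_option
    if env.get("FORMTMP"):      # otherwise use the FORMTMP environment variable
        return env["FORMTMP"]
    return "."                  # assumption: default is the current directory


def guess_from_environment(env: Mapping[str, str]) -> Optional[str]:
    """What a script sees if it only looks at FORMTMP in the environment."""
    return env.get("FORMTMP")
```

For example, with `form -t /scratch` and `FORMTMP=/tmp`, `effective_temp_dir("/scratch", {"FORMTMP": "/tmp"})` returns `"/scratch"` while `guess_from_environment({"FORMTMP": "/tmp"})` returns `"/tmp"`.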

tueda commented 1 year ago

On the one hand, I'm still not entirely convinced that `TMPDIR'/large.sav is not enough for temporary files (OK, maybe it depends on the cluster configuration), but on the other hand, invoking form -t /tmp prog.frm would be handier than FORMTMP=/tmp form prog.frm for such scripting...

I would merge this pull request, but before that, does anyone else have any other opinions?

jodavies commented 1 year ago

In the environment variable case you need to have called form as form -d TMPDIR=$FORMTMP anyway, don't you? If you don't have full control over how your FORM code is called, it seems to me this is the only way to guarantee access to the temp directory from the script.

tueda commented 1 year ago

> In the environment variable case you need to have called form as form -d TMPDIR=$FORMTMP anyway don't you?

Well, yes, if TMPDIR must have the same value as FORMTMP and you therefore want to override the default TMPDIR with FORMTMP.

I assume the environment variable TMPDIR is defined on Unix, or at least on Linux, so it is available to the preprocessor. For Windows or other special environments, a check (e.g. #ifdef) is needed to see whether the system provides TMPDIR. I think Windows provides TMP and TEMP instead of TMPDIR, so a generic way would be:

#if (isdefined(TMPDIR))
  #define mytempdir "`TMPDIR'"
#elseif (isdefined(TMP))
  #define mytempdir "`TMP'"
#elseif (isdefined(TEMP))
  #define mytempdir "`TEMP'"
#else
  #define mytempdir "."
#endif

#message `mytempdir'/large.sav
.end
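The preprocessor chain above picks the first of TMPDIR, TMP and TEMP that is defined and falls back to ".". The same lookup written in Python, as a sketch for readers who want to reproduce it in a wrapper script (pick_temp_dir is a hypothetical name), uses `name in env` so that, like isdefined, a variable counts as defined even if its value is empty:

```python
import os
from typing import Mapping, Optional


def pick_temp_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first defined of TMPDIR, TMP, TEMP, else "." (same order as the FORM code)."""
    if env is None:
        env = os.environ
    for name in ("TMPDIR", "TMP", "TEMP"):
        if name in env:  # mirrors isdefined(): presence, not non-emptiness
            return env[name]
    return "."
```

Python's standard tempfile.gettempdir() performs a comparable search (TMPDIR, TEMP, TMP, then platform defaults), but in a slightly different order and with an additional check that the directory is writable.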

To me, TMPDIR is more or less meant for temporary files in general, while FORMTMP is for the temporary files that FORM creates automatically (though I admit you could argue that FORMTMP is for any temporary files during a FORM run).

In general, TMPDIR may differ from FORMTMP (though you can set export FORMTMP=$TMPDIR in your shell). Whether it is better to create temporary files in TMPDIR or in FORMTMP (in terms of speed or storage size) depends on where FORM is running. In this sense, the script would need to know the environment, and it has no way to determine by itself where the best place for temporary files is. Either the script makes some assumption, or the user who invokes FORM has to supply this information somehow (for example with the -t option, as you mentioned).

jodavies commented 1 year ago

I didn't actually know that the preprocessor will search for variables in the environment if they are not defined in the script or as an argument, thanks.

For my purposes, what would actually be even cleaner is if FORM were to compress the expression data inside the save files by itself. Usually I see something like a 10x compression ratio here, which saves quite some disk space and network bandwidth. This is much more difficult to implement, however.

vermaseren commented 1 year ago

I think the better solution may be to give .sav files the option of being gzipped by Form when they are written. That is more programming work, but it would have more value in the long run, and there are already examples of the use of gzip in the Form sources. Of course one should not gzip the whole file as one stream, but each expression separately. In that case one needs an uncompressed index that tells which expressions there are and where they are in the file. Such an index already exists, so that part should not be compressed.
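The layout proposed above (each expression compressed on its own, plus an uncompressed index of names and positions) can be sketched in a few lines of Python. This is an illustration only: the byte layout (an 8-byte index length, a JSON index, then the compressed blobs) is hypothetical and is not FORM's actual .sav format, and zlib (DEFLATE, the algorithm gzip uses, without the gzip header) stands in for gzip. The point it shows is that reading one expression needs only the index and a single decompression, never the whole file.

```python
import json
import struct
import zlib
from typing import Dict, List

# Hypothetical layout, NOT FORM's .sav format:
# [8-byte big-endian index length][JSON index][blob 1][blob 2]...
# The index maps expression name -> [offset, length] of its compressed blob,
# with offsets counted from the start of the blob area.


def write_archive(exprs: Dict[str, bytes]) -> bytes:
    index, blobs, offset = {}, [], 0
    for name, data in exprs.items():
        blob = zlib.compress(data)          # each expression compressed separately
        index[name] = [offset, len(blob)]
        blobs.append(blob)
        offset += len(blob)
    header = json.dumps(index).encode()     # the index itself stays uncompressed
    return struct.pack(">Q", len(header)) + header + b"".join(blobs)


def _read_index(archive: bytes):
    (hlen,) = struct.unpack_from(">Q", archive, 0)
    return hlen, json.loads(archive[8:8 + hlen])


def list_expressions(archive: bytes) -> List[str]:
    return list(_read_index(archive)[1])


def read_expression(archive: bytes, name: str) -> bytes:
    hlen, index = _read_index(archive)
    offset, length = index[name]
    start = 8 + hlen + offset
    return zlib.decompress(archive[start:start + length])  # only this blob is inflated
```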

(In reply to Takahiro Ueda's comment of 7 Mar 2023, 13:38.)

tueda commented 1 year ago

Saving files with gzip-compressed expressions should probably be discussed as a separate issue. I have created one: https://github.com/vermaseren/form/issues/436.