vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0

Splitting an expression into single terms and saving them into separate files #550

Closed: vsht closed this issue 2 weeks ago

vsht commented 3 months ago

I have a long expression that I'd like to split into single terms, saving each of these terms into a separate file. The splitting itself is not an issue and I end up with multiple expressions that now need to be saved.

G amp1 = 1;
G amp2 = 2;
G amp3 = 3;
G amp4 = 4;
G amp5 = 5;
.sort
.global
.sort
#do i=1,5
G samp`i' = amp`i';
.store
save amp`i'.sav samp`i';
#enddo
.end

This works, but it has the disadvantage that each file contains a differently named global expression holding the term. Suppose that I load one of these .sav files without knowing the name of that global expression in advance. Is there a way to get the name during execution? Apparently, until I define some G, the loaded expression does not show up in NUMACTIVEEXPRS_, so I'm a bit at a loss here. For example, here:

Load amp3.sav;
.sort
print;
#message `NUMACTIVEEXPRS_'
.end

Is there a way I could automatically figure out that the expression is called samp3?

Alternatively, I tried to change my code so that each expression gets the same name. The naive way

#do i=1,5
G samp = amp`i';
.store
save amp`i'.sav samp;
#enddo
.end

obviously doesn't work, because I cannot overwrite a stored expression. However, if I add delete storage;, the expressions I'm trying to save (amp1, amp2, ...) are deleted as well, so that won't work either.

Any help would be greatly appreciated.

Cheers, Vlad

jodavies commented 3 months ago

At least in your example, the name of the expression is implied by the name of the file. Is this not the case in your real problem?

vsht commented 3 months ago

> At least in your example, the name of the expression is implied by the name of the file. Is this not the case in your real problem?

Not really. The real file name is made of the diagram number, the number of loops, the topology name and the integral number, because I'm splitting a single diagram with many loop integrals into files, where each file contains only one integral.

Unfortunately, something like s1dia6471L3Ttopology1555I900 is too long for a variable name. So I need to shorten it significantly, but I want to have a shortening scheme that is independent of the file name. Hence the original problem.

jodavies commented 3 months ago

If text-mode files are OK, this will be much easier to achieve with #write.
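A minimal sketch of the #write route, assuming text files are acceptable and reusing the toy expressions amp1 ... amp5 from the opening post. Each file is written so that it always defines an expression with the same name, samp. That means the name no longer depends on the file name. The file names ampN.h and the expression name samp are illustrative choices only. This has not been tried on the real expressions in this thread.

```form
* Sketch: dump each expression amp`i' into a text file amp`i'.h
* that always defines the same (illustrative) expression name samp.
G amp1 = 1;
G amp2 = 2;
G amp3 = 3;
G amp4 = 4;
G amp5 = 5;
.sort
#do i=1,5
* %E is replaced by the contents of the active expression amp`i'
* as it stands after the .sort above
   #write <amp`i'.h> "G samp = %E;", amp`i'
* Close each file right away so that many files are not kept open
   #close <amp`i'.h>
#enddo
.end
```

A file written this way can then be read back in a later run with #include amp3.h followed by .sort. After that, the expression is always called samp, whichever file was included.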

How large are your expressions? Writing out hundreds of thousands of files would not be so nice...

For my own computations I have split things somewhat like this, but only at the level of topologies which appear in the expression. Then during further (parallel) processing of the parts, the terms are able to merge with each other. If you process each term completely independently, I could imagine you will have very large intermediate results which at some point must all be loaded and sorted together?

vsht commented 3 months ago

I know that #write would make things much easier. In principle, for this particular calculation this would probably work, but ideally I would still be interested in using binary files.

It's somewhat surprising that such a simple task doesn't seem to be achievable in an easy way within FORM.

jodavies commented 3 months ago

If you can arrange that your expressions are all in dollar variables, you can do it like this:

#do i = 1,5
    #$amp`i' = `i';
#enddo

#do i = 1,5
    #message store `i'
    Global amp = $amp`i';
    .store
    Save amp`i'.sav amp;
    Delete storage;
    .sort
#enddo

.end

vsht commented 3 months ago

This sounds like a great idea, many thanks!

Putting expressions into dollar variables doesn't seem to be tricky, right? This works out of the box and gives me exactly what I want:

G amp1 = 1;
G amp2 = 2;
G amp3 = 3;
G amp4 = 4;
G amp5 = 5;
.sort

#do i=1,5
#$damp`i' = amp`i';
#enddo
.sort
#do i=1,5
G samp = $damp`i';
.store
save amp`i'.sav samp;
delete storage;
.sort
#enddo
.end

vsht commented 3 months ago

BTW, rather accidentally I noticed that writing (which is obviously wrong)

G samp = #$damp`i';

instead of

G samp = $damp`i';

makes FORM never terminate. Not sure if this should cause an error message or something.

tueda commented 3 months ago

If your expression is small enough to fit in $-variables (and thus in memory), then using a #do-loop is a handy way to split and save it:

S x1,...,x9;
.global
G F = (x1+...+x9)^2;
.sort

#define i "0"
#do t=F
  #redefine i "{`i'+1}"
  #message store `i': `t'
  G amp = `t';
  .store
  Save amp`i'.sav,amp;
  Delete storage;
  .sort
#enddo
.end

vsht commented 3 months ago

Thanks for the code sample! In my case I'm actually sorting the integrals according to their topologies and ids, so a simple looping over all terms of an expression wouldn't do the trick for me.

However, I also noticed that when assigning expressions to dollar variables in a loop, the process slows down significantly after a couple of hundred iterations. In the case at hand I have around 1000 integrals in one expression, which were previously split into 1000 expressions, each containing one integral and a simple prefactor.

I don't know whether this is expected (e.g. because some caches fill up) or whether my code is simply not well written.


#do i=1, `LSCLNTOPOLOGIES'
    #do j=1,$topoPresent`i'
        #do k=1,$topoIntegralCounter`i'
            #message Putting s1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k'  into a dollar variable
            unhide s1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k';            
            .sort
            #$ds1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k' = s1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k';
            .sort
        #enddo
    #enddo
#enddo

jodavies commented 3 months ago

You should be able to do this in a single module with no sorts. As soon as your expressions exceed scratchsize you are doing a large amount of unnecessary disk-to-disk copying. Maybe this is what is slowing things down after a few hundred iterations?

vsht commented 3 months ago

@jodavies You're right, mixing unhide and dollar variables in the same loop was a bad idea. This way everything goes through much faster:

#message lsclSplitAmplitude: Applying unhide

#do i=1, `LSCLNTOPOLOGIES'
    #do j=1,$topoPresent`i'
        #do k=`LSCLSTARTWITHINTEGRALNO',$topoIntegralCounter`i'
            unhide s1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k';            
        #enddo
    #enddo
#enddo

.sort

#do i=1, `LSCLNTOPOLOGIES'
    #do j=1,$topoPresent`i'
        #do k=`LSCLSTARTWITHINTEGRALNO',$topoIntegralCounter`i'
            #message lsclSplitAmplitude: Putting s1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k'  into a dollar variable
            #$ds1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k' = s1dia`lsclDiaNumber'L`lsclNLoops'T`LSCLTOPOLOGY`i''I`k';
        #enddo
    #enddo
#enddo

.sort

tueda commented 2 months ago

By the way,

> BTW, rather accidentally I noticed that writing (which is obviously wrong)
>
> G samp = #$damp`i';
>
> instead of
>
> G samp = $damp`i';
>
> makes FORM never terminate.

seems to be fixed in 8923c77. (# is the symbol used for complex conjugation.)