Closed: FrankZijlstra closed this 9 months ago
Thanks for these nice improvements @FrankZijlstra! I will try to review the changes as quickly as possible.
As you are becoming a pretty active developer, I would like to invite you to our pypulseq Slack channel (check https://github.com/imr-framework/pypulseq/discussions/97 for more info). It would also be great if you could check the other discussions about our planned refactoring (https://github.com/imr-framework/pypulseq/discussions/123, https://github.com/imr-framework/pypulseq/discussions/124).
Would be happy to join, but I don't seem to be able to join with the links in #97 (wrong email domain).
Maybe @sravan953 can post/send you an updated link!?
@FrankZijlstra Echoing @schuenke's sentiment, thanks for all the improvements! We'd love for you to join the PyPulseq channel. Please join the workspace here and I will add you to the channel: https://join.slack.com/t/iarg-cmrrc/shared_invite/zt-1t3s2u8lz-1EJt3ufWxYvzEab0cZyfdA
Hi @FrankZijlstra,
Is there a reason why you are using `tuple()` instead of `.tobytes()`, as in the `find_or_insert()` function:

```python
if not isinstance(new_data, np.ndarray):
    new_data = np.array(new_data)
data_string = new_data.tobytes()
```
AFAIK, directly dumping the bytes of the array should be much faster than converting it to a tuple.
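A rough micro-benchmark (not from the PR; the numbers will vary with machine and array size) illustrates the gap:

```python
import timeit

import numpy as np

arr = np.random.rand(100)

# Hash the array once per call, keyed either as a tuple or as raw bytes.
t_tuple = timeit.timeit(lambda: hash(tuple(arr)), number=100_000)
t_bytes = timeit.timeit(lambda: hash(arr.tobytes()), number=100_000)
print(f"tuple key:   {t_tuple:.2f} s")
print(f"tobytes key: {t_bytes:.2f} s")
```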
@btasdelen Interesting, you are quite right that `tobytes` is indeed faster. The reason I changed everything to `tuple`s is that the keys used in `insert`, `find_or_insert`, and `update` need to be the same, and the old code in `insert` (converting floats to strings) was really slow. I have just implemented the obvious alternative of making sure everything is a `np.array` and then using `tobytes` for the keys, and that is faster.
However, there is one more thing I noticed the other day: for all libraries except the `shape_library`, the data inserted are small, newly constructed arrays. For example, in `register_grad_event`:
```python
data = [amplitude, *shape_IDs, event.delay, event.first, event.last]
```
and
```python
data = np.array(
    [
        event.amplitude,
        event.rise_time,
        event.flat_time,
        event.fall_time,
        event.delay,
    ]
)
```
So we're constructing a `list`, converting it to a `np.array`, and then calling `tobytes` on that to insert it into the library. By constructing all of these as `tuple`s in the first place, a lot of conversions can be avoided. And given the way the libraries are used, it does not seem to matter that the inserted data is a `tuple` instead of a `np.array` (all access to library data uses straight indexing, i.e. `library.data[key][1]`).
The only problem with this is the shape library, which does insert `np.array`s, but this could easily be made a special case that does use the `tobytes` keys. Either way, I'll commit the alternative version using `tobytes` for now.
I added some more things to the PR, sorry for piling things on (but I checked with @schuenke that this was okay :) ).
I reworked the `event_lib` to allow choosing between `tuple` and `tobytes` keys (the latter only used for the `shape_library`), and made sure all calls that `insert` into the event libraries use `tuple`s whenever possible to prevent conversions/copies. As an added benefit, `tuple`s are immutable, so you can't accidentally modify a library. To be consistent, I set the `writeable` flag of all numpy data in the library to false.
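For reference, freezing an array is a one-liner, and a later write attempt raises instead of silently mutating the library:

```python
import numpy as np

shape_data = np.array([0.0, 0.5, 1.0])
shape_data.setflags(write=False)  # mark the buffer read-only

try:
    shape_data[0] = 2.0
except ValueError as err:
    print(err)  # "assignment destination is read-only"
```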
Other smaller improvements:

- Removed a few more `np.max` calls I missed before (`max(a, b)` instead of `np.max([a, b])`).
- Removed `np.array` conversions in `register_rf_event`.
- Changed `block_to_events` to prevent making new lists on every call to `set_block` (which will almost always be called with a list of events, not a block). The `hasattr(args[0], 'rf')` check is consistent with the MATLAB version of this function.
- Replaced `np.cumsum` calls that used small, newly constructed lists with a native Python `cumsum`, specialized for up to 5 values (the maximum use-case found in pypulseq); see the sketch after this list.
- Added caching in `get_block` to prevent expensive recalculations when blocks are repeatedly accessed.
- Consistent key handling in `event_lib` and `test_report`. This also fixes a bug where `event_lib.find_or_insert` would not find an existing event because a different key was used.
- Made `event_lib.find` use `next_free_ID` instead of recalculating this value every time.
- Smaller optimizations in `compress_shape`, `add_gradients`, and `test_report`.
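The actual helper in the PR may well be unrolled per arity; this loop-based sketch just shows the idea of skipping ndarray construction for a handful of scalars:

```python
def cumsum(*values: float) -> tuple:
    # Pure-Python running sum for the tiny inputs pypulseq passes
    # (up to 5 values); avoids building an ndarray just for np.cumsum.
    out = []
    total = 0.0
    for v in values:
        total += v
        out.append(total)
    return tuple(out)

print(cumsum(1.0, 2.0, 3.0))  # (1.0, 3.0, 6.0)
```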
Overall performance improvements range from modest to 3-4x faster, depending on the type of sequence and on whether additional functions like `test_report` are used.

The `event_lib.find_or_insert` bug means that the `.seq` files produced after this PR are not identical to `dev`, as duplicated events are no longer present, which changes event IDs along with it. But other than that, as far as I could tell the changes do not break anything. It would be helpful to check this with the unit tests (should the sequence files produced by MATLAB be included in the repository to make this easier?).

I tested these commits with a couple of more complicated sequences. I did not observe any regressions. I am merging before the PR gets even bigger :)