static-frame / arraykit

Python C Extensions for StaticFrame
Other
8 stars 2 forks source link

Burkland/split up main source file #170

Closed chaburkland closed 4 months ago

chaburkland commented 4 months ago

A first pass at splitting up _arraykit.c into multiple source files.

I split it up as such:

  1. utilities.c: defines internal arraykit functions/macros that are used by other functions across multiple modules
  2. methods.c: defines the public arraykit functions
  3. block_index.c: defines all BlockIndex related code
  4. array_go.c: defines all ArrayGO related code
  5. file_parsing.c: defines all code related to delimited file parsing
  6. tri_map.c: defines all TriMap related code
  7. _arraykit: defines the arraykit public interface

All source files, except _arraykit.c have related header files.

A main note Many functions are no longer static or static inline. Through splitting this code up, I gained a deeper understanding of what those keywords mean.

static means the function defined is specific to the scope of the source file it lives in. As such, it is non-sensical to define a static function in a header file, because it is essentially defining an interface that cannot ever be referenced outside the header file itself. As such, the public functions/typedefs in each of the header files are not static, and their corresponding definitions in the source files are also not static.

inline is a way to speed up performance at the cost of binary size. With inlining all possible functions, the binary sizes (on my machine) increases from: 841,184 KBs -> 1,041,664 KBs I'm testing the performance diff now

flexatone commented 4 months ago

@chaburkland : very happy to see this refactoring!

One initial comment: I would favor leaving in the existing inline usage for a number of reasons.

flexatone commented 4 months ago

@chaburkland : one more:

Can we name file_parsing.c something more specific like delimited_to_arrays.c? I guess there might be a name conflict, but a more specific name seems appropriate as, after this change, we are likely to create new, separate files for any other file parsing routines we might add.

chaburkland commented 4 months ago

@flexatone

Thanks for the push to rename. I wasn't sure what to call it, but delimited_to_arrays makes the most sense.

As for inline, I went back through and discovered I marked a lot more functions inline than previously.

I wasn't able to mark these 4 functions inline (they were previously), because they are defined in utilites.h:

flexatone commented 4 months ago

I think (at least an AI told me) if we put the actual implementation (not just the header) in the .h file, we can inline some of those remaining functions, at least the smaller ones more natural for inlining.

flexatone commented 4 months ago

@chaburkland : any thoughts on the above (putting implementations in header files for inlining)?

Also, do you think we should merge this before merging https://github.com/static-frame/arraykit/pull/160, or vice versa?