static-frame / arraykit

Python C Extensions for StaticFrame

TypeBlock._blocks_to_array #46

Closed flexatone closed 1 year ago

flexatone commented 3 years ago

Used in Frame.values and all block consolidation routines.

flexatone commented 2 years ago

The approach here would be to take advantage of ordering and left-to-right assignment, using memcpy to write each block directly to the data pointer of the new array.
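
A rough sketch of that idea, written in NumPy rather than C. It assumes a column-major (Fortran-ordered) destination, so that each block of columns maps to one contiguous region of the buffer; the reference implementation does not use that layout. The helper name `blocks_to_array_f` is hypothetical, and the flat slice assignment only stands in for a `memcpy` on the data pointer.

```python
import numpy as np

def blocks_to_array_f(blocks, rows, dtype):
    # Hypothetical helper for illustration; not part of arraykit or StaticFrame.
    cols = sum(1 if b.ndim == 1 else b.shape[1] for b in blocks)

    # Assumption: a column-major destination, so columns are contiguous in memory.
    array = np.empty((rows, cols), dtype=dtype, order='F')

    # 1D view over the same buffer in memory order (the transpose is C-contiguous,
    # so this reshape does not copy).
    flat = array.T.reshape(-1)

    pos = 0
    for block in blocks:
        # Column-major flattening of the block; a 1D block is a single column.
        src = np.ravel(block, order='F')
        end = pos + src.size
        flat[pos:end] = src  # stands in for memcpy(dst + pos, src, nbytes)
        pos = end

    array.flags.writeable = False
    return array
```

With a row-major destination the columns of a block are not contiguous, so a single copy per block would not be enough; a C implementation would need either per-row copies or a layout like the one assumed here.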

Reference implementation:

    @staticmethod
    def _blocks_to_array(*,
            blocks: tp.Sequence[np.ndarray],
            shape: tp.Tuple[int, int],
            row_dtype: tp.Optional[np.dtype],
            row_multiple: bool
            ) -> np.ndarray:
        '''
        Given blocks and a combined shape, return a consolidated 2D or 1D array.

        Args:
            shape: used in constructing the returned array; not used as a constraint.
            row_multiple: if False, a single row reduces to a 1D array
        '''
        # assume column_multiple is True, as this routine is called after handling extraction of single columns
        if len(blocks) == 1:
            if not row_multiple:
                return row_1d_filter(blocks[0])
            return column_2d_filter(blocks[0])

        # get empty array and fill parts
        # NOTE: row_dtype may be None if an unfillable array; defaults to NP default
        if not row_multiple:
            # return 1 row TypeBlock as a 1D array with length equal to the number of columns
            array = np.empty(shape[1], dtype=row_dtype)
        else: # get ndim 2 shape array
            array = np.empty(shape, dtype=row_dtype)

        pos = 0
        array_ndim = array.ndim

        for block in blocks:
            block_ndim = block.ndim

            if block_ndim == 1:
                end = pos + 1
            else:
                end = pos + block.shape[1]

            if array_ndim == 1:
                array[pos: end] = block # gets a row from array
            else:
                if block_ndim == 1:
                    array[NULL_SLICE, pos] = block # a 1d array
                else:
                    array[NULL_SLICE, pos: end] = block # gets a row / row slice from array
            pos = end

        array.flags.writeable = False
        return array

flexatone commented 1 year ago

This function is now blocks_to_array_2d, which does some subtle things to handle various cases of generators, as well as optionally providing dtype and shape. Instead of implementing this, an optimized concatenation routine is probably preferable.
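
For comparison, a concatenation-based version might look like the sketch below. It is not the `blocks_to_array_2d` implementation and does not handle generators or an explicit shape; the helper name `concat_blocks` is hypothetical, and the `dtype` argument of `np.concatenate` assumes NumPy 1.20 or later.

```python
import numpy as np

def concat_blocks(blocks, dtype=None):
    # Hypothetical helper for illustration; not the StaticFrame routine.
    # Promote 1D blocks to single-column 2D arrays so all inputs are 2D.
    columns = [b.reshape(-1, 1) if b.ndim == 1 else b for b in blocks]

    # Concatenate left to right along the column axis; with dtype=None NumPy
    # resolves a common dtype from the inputs.
    array = np.concatenate(columns, axis=1, dtype=dtype)

    array.flags.writeable = False
    return array
```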