A chunked integer sequence

oscbyspro commented 1 year ago

I had a look at my extensions and I think MajorOrMinorLimbsSequence and its constituent sequences are useful enough to be made public (perhaps with better names). Splitting UInt(s) into Int16(s) or merging UInt8(s) into Int32(s) is quite common, but a headache, and these sequences are surprisingly great at it. I suppose they assume little endian order. I should fix that.

BigInt(words: MajorLimbs(littleEndianBytes, isSigned: true, as: UInt.self)) // no headache

The latest and greatest:

for word in NBKChunkedInt(source, isSigned: false, count: nil, as: UInt.self) { ... }
for byte in NBKChunkedInt(source, isSigned: false, count: nil, as: Int8.self) { ... }

oscbyspro commented 1 year ago

While there may be a better final solution, the ordering is easy to fix with (#72).

oscbyspro commented 1 year ago

It looks like I can't flatMap(_:) because capturing the merge/split order is too expensive.

oscbyspro commented 1 year ago

Hm. Should I remake these as collections?

oscbyspro commented 1 year ago

Also, is merge/split order important? Or are external transformations sufficient?

Assuming you have [1, 2, 3, 4], then you can form the following using only reverse (R):

typealias T = NBK.MajorOrMinorLimbsSequence
print(Array(T(([1, 2, 3, 4] as [Int32]), as: Int64.self))) // == 32,  64
print(Array(T(([2, 1, 4, 3] as [Int32]), as: Int64.self))) // == 32R, 64R
print(Array(T(([3, 4, 1, 2] as [Int32]), as: Int64.self))) // == 32,  64R
print(Array(T(([4, 3, 2, 1] as [Int32]), as: Int64.self))) // == 32R, 64

It also works in the large-to-small direction:

typealias T = NBK.MajorOrMinorLimbsSequence
print(Array(T(([0x0000000200000001, 0x0000000400000003] as [Int64]),            as: Int32.self)           ))
print(Array(T(([0x0000000200000001, 0x0000000400000003] as [Int64]).reversed(), as: Int32.self).reversed()))
print(Array(T(([0x0000000200000001, 0x0000000400000003] as [Int64]).reversed(), as: Int32.self)           ))
print(Array(T(([0x0000000200000001, 0x0000000400000003] as [Int64]),            as: Int32.self).reversed()))

[1, 2, 3, 4]
[2, 1, 4, 3]
[3, 4, 1, 2]
[4, 3, 2, 1]

oscbyspro commented 1 year ago

I mean, I have a working version of the above transformations but maybe it's over-engineered.

oscbyspro commented 1 year ago

Hm. These must conform to BidirectionalCollection for reversed() to apply lazily. Perhaps I should add conditional conformances so it still works with sequences? I'll have to think for a moment if that's the way I want to go about it.

oscbyspro commented 1 year ago

Hm. Should I remake these as collections?

Also, this. I haven't compared next() to an index-based design yet.

oscbyspro commented 1 year ago

A random access only design makes these sequences super simple, and (~50%) faster. Hm.

oscbyspro commented 1 year ago

Here's a thought: without sign extension, these can also be mutable. Although, perhaps sign extensions are necessary for converting UInt to UInt64? I suppose a flexible BigInt would not guarantee a snug fit.

oscbyspro commented 1 year ago

The random access design is like 10 lines each, plus lots of indexing boilerplate (#76) :(

oscbyspro commented 1 year ago

Hm. If go random access, then I should probably make them infinitely sign extendable.

oscbyspro commented 1 year ago

I added infinite sign extension, required random access and removed NBKMinorIntegerOfOne.

oscbyspro commented 1 year ago

An empty, signed, collection defaults to zero. The alternative is a T? or crashing.

oscbyspro commented 1 year ago

Given infinite sign extension, I can take the collection's count as an argument. In that way, you can truncate or extend the collection as it is passed to other generic algorithms. Hm.

oscbyspro commented 1 year ago

for word in NBKMajorOrMinorInteger(source, isSigned: false, count: nil, as: UInt.self) { ... }
for byte in NBKMajorOrMinorInteger(source, isSigned: false, count: nil, as: Int8.self) { ... }

oscbyspro commented 1 year ago

I've gone back and forth between wanting to merge the major and minor sequence into one and keeping them separate. Ultimately, I think that having a single sequence is better. It's fewer things to understand, and therefore less of a headache.

oscbyspro commented 1 year ago

Oh. Here's a neat idea. What about a macro that returns the major or minor sequence based on the size of the input and output element? It might be closer to what I want, because then I would not have to specialize unreachable paths or pray that the compiler removes them. But I suppose it wouldn't work in a generic function that converts unknown to unknown?

#MajorOrMinorInteger(...)

oscbyspro commented 1 year ago

Hm. Perhaps dynamic-but-constant-folded-when-specialized-branches is still the way to go.

oscbyspro commented 1 year ago

Here's a preconditioned namespace trick, although it's a bit arcane:

func majorOrMinorStuff() -> Stuff {
    Element.bitWidth > Base.Element.bitWidth ? Major().stuff() : Minor().stuff()
}

struct Major { init() { precondition(Element.bitWidth >= Base.Element.bitWidth) } }
struct Minor { init() { precondition(Element.bitWidth <= Base.Element.bitWidth) } }

oscbyspro commented 1 year ago

I renamed NBKMajorOrMinorInteger as NBKChunkedInt. It's not perfect, but it's better.

oscbyspro / Numberick

A chunked integer sequence #73