taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Polish and officially release quantized types #4857

Open strongoier opened 2 years ago

strongoier commented 2 years ago

Quantized types are an experimental feature introduced in the QuanTaichi paper. With this useful feature, users can significantly reduce the memory usage of their Taichi programs. The feature can also accelerate atomic operations on mobile phones.

However, the feature has been neither officially announced nor extensively maintained. Now that Taichi has reached version 1.0, I think it is time to polish the feature and make it available to users. My plan is to refine the API and implementation so that the feature fits better into current Taichi, is more user-friendly, and becomes deployable with Taichi AOT. I would like to write an RFC for it.

strongoier commented 2 years ago

Before writing a formal RFC, I would like to briefly summarize some previous discussions on this topic. I think there are still some issues to be solved, and I hope to continue the discussion here.

Background

A quantized type normally has no native hardware support. Therefore, you need to specify a parent primitive type (e.g. a 32-bit int) and describe how you would like to pack a group of quantized types (e.g. a 15-bit int and a 17-bit int) inside it.

In Taichi, this is done by introducing two SNode types, bit_struct and bit_array. Example usages:

i4 = ti.quant.int(bits=4)
u28 = ti.quant.int(bits=28, signed=False)

p = ti.field(dtype=i4)
q = ti.field(dtype=u28)
ti.root.dense(ti.i, 4).bit_struct(num_bits=32).place(p, q)

r = ti.field(dtype=i4)
ti.root.dense(ti.i, 4).bit_array(ti.i, 8, num_bits=32).place(r)
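For readers unfamiliar with the underlying bit layout, the following plain-Python sketch (not Taichi code) emulates what the bit_struct above stores: an i4 and a u28 packed into one 32-bit word. The field order (p in the low 4 bits, q in the high 28 bits) and the helper names pack/unpack are assumptions for illustration; Taichi's actual layout may differ.

```python
# Plain-Python emulation of a 32-bit bit_struct holding an i4 and a u28.
# Assumed layout: p (signed, 4 bits) in bits 0..3, q (unsigned, 28 bits) in bits 4..31.
P_BITS, Q_BITS = 4, 28
P_MASK = (1 << P_BITS) - 1
Q_MASK = (1 << Q_BITS) - 1


def pack(p, q):
    """Pack a signed 4-bit p and an unsigned 28-bit q into one 32-bit word."""
    if not -(1 << (P_BITS - 1)) <= p < (1 << (P_BITS - 1)):
        raise ValueError("p is out of the i4 range")
    if not 0 <= q <= Q_MASK:
        raise ValueError("q is out of the u28 range")
    # Masking a negative Python int yields its two's-complement low bits.
    return (p & P_MASK) | (q << P_BITS)


def unpack(word):
    """Recover (p, q) from a packed 32-bit word."""
    raw_p = word & P_MASK
    # Sign-extend the 4-bit field: values with the top bit set are negative.
    p = raw_p - (1 << P_BITS) if raw_p & (1 << (P_BITS - 1)) else raw_p
    q = (word >> P_BITS) & Q_MASK
    return p, q


word = pack(-3, 123456)
print(unpack(word))  # (-3, 123456)
```

The 4 + 28 bits exactly fill the 32-bit parent type, which is the "limitation on total number of bits" discussed below.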

What are the problems with the current APIs?

  1. bit_struct and bit_array are not consistent with other SNode types. A normal SNode specifies two things: how to split the axes, and how the cells are stored in the container. Meanwhile, a normal SNode places no restrictions on the components of its cells. However, bit_struct has nothing to do with axes, and both bit_struct and bit_array may only have place SNodes as the components of their cells, with a limit on the total number of bits of those components. This makes the APIs inconsistent.
  2. Users cannot use quantized types outside the SNode system. This is especially problematic when it comes to deployment, because ndarrays, which are first-class citizens in Taichi AOT, cannot work with quantized types.

What are our current thoughts on solving the problems?

As bit_array is deeply coupled with the SNode system (it does handle axis splitting) and is not used very often, we prefer to keep it unchanged. Our main focus is on bit_struct.

Potential change 1: add type ti.types.bit_struct

ti.types.bit_struct is similar to ti.types.struct, with the following differences:

Example usage:

s_ty = ti.types.bit_struct(32, {'a': i4, 'b': u28})
s = ti.field(dtype=s_ty)
ti.root.dense(ti.i, 4).place(s)
s[I].a, s[I].b  # access

s_arr = ti.ndarray(dtype=s_ty, shape=4)
s_arr[I].a, s_arr[I].b  # access

Pros:

Cons:

Potential change 2: add helper function bit_struct_wrapper()

bit_struct_wrapper() is introduced to replace the bit_struct SNode. Example usage:

p = ti.field(dtype=i4)
q = ti.field(dtype=u28)
ti.root.dense(ti.i, 4).place(bit_struct_wrapper(32, [p, q]))

It aims to solve problem 1 without sacrificing anything. However, it does nothing for problem 2, because it is not compatible with ndarrays.

Considering that none of these proposed changes is perfect, shall we apply none, one, or both of them? Or do you have other ideas? @k-ye @ailzhang @yuanming-hu

k-ye commented 2 years ago

I really like 1, because it makes the type system neat :-) However, considering that changing to bit_struct_wrapper should be easier, and that 1 and 2 are not mutually exclusive, I think it's reasonable to go with 2 first. As for ndarray quant types, at the bare minimum we can first support storing fixed-point scalar numbers, then quantized vector types, then quantized struct types.

ailzhang commented 2 years ago

+1 on implementing #2 as a start! Btw I feel like, given the deployment needs, not modifying computation code might not be as hard a requirement as we thought. IMHO if it's a s/old/new it shouldn't be a huge problem for people who want to maximize performance (or is it more complicated than that? :P). For ndarray + quant, is my understanding correct that supporting fixed-point scalar numbers can already solve our problem of floating point atomics?

strongoier commented 2 years ago

not modifying computation code might not be as hard a requirement as we thought

I agree with that. The main point here is that when we introduce a new language construct, especially one as fundamental as a type, it should make sense in most cases instead of being a deployment-only thing.

For ndarray + quant, is my understanding correct that supporting fixed-point scalar numbers can already solve our problem of floating point atomics?

IMO yes. @k-ye

strongoier commented 2 years ago

After an offline discussion with @ailzhang, we reached the following consensus:

  1. Ndarray is designed mainly for deployment purposes, with two unique advantages: avoiding memory copies and avoiding recompilation. It is important that it can be interpreted by user programs and common third-party frameworks in a trivial way. Therefore, supporting complex data storage mechanisms with ndarrays doesn't make much sense. For cases where such complex storage is really needed, users should turn to the SNode AOT solution.
  2. That said, we still want to solve the problem that floating-point atomics on mobile phones are too slow, without the SNode AOT solution. The proposal here is to add fixed32 and fixed64 types, and let users convert float from/to them:

    f_ty = ti.types.fixed32(scale=100.0, signed=False)
    arr = ti.ndarray(float, 10)

    @ti.kernel
    def foo(a: ti.types.ndarray()):
        for i in a:
            x = ti.cast(a[i], f_ty)
            ...  # calculations on x
            a[i] = ti.cast(x, float)

    foo(arr)

WDYT @k-ye? If this solution looks good, we can finalize it as an individual feature request and then reconsider other aspects of the quant APIs with fewer restrictions.

k-ye commented 2 years ago

To me it seems like fixed32 is just a small wrapper around the quant API? While I agree that

it can be interpreted by user programs and common third-party frameworks in a trivial way.

it's also not too hard to convert custom quant types into primitive types.

One thing I've been thinking about: if we make quant vectors workable on mobile, how much larger a simulation scale can we reach? Note that graphics APIs already offer f16 vectors, e.g. Metal has half4, so this is something to consider.

So yeah, I guess we can agree that Ndarray doesn't need to support fancy bit_struct. But I think it's reasonable to consider quantized scalars and vectors/matrices.

ailzhang commented 2 years ago

@k-ye Yup that wrapper is mainly used to solve the floating point atomics problem we've seen.

Is my understanding correct that to achieve much larger-scale simulations on mobile, we can try adding e.g. half4 as a primitive type that applies to both fields and ndarrays?

k-ye commented 2 years ago

Is my understanding correct that to achieve much larger-scale simulations on mobile, we can try adding e.g. half4 as a primitive type that applies to both fields and ndarrays?

Yep. Additionally, this could also help with vec4 loading optimization (cc @turbo0628 @qiao-bo )

strongoier commented 2 years ago

To me it seems like fixed32 is just a small wrapper around the quant API?

This indeed requires us to support quantized scalars, and our current APIs cannot be used outside SNodes. However, when a quant type is used as an individual scalar, a bit width other than 32/64 doesn't make sense. As there are already f32/f64, the only meaningful types to provide are fixed32/64.

Additionally, this could also help with vec4 loading optimization

I don't quite get the point here. Using native half4/vec4 in codegen instead of the current ad-hoc expansion will certainly be an optimization strategy for our ti.types.vector(4, dtype=f16/f32). How does it relate to our quantized types?

strongoier commented 2 years ago

After yet another discussion with @k-ye @ailzhang @jim19930609, I have formed a mental picture of future plans and would like to share it here.

Task A: Refine current APIs of quantized types and make them available again

Although current APIs work only in the SNode system, they are still useful and we hope to expose them in a cleaner way.

Subtask A.1: Determine public APIs of quantized type definitions

Previously, we had two groups of APIs, type_factory and quant. The latter is built on top of the former and is used in the QuanTaichi paper. However, some real use cases adopt the former. Having both adds an unnecessary learning burden for users.

We would like to only keep quant as it is closer to users, and make it available at ti.types.quant for consistency with other types. type_factory will be removed, and its methods will be made private under ti.types.quant.

To sum up, we will have ti.types.quant.int/fixed/float/_custom_int/_custom_float. All current usages need to be updated.

Subtask A.2: Solve the inconsistency problem of bit_struct SNode

This corresponds to problem 1 and potential change 2 mentioned above. I plan to add an API ti.bit_struct_wrapper(number_of_bits, list_of_fields, with_shared_exponent) to solve the inconsistency problem and also make place() clean. This requires refactoring our SNode system implementation a bit as we are getting rid of the bit_struct SNode.

Task B: Add new all-purpose and deployable APIs of quantized types

For deployment purposes, where performance is valued the most, it is worth providing some new APIs (users have to write things in a new way). The new APIs should work both in the SNode system and for Ndarrays.

Subtask B.1: Allow unrestricted usage of quantized types as dtype

Currently, the quantized types ti.types.quant.int/float/fixed can serve as the dtype of fields only if those fields are placed under a bit_struct or bit_array. We hope to allow using them directly as dtype without such limitations, so that they can also be used in Ndarrays and thus be easily deployable. Note that in this case, for storage purposes we need to pad a quantized type to the primitive type with the minimum number of bits that can hold it.
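As a rough illustration of the padding rule, here is a plain-Python sketch that picks the smallest primitive width able to hold a given number of bits. The candidate widths 8/16/32/64 are an assumption, and padded_width is a hypothetical helper, not a Taichi API.

```python
# Hypothetical helper: smallest primitive width that can store `bits` bits.
PRIMITIVE_WIDTHS = (8, 16, 32, 64)  # assumed candidate primitive types


def padded_width(bits):
    """Return the smallest primitive width (in bits) that can hold `bits` bits."""
    for width in PRIMITIVE_WIDTHS:
        if bits <= width:
            return width
    raise ValueError(f"{bits} bits do not fit in a single 64-bit primitive")


for b in (4, 15, 28, 32, 33):
    print(b, "->", padded_width(b))
```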

You may wonder what the use case is, considering that no memory can be saved. In fact, the above support mainly targets accelerating atomic operations on mobile phones by replacing float32 with 32-bit fixed-point numbers. Meanwhile, it enables experimenting with different precisions and provides a basis for subsequent tasks.
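The reasoning behind the atomics speedup can be sketched as follows: a 32-bit fixed-point value is stored as a plain integer, so accumulating into it only needs integer atomic adds, which mobile GPU backends generally support natively, whereas float atomic adds may have to be emulated (e.g. with a compare-and-swap loop). The plain-Python sketch below only emulates the arithmetic, with no real threads or atomics; the scale of 2**-16 and the helper names are assumptions for illustration.

```python
# Emulate accumulating float contributions through a 32-bit fixed-point integer.
SCALE = 2.0 ** -16  # assumed value of one integer step
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def to_fixed(x):
    """Encode a float as a signed 32-bit fixed-point integer (round to nearest)."""
    n = round(x / SCALE)
    if not INT32_MIN <= n <= INT32_MAX:
        raise OverflowError(f"{x} is out of range for this fixed-point format")
    return n


def from_fixed(n):
    """Decode a fixed-point integer back to a float."""
    return n * SCALE


contributions = [0.25, -1.5, 3.125, 0.0001]
acc = 0
for c in contributions:
    # In a kernel, this line would be an integer atomic add on the shared cell.
    acc += to_fixed(c)

print(from_fixed(acc))
```

Because integer addition is associative, the accumulated result does not depend on the order in which the updates arrive, as long as the total stays within the int32 range. The price is a fixed absolute precision of half a step per contribution.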

Subtask B.2: Add a quantized vector type

To enable the main advantage of quantized types, namely saving memory, we hope to add a quantized vector type ti.types.quant.vector(n, dtype), where dtype must be one of ti.types.quant.int/float/fixed. The whole type will be padded to the primitive type with the minimum number of bits that can hold n values of dtype. This targets common cases like packing two or three components of a physical quantity together.
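A plain-Python sketch of the packing idea, assuming unsigned integer components of equal width laid out starting from the least significant bit; pack_components and unpack_components are hypothetical helpers, not the proposed ti.types.quant.vector implementation:

```python
def pack_components(values, bits):
    """Pack unsigned components of equal width into one integer, lowest component first."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"component {i} = {v} does not fit in {bits} bits")
        word |= v << (i * bits)
    return word


def unpack_components(word, n, bits):
    """Extract n components of `bits` bits each from a packed integer."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(n)]


# Three 10-bit components need 30 bits, so one 32-bit primitive suffices.
word = pack_components([1023, 0, 512], bits=10)
assert word < (1 << 32)
print(unpack_components(word, 3, bits=10))  # [1023, 0, 512]
```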

Subtask B.3 (optional): Add a quantized struct type

Similar to Subtask B.2, we can add a quantized struct type ti.types.quant.struct, which was previously mentioned as ti.types.bit_struct. This can be an optional task when real need arises.

Task C: Add documentation and examples for quantized types

After this step we can have an official announcement of the rebirth of quantized types!

ailzhang commented 2 years ago

minor nit: for subtask b.2, I wonder if ti.types.vector(n, dtype) where ti.types.quant.int/float/fixed are added to the whitelist of dtype makes it simpler for users?

k-ye commented 2 years ago

Thanks for writing this up! Overall it looks like a great roadmap. I have a few questions here:

Could you provide an overview of the quant API?


Subtask A.1

I plan to add an API ti.bit_struct_wrapper(number_of_bits, list_of_fields, with_shared_exponent) to solve the inconsistency problem

I wonder if with_shared_exponent is only meaningful for vector/matrix types?

Subtask A.2

type_factory will remain as an internal API at ti.types.quantized_types.type_factory

nit: I feel like we don't have to have both ti.types.quant and ti.types.quantized_types. Maybe just ti.types.quant.type_factory?

strongoier commented 2 years ago

minor nit: for subtask b.2, I wonder if ti.types.vector(n, dtype) where ti.types.quant.int/float/fixed are added to the whitelist of dtype makes it simpler for users?

ti.types.vector and ti.types.quant.vector differ in many ways. ti.types.quant.vector is actually stored as a primitive type, has a limit on its number of bits, and can accept quant-only configurations like with_shared_exponent.

I wonder if with_shared_exponent is only meaningful for vector/matrix types?

For structs it can make sense as well.

nit: I feel like we don't have to have both ti.types.quant and ti.types.quantized_types. Maybe just ti.types.quant.type_factory?

ti.types.quant is the actual API we want to expose. As type_factory is hidden, we have to visit the whole module path ti.types.quantized_types for internal usage.

k-ye commented 2 years ago

we have to visit the whole module path ti.types.quantized_types for internal usage.

I think there are different ways to handle this: use __all__ to control the public symbols, use quant._type_factory, etc.

strongoier commented 2 years ago

I think there are different ways to handle this: use __all__ to control the public symbols, use quant._type_factory, etc.

Ah yes. I was stuck on the assumption that we could not break up the two same-level classes, quant and type_factory. However, this is now a chance to refine things more aggressively.

Now I have a new design: we get rid of the legacy "type_factory" and directly provide the following APIs - ti.types.quant.int/fixed/float/_custom_int/_custom_float. WDYT @k-ye

BTW which one seems better, quant.int or quant_int?

k-ye commented 2 years ago

Cool! I prefer quant.int more, as they can be scoped in the same namespace quant. WDYT? (cc @ailzhang @jim19930609 )

ailzhang commented 2 years ago

+1 on quant.int!

yuanming-hu commented 2 years ago

I wonder if with_shared_exponent is only meaningful for vector/matrix types?

For structs it can make sense as well.

I feel like in real use cases shared exponents are typically used only in vectors. Do you have an example where you need that in a struct? :-) @strongoier

Another question of mine: if I split 64 bits into x: fixed21, y: fixed22, z: fixed21, can it be expressed as a quantized vector3? See also the RGB565 format in OpenGL etc.: https://www.khronos.org/opengl/wiki/Image_Format

strongoier commented 2 years ago

I feel like in real use cases shared exponents are typically used only in vectors. Do you have an example where you need that in a struct? :-)

Not really. My point here is just that we don't have to throw an error if those fields are not grouped as a vector.

Another question of mine: if I split 64 bits into x: fixed21, y: fixed22, z: fixed21, can it be expressed as a quantized vector3?

In fact we hope that elements of a vector have the same type. A quantized struct is needed for this purpose.

yuanming-hu commented 2 years ago

In fact we hope that elements of a vector have the same type. A quantized struct is needed for this purpose.

I see. Thanks for the clarification!

I feel like the user may want to access the components via [] - for example color = (fixed5, fixed6, fixed5) and the user writes luminance = a[0] + a[1] + a[2]. Do we plan to support that? :-)

strongoier commented 2 years ago

I feel like the user may want to access the components via [] - for example color = (fixed5, fixed6, fixed5) and the user writes luminance = a[0] + a[1] + a[2]. Do we plan to support that? :-)

Yep. It is fine to support that as syntax sugar.

yuanming-hu commented 2 years ago

Yep. It is fine to support that as syntax sugar.

I'm thinking about this: for ti.types.quant.vector(n, dtype), can dtype be a list of quantized types? For example, we may want to allow something like rgb565 = ti.types.quant.vector(3, [fixed5, fixed6, fixed5]) :-) Then it's not simply a syntax sugar, but a real vector type. (Are we worrying about dynamic indexing here?)

strongoier commented 2 years ago

I'm thinking about this: for ti.types.quant.vector(n, dtype), can dtype be a list of quantized types? For example, we may want to allow something like rgb565 = ti.types.quant.vector(3, [fixed5, fixed6, fixed5]) :-) Then it's not simply a syntax sugar, but a real vector type. (Are we worrying about dynamic indexing here?)

I understand your point here. TBH this touches on some underlying design philosophy of Taichi, which I get a bit confused about from time to time.

As far as I understand, in earlier Taichi a vector is a pure math concept. It promises math operations, but nothing about storage. Because of this, it has great flexibility, allowing components to be non-contiguous, and to have different types. Also because of this, it cannot be directly mapped to native vector types, and cannot support dynamic indexing perfectly.

As time goes by, different voices arise in the community. Many users consider vectors as containers of contiguous same-typed values. As a result, many recent or planned efforts go in this direction - dynamic indexing, native types, etc.

However, these two directions are inherently conflicting - giving more support to one of them means giving less support to the other. To avoid going back and forth on design choices, IMHO we need to have a consistent and clear underlying principle. Then we can easily determine whether a quantized vector can have components with different types.

BTW I have another question: why do we have a struct type in the presence of a vector type which can have components with different types?

k-ye commented 2 years ago

To avoid going back and forth on design choices, IMHO we need to have a consistent and clear underlying principle.

I also agree on this. We have spent a great amount of time debating this, and concluded that vector/matrix should behave just like how most users would expect: They are containers holding homogeneous elements, dynamically-indexable, and providing linalg methods. Most of the time, using a Taichi vector/matrix should feel no different from using a GLM/GLSL one. It simplifies the user experience, the API design and the implementation.

If it comes to a point where a non-trivial amount of heterogeneous-vector usage shows up, 1) from a storage point of view, this could presumably be implemented via quant structs; and 2) we should consider how to offer a proxy/adaptor to help users convert between such a quant struct and vectors (in the mathematical sense). WDYT?

yuanming-hu commented 2 years ago

However, these two directions are inherently conflicting

Sorry about the confusion. I actually don't think the two directions are conflicting - let me write down a bit more detail.

As far as I understand, in earlier Taichi a vector is a pure math concept. It promises math operations, but nothing about storage. Because of this, it has great flexibility, allowing components to be non-contiguous, and to have different types. Also because of this, it cannot be directly mapped to native vector types, and cannot support dynamic indexing perfectly.

I feel like you are mixing up global (field) and local vectors. Local vectors are indeed a purely mathematical concept, and they say nothing about storage/data layout. In fact, they are always stored on the stack/register file. Local vectors can easily support dynamic indexing.

Global "vectors" are used to specify storage/quantization. For most of the computation, you convert global vectors to local vectors - the conversion involves loading/storing, as well as decoding/encoding for quantized types.

Then we can easily determine whether a quantized vector can have components with different types.

Perhaps the point is that the components can have different (quantized) storage types, but they must share the same compute type? This ensures that when loading them you get a regular float32x3/float64x3 etc.

BTW I have another question: why do we have a struct type in the presence of a vector type which can have components with different types?

You still need struct since you may have quantized int and quantized float in the same quant struct :-) It's more about "compute_type" in the QuanTaichi paper.

yuanming-hu commented 2 years ago

I also agree on this. We have spent a great amount of time debating this, and concluded that vector/matrix should behave just like how most users would expect: They are containers holding homogeneous elements, dynamically-indexable, and providing linalg methods. Most of the time, using a Taichi vector/matrix should feel no different from using a GLM/GLSL one. It simplifies the user experience, the API design and the implementation.

I totally agree with this. In fact, in the future, we should simply reuse the GLSL vector/matrix operators in the codegen :-)

The rgb565 type should be decoded into a vec3 (float32x3, if the compute_type for the three quantized components is float32) for computation, similar to the imageLoad function in GLSL.

yuanming-hu commented 2 years ago

If it comes to a point where a non-trivial amount of heterogeneous-vector usage shows up, 1) from a storage point of view, this could presumably be implemented via quant structs; and 2) we should consider how to offer a proxy/adaptor to help users convert between such a quant struct and vectors (in the mathematical sense). WDYT?

Just to clarify: I don't think we should support "heterogeneous-vector" that contains both float and int, or float32 and float64. That's against most users' common practice and is against our recent attempt to support dynamic indexing and native types. I can't come up with a typical use case where you need a vector composed of both int and float.

But I do feel like we should allow different components of a homogeneous vector to be stored as different quantized types, since such usage is common in graphics (e.g., the RGB565 format). The price you have to pay though, is when you load/store such vectors, you always have to load/store them as a whole, instead of loading a single component. I believe paying such price is no big deal in practice :-)

(Sorry about joining this discussion late. Most of the thread makes a lot of sense to me. The only point where my opinion differs is that "we should consider allowing different components of a homogeneous vector to be stored as different quantized types")
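To make the "load/store as a whole" point concrete, here is a plain-Python sketch of an RGB565 word being encoded from and decoded into a float triple. The layout (red in the high bits), the normalization to [0, 1], and the helper names are assumptions for illustration, not a proposed Taichi API.

```python
# (bits, shift) per channel; assumed layout with red in the high bits.
FIELDS = ((5, 11), (6, 5), (5, 0))  # r, g, b


def encode_rgb565(rgb):
    """Store a float RGB triple in [0, 1] as one 16-bit word (whole-vector store)."""
    word = 0
    for value, (bits, shift) in zip(rgb, FIELDS):
        max_q = (1 << bits) - 1
        clamped = min(max(value, 0.0), 1.0)
        word |= round(clamped * max_q) << shift
    return word


def decode_rgb565(word):
    """Load one 16-bit word and decode every channel to a float (whole-vector load)."""
    return tuple(
        ((word >> shift) & ((1 << bits) - 1)) / ((1 << bits) - 1)
        for bits, shift in FIELDS
    )


w = encode_rgb565((1.0, 0.5, 0.0))
r, g, b = decode_rgb565(w)  # a regular float "vec3" used for computation
channel_sum = r + g + b     # e.g. the a[0] + a[1] + a[2] example from the thread
```

Each channel has its own storage width, but all three decode to the same compute type (float), and updating a single channel means re-encoding the whole 16-bit word.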

k-ye commented 2 years ago

The rgb565 type should be decoded into a vec3

Yep. To support my argument, rgb565 is-a quantized struct, rather than a quantized vector. And +1 that it will be decoded/converted to a regular vec3.

I believe what @strongoier meant in "these two directions are inherently conflicting... we need to have a consistent and clear underlying principle." is also this point... It is conflicting in the sense that rgb565 itself is only a storage type, and shouldn't be used for computing directly. Before participating in any kind of computation, it first needs to go through this decoding stage into a mathematically legit vector. I think this principle is where we don't have a consensus yet, i.e., vectors should be treated in the purely mathematical way, and should not take much responsibility in fancy storage patterns. To make the quantized type work like a vector, Taichi or the users will need to convert them first.

yuanming-hu commented 2 years ago

It is conflicting in the sense that rgb565 itself is only a storage type, and shouldn't be used for computing directly.

Exactly. (The only exception is when you want to perform rgb565 + rgb565 using u16 operator +. I assume that is a rare use case.)

You need to either associate a compute_type (e.g., ti.types.vec3) with rgb565, or explicitly let the user do rgb565_array[I, j, k].decode(ti.types.vec3).

I think this principle is where we don't have a consensus yet, i.e., vectors should be treated in the purely mathematical way, and should not take much responsibility in fancy storage patterns. To make the quantized type work like a vector, Taichi or the users will need to convert them first.

I agree with this. Perhaps ti.types.vector is for both computation & storage (since you need AOS/SOA/...), and ti.types.quant.vector is only for (AOS) storage? I can't easily come up with a use case where ti.types.quant.vector needs SOA so I assume it's AOS only.

Yep. To support my argument, rgb565 is-a quantized struct, rather than a quantized vector. And +1 that it will be decoded/converted to a regular vec3.

What confuses me here: if this holds true, isn't ti.types.quant.vector a special case of ti.types.quant.struct? And it sounds like we will need two code paths for ti.types.quant.struct with homogeneous and inhomogeneous components, the former automatically/optionally converted into a vector but the latter always staying a struct.

strongoier commented 2 years ago

Let me do a quick summary (which contains some personal ideas, though).

  1. Local vectors are always homogeneous, and should be translated to native types if possible, which can naturally provide dynamic indexing support.
  2. ti.types.quant.vector should be loaded into a local vector before doing any calculation. The result should also be stored as a whole.
  3. ti.types.quant.vector can accept its components to have different quant types (e.g. dtype=[fixed5, fixed6, fixed5]), but they should have the same compute type. This will be checked upon type definition.
  4. ti.types.quant.vector can also take an optional physical_type if users don't want the automatically inferred one. This is mainly useful for ti.types.quant.matrix, which may not fit into one primitive type. In this case, users may want to manually specify if they want 32 bits or 64 bits as a storage unit.

k-ye commented 2 years ago

This is mainly useful for ti.types.quant.matrix, which may not fit into one primitive type. In this case, users may want to manually specify if they want 32 bits or 64 bits as a storage unit.

I wonder if physical_type has to fit in 64 bits?

strongoier commented 2 years ago

I wonder if physical_type has to fit in 64 bits?

I'm not sure if physical_type is a good name. It simply allows users to choose between 8 / 16 / 32 / 64 bits. For ti.types.quant.matrix(3, 3, [[f11, f11, f10], [f11, f11, f10], [f11, f11, f10]]), users can choose to use 2 units of 64 bits or 3 units of 32 bits to store it.
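The unit counts in this example can be checked with a small plain-Python sketch. Each row takes 11 + 11 + 10 = 32 bits, so the nine components need 96 bits in total. The sketch assumes greedy packing in row-major order where no component straddles two storage units; that packing policy and the count_units helper are assumptions, not a decided design.

```python
def count_units(component_bits, unit_bits):
    """Count storage units needed when packing greedily, never splitting a component."""
    if not component_bits:
        return 0
    units, used = 1, 0
    for bits in component_bits:
        if bits > unit_bits:
            raise ValueError(f"a {bits}-bit component cannot fit in a {unit_bits}-bit unit")
        if used + bits > unit_bits:
            # Start a new unit instead of letting this component straddle two.
            units, used = units + 1, 0
        used += bits
    return units


# Row-major bit widths of the 3x3 example: [[f11, f11, f10]] repeated for 3 rows.
matrix_bits = [11, 11, 10] * 3
print(count_units(matrix_bits, 64))  # 2
print(count_units(matrix_bits, 32))  # 3
```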

strongoier commented 2 years ago

There are a few more things to decide about the public APIs of quantized type definitions.

  1. Currently, for ti.types.quant.int, the default value of signed is False. However, for ti.types.quant.fixed/float, that value is True. The inconsistency may confuse users. I suggest making all of them True, considering that in Python numerical types are signed by default.
  2. There are some usages of the internal API ti.types.quant._custom_float that aim to specify scale directly, instead of calculating scale = range / 2**frac with the public API ti.types.quant.fixed. I guess the intention is to avoid floating point errors in the division. Shall we also add a scale parameter to ti.types.quant.fixed? Then users would no longer have to use the internal API for this purpose.

WDYT @yuanming-hu @k-ye @ailzhang
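As a worked instance of the formula scale = range / 2**frac quoted above (whether signed types reserve one of the frac bits for the sign is not specified here), the sketch below uses range = 1.0 and frac = 10. Dividing by a power of two is exact in binary floating point unless the result underflows, so the division itself introduces no rounding error. The helper names are hypothetical.

```python
# Fixed-point scale as quoted in the thread: scale = range / 2**frac.
def fixed_scale(value_range, frac):
    return value_range / 2 ** frac


def decode_fixed(stored, scale):
    """Value represented by a stored integer under the given scale."""
    return stored * scale


s = fixed_scale(1.0, 10)
print(s)                     # 0.0009765625
print(decode_fixed(512, s))  # 0.5

# Dividing by a power of two only changes the exponent, so it is exact here.
assert fixed_scale(100.0, 16) * 2 ** 16 == 100.0
```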

yuanming-hu commented 2 years ago

Currently, for ti.types.quant.int, the default value of signed is False. However, for ti.types.quant.fixed/float, that value is True. The inconsistency may confuse users. I suggest making all of them True, considering that in Python numerical types are signed by default.

Sounds good to me!

There are some usages of the internal API ti.types.quant._custom_float that aim to specify scale directly, instead of calculating scale = range / 2**frac with the public API ti.types.quant.fixed. I guess the intention is to avoid floating point errors in the division. Shall we also add a scale parameter to ti.types.quant.fixed? Then users would no longer have to use the internal API for this purpose.

Could you point me to a specific use case of using scale for quantized floating-point? That's probably not to avoid floating-point errors. The intention could be to shift the dynamic range, e.g., from 1e-5~1e5 to 1e-3~1e7.

strongoier commented 2 years ago

Could you point me to a specific use case of using scale for quantized floating-point?

Well I didn't find it in quantaichi or taichi_elements repos. These repos only contain usage like https://github.com/taichi-dev/quantaichi/blob/809f7c6e6cc3c9d446a184e94f1ff733d5bcd7b4/mls_mpm/benchmark/quan_mpm_benchmark.py#L235-L237, which can be clearly replaced with ti.types.quant.fixed.

However, there are some interesting use cases in tests, e.g., https://github.com/taichi-dev/taichi/blob/a0a805972d1b73f28cebcf45591e6514c8b5989e/tests/python/test_custom_type_atomics.py#L69-L91

I'm not sure if this is only for testing purposes :-)

yuanming-hu commented 2 years ago

I see. For fixed points, it's fine to leave both range and scale in the API :-)

(The original _custom_float API is misleading. It's actually a fixed-point number)

strongoier commented 2 years ago

While fixing #5009, I found a tricky use case that we previously missed - a BitStruct SNode can actually be read out as a whole, with its physical type: https://github.com/taichi-dev/taichi/blob/fba92cf76f93668033678e91eb219ba9c9f4a1ef/misc/visualize_quant_types.py#L48-L50

With the new proposal, where bit_struct_wrapper is used to group quant fields together, such a use case may not be allowed. TBH I feel like such a use case is mainly for debugging the internal implementation, and it should be OK to disallow it. WDYT @yuanming-hu @k-ye @ailzhang

yuanming-hu commented 2 years ago

We can simply disallow it :-)

strongoier commented 2 years ago

After yet another discussion with @Hanke98 @k-ye @ailzhang, I would like to share some updates.

In Taichi v1.1, we hope to first release a refined version of quantized types used with SNodes, in order to get this feature officially announced and tried by users. The TODO list of this whole issue has thus been reordered by priority, and I'll track v1.1 blockers here:

Quantized types definition refinement plan

We only present three basic quantized types, ti.types.quant.int/fixed/float, to users.

bit_struct SNode refinement plan

Let me illustrate the API change with the following example. It is essentially an altered version of ti.bit_struct_wrapper() proposed in https://github.com/taichi-dev/taichi/issues/4857#issuecomment-1122134432, with a more natural way to express shared exponents.

Common part:

u4 = ti.types.quant.int(bits=4, signed=False)
f15 = ti.types.quant.float(exp=5, frac=10)
f18 = ti.types.quant.float(exp=5, frac=13)

p = ti.field(dtype=u4)
q = ti.field(dtype=f15)
r = ti.field(dtype=f18)

Old API:

blk = ti.root.dense(ti.i, 4).bit_struct(num_bits=32)
blk.place(p)
blk.place(q, r, shared_exponent=True)

Previous proposal:

ti.root.dense(ti.i, 4).place(ti.bit_struct_wrapper(32, [p, [q, r]]))

New API:

bitpack = ti.BitpackedFields(max_num_bits=32)
bitpack.place(p)
bitpack.place(q, r, shared_exponent=True)
ti.root.dense(ti.i, 4).place(bitpack)
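A quick bit count shows why the shared exponent matters in this example. It assumes, as the names f15 and f18 suggest, that a quantized float occupies exp + frac bits, and that a shared exponent is stored only once for all fields sharing it:

```python
# Bit budget of the BitpackedFields example (assumed accounting, see lead-in).
U4_BITS = 4
F15_EXP, F15_FRAC = 5, 10
F18_EXP, F18_FRAC = 5, 13

separate = U4_BITS + (F15_EXP + F15_FRAC) + (F18_EXP + F18_FRAC)
# With shared_exponent=True, q and r store a single 5-bit exponent between them.
shared = U4_BITS + F15_EXP + F15_FRAC + F18_FRAC

print(separate, shared)  # 37 32
assert shared <= 32 < separate
```

Under these assumptions, the three fields would need 37 bits without sharing and would not fit into max_num_bits=32; with q and r sharing their 5-bit exponent, they fit exactly.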

bit_array SNode refinement plan

  1. bit_array will remain a SNode, but will be renamed to quant_array, considering that bit_array is usually used to refer to 0/1 arrays.
  2. bit_vectorize is currently an ad-hoc configuration function specifying how many bits are vectorized together, which leads to confusing API explanations like "bit_vectorize(1) means off while bit_vectorize(32) means on". As the vectorization unit is always the physical type, bit_vectorize will be turned into an on/off switch inside a loop config.
  3. The current implementation of bit_vectorize only applies to 0/1 arrays, so it should be turned off by default. Struct fors on quant_arrays with bit_vectorize off should work properly.
  4. ti.types.quant.fixed should be supported as elements in quant_arrays.

Documentation and examples

We need to write a tutorial about using quantized types, based on what we already have in taichi_elements and quantaichi. These repos should also be updated with latest APIs.

k-ye commented 2 years ago

Just one comment, I wonder if we can put BitpackedFields under ti.quant as well :-)

strongoier commented 2 years ago

Just one comment, I wonder if we can put BitpackedFields under ti.quant as well :-)

Unfortunately, we only have a ti.types.quant module for type definitions, which may not be suitable for BitpackedFields...

strongoier commented 2 years ago

Let me share some updates here. All refinements planned in https://github.com/taichi-dev/taichi/issues/4857#issuecomment-1173647894 have been realized and announced to users in Taichi v1.1. Furthermore, the current codegen for quantized types is now independent of SNodes, which allows flexible extensions in the future. A potential future direction is to allow quantized types in Ndarrays (similar to task B in https://github.com/taichi-dev/taichi/issues/4857#issuecomment-1122134432), which will be implemented when more real requirements arise.