strongoier opened this issue 2 years ago
Before writing a formal RFC, I would like to briefly summarize some previous discussions on this topic. I think there are still some issues to be solved, and I hope to continue the discussion here.
A quantized type normally has no native support. Therefore, you need to specify a parent primitive type (e.g. a 32-bit int) and describe how you would like to pack a group of quantized types (e.g. a 15-bit int and a 17-bit int) inside.
In Taichi, this is done by introducing two SNode types, bit_struct and bit_array. Example usages:
i4 = ti.quant.int(bits=4)  # a 4-bit quantized integer type
u28 = ti.quant.int(bits=28, signed=False)  # a 28-bit unsigned quantized integer type
p = ti.field(dtype=i4)
q = ti.field(dtype=u28)
ti.root.dense(ti.i, 4).bit_struct(num_bits=32).place(p, q)  # pack p and q (4 + 28 = 32 bits) into one 32-bit cell
r = ti.field(dtype=i4)
ti.root.dense(ti.i, 4).bit_array(ti.i, 8, num_bits=32).place(r)  # pack 8 x 4-bit elements into each 32-bit cell
bit_struct and bit_array are not consistent with other SNode types. A normal SNode specifies two things: how to split the axes, and how cells are stored in the container. Meanwhile, a normal SNode places no limitations on the components of its cells. However, bit_struct has nothing to do with axes, and both bit_struct and bit_array may only have place SNodes as components of their cells, with a limit on the total number of bits across all components. These make the APIs inconsistent (problem 1).

As bit_array is deeply coupled with the SNode system (it indeed handles axes splitting) while not used that often, we prefer to keep it unchanged. Our main focus is bit_struct.
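For contrast, here is a minimal sketch of a normal SNode tree using only standard Taichi APIs (nothing here is specific to this proposal): it splits axes freely and accepts components of any primitive type, with no bit budget.

import taichi as ti
ti.init()

x = ti.field(ti.f32)
y = ti.field(ti.i32)
# A dense SNode describes axis splitting and cell storage; its cell components are unrestricted.
ti.root.dense(ti.i, 4).dense(ti.j, 8).place(x, y)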
Potential change 1: ti.types.bit_struct

ti.types.bit_struct is similar to ti.types.struct, with the following differences:
- ti.types.bit_struct is stored with a primitive type.
- Members of ti.types.bit_struct must be quantized types.

Example usage:
s_ty = ti.types.bit_struct(32, {'a': i4, 'b': u28})
s = ti.field(dtype=s_ty)
ti.root.dense(ti.i, 4).place(s)
s[I].a, s[I].b # access
s_arr = ti.ndarray(dtype=s_ty, shape=4)
s_arr[I].a, s_arr[I].b # access
Pros:
- It removes the need for the bit_struct SNode, and the resulting type works for ndarrays as well as fields (see the example above).

Cons:
- ti.types.bit_struct focuses on storage, so its members may not be a logical group. This can result in hard-to-read user programs.

Potential change 2: bit_struct_wrapper()

bit_struct_wrapper() is introduced to replace the bit_struct SNode. Example usage:
p = ti.field(dtype=i4)
q = ti.field(dtype=u28)
ti.root.dense(ti.i, 4).place(bit_struct_wrapper(32, [p, q]))
It aims to solve problem 1 without sacrificing anything. However, it does nothing for problem 2 because it is not compatible with ndarrays.
Considering that none of these proposed changes is perfect, shall we apply none, one, or both of them? Or do you have other ideas? @k-ye @ailzhang @yuanming-hu
I really like 1, because it makes the type system neat :-) However, considering that changing to bit_struct_wrapper should be easier, and that 1 and 2 are not mutually exclusive, I think it's reasonable to go with 2 first. As for ndarray quant types, at the bare minimum we can support storing just fixed-point scalar numbers first, then quantized vector types, then quantized struct types.
+1 on implementing #2 as a start! Btw I feel like, based on the deployment need, not modifying computation code might not be as hard a requirement as we thought. IMHO if it's a s/old/new, it shouldn't be a huge problem for people who want to maximize performance. (Or is it more complicated than that? :P)
For ndarray + quant, is it correct understanding that supporting fixed-point scalar number can already solve our problem of floating point atomics?
not modifying computation code might not be as hard requirement as we thought
I agree with that. The main point here is, when we introduce a new language construct, especially as fundamental as a type, we should let it make sense in most cases, instead of being a deployment-only thing.
For ndarray + quant, is it correct understanding that supporting fixed-point scalar number can already solve our problem of floating point atomics?
IMO yes. @k-ye
After an offline discussion with @ailzhang, we reached the following consensus: introduce fixed32 and fixed64 types, and let users convert float from/to them:

f_ty = ti.types.fixed32(scale=100.0, signed=False)
arr = ti.ndarray(float, 10)

@ti.kernel
def foo(a: ti.types.ndarray()):
    for i in a:
        x = ti.cast(a[i], f_ty)
        ...  # calculations on x
        a[i] = ti.cast(x, float)

foo(arr)
WDYT @k-ye
If this solution looks good, we can finalize this as an individual feature request and then re-consider other aspects of quant APIs with fewer restrictions.
To me it seems like fixed32 is just a small wrapper around the quant API? While I agree that "it can be interpreted by user programs and common third-party frameworks in a trivial way", it's also not too hard to convert custom quant types into primitive types.

One thing I've been thinking about: if we make quant vectors workable on mobile, how much larger a scale can we get for simulation? Note that graphics APIs are already offering f16 vectors, e.g. Metal has half4, so this is something to consider.

So yeah, I guess we can agree that Ndarray doesn't need to support the fancy bit_struct. But I think it's reasonable to consider quantized scalars and vectors/matrices.
@k-ye Yup that wrapper is mainly used to solve the floating point atomics problem we've seen.
Is it a correct understanding that to achieve much larger scale simulation on mobile, we can try adding e.g. half4 as a primitive type which applies to both fields and ndarrays?
Is it correct understanding that to achieve much larger scale simulation on mobile, we can try adding e.g. half4 as primitive type which applies to both field and ndarray?
Yep. Additionally, this could also help with vec4 loading optimization (cc @turbo0628 @qiao-bo )
To me it seems like fixed32 is just a small wrapper around the quant API?

This indeed requires us to support quantized scalars. Our current APIs cannot be used outside SNodes. However, when a quant type is used as an individual scalar, a number of bits other than 32/64 doesn't make sense. As there are already f32/f64, the only meaningful types to provide are fixed32/fixed64.
Additionally, this could also help with vec4 loading optimization
I don't quite get the point here. Using native half4/vec4 in codegen instead of the current ad-hoc expansion will certainly be an optimization strategy for our ti.types.vector(4, dtype=f16/f32). How does it relate to our quantized types?
After yet another discussion with @k-ye @ailzhang @jim19930609, I have formed a mental picture of future plans and would like to share it here.
Although current APIs work only in the SNode system, they are still useful and we hope to expose them in a cleaner way.
Previously, we had two groups of APIs, type_factory and quant. The latter is built on top of the former and is used in the QuanTaichi paper. However, in some real use cases the former is adopted. Having both adds an unnecessary burden for users learning these APIs.

We would like to keep only quant, as it is closer to users, and make it available at ti.types.quant for consistency with other types. type_factory will be removed, and its methods will be made private under ti.types.quant.

To sum up, we will have ti.types.quant.int/fixed/float/_custom_int/_custom_float. All current usages need to be updated.
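A minimal sketch of what definitions under the proposed ti.types.quant namespace could look like; the parameter names are taken from examples elsewhere in this thread, and the exact signatures should be treated as assumptions:

u4 = ti.types.quant.int(bits=4, signed=False)    # 4-bit unsigned quantized integer
f15 = ti.types.quant.float(exp=5, frac=10)       # 15-bit quantized float: 5 exponent bits + 10 fraction bits
fx21 = ti.types.quant.fixed(frac=21, range=2.0)  # 21-bit fixed point; scale = range / 2**frac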
bit_struct SNode

This corresponds to problem 1 and potential change 2 mentioned above. I plan to add an API ti.bit_struct_wrapper(number_of_bits, list_of_fields, with_shared_exponent) to solve the inconsistency problem and also keep place() clean. This requires refactoring our SNode system implementation a bit, as we are getting rid of the bit_struct SNode.
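For illustration, a hedged sketch of how the proposed wrapper might be used, based only on the signature above (whether with_shared_exponent can be passed as a keyword is an assumption):

p = ti.field(dtype=ti.types.quant.int(bits=4, signed=False))
q = ti.field(dtype=ti.types.quant.fixed(frac=28, range=2.0))
# Pack both fields into one 32-bit unit per cell, without a shared exponent.
ti.root.dense(ti.i, 4).place(ti.bit_struct_wrapper(32, [p, q], with_shared_exponent=False))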
For deployment purposes, where performance is valued the most, it is worth providing some new APIs (users have to write things in a new way). The new APIs should work both in the SNode system and for Ndarrays.
Subtask B.1: quantized scalars

Currently, the quantized types ti.types.quant.int/float/fixed can serve as the dtype of fields, with the condition that they are placed in a bit_struct or bit_array. We hope to allow direct usage of them as dtype with no limitations, so that they can also be used in Ndarrays and thus be easily deployable. Note that in this case, we need to pad a quantized type to a primitive type with the minimum number of bits for storage purposes.

You may wonder what the use case is, considering that no memory can be saved. In fact, the above support mainly targets acceleration of atomic operations on mobile phones, by replacing float32 with 32-bit fixed-point numbers. Meanwhile, it enables experimenting with different precisions and provides a basis for subsequent tasks.
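A hypothetical sketch of what this could look like once a quantized scalar is accepted directly as an ndarray dtype; none of this exists yet, and the fixed-point parameters here are assumptions:

fx32 = ti.types.quant.fixed(frac=32, range=200.0)  # padded to a 32-bit word for storage
a = ti.ndarray(dtype=fx32, shape=1024)

@ti.kernel
def scatter(dst: ti.types.ndarray()):
    for i in range(1024):
        # Fixed-point atomic adds can be lowered to integer atomics,
        # which is the mobile-GPU acceleration this subtask targets.
        ti.atomic_add(dst[i % 16], 0.25)

scatter(a)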
Subtask B.2: quantized vectors

To enable the main advantage of quantized types, saving memory, we hope to add a quantized vector type ti.types.quant.vector(n, dtype), where dtype must be one of ti.types.quant.int/float/fixed. The whole type will be padded to a primitive type with the minimum number of bits that can hold n elements of dtype. This targets common cases like packing two or three components of some physical quantity together.
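A hypothetical example of such a type, assuming the signature given above:

fx21 = ti.types.quant.fixed(frac=21, range=2.0)
vec3_ty = ti.types.quant.vector(3, fx21)  # 3 x 21 bits, padded to the next primitive type (64 bits)
vel = ti.ndarray(dtype=vec3_ty, shape=4096)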
Subtask B.3: quantized structs

Similar to Subtask B.2, we can add a quantized struct type ti.types.quant.struct, which was previously mentioned as ti.types.bit_struct. This can be an optional task when a real need arises.
After this step we can have an official announcement of the rebirth of quantized types!
minor nit: for subtask B.2, I wonder if ti.types.vector(n, dtype), where ti.types.quant.int/float/fixed are added to the whitelist of dtype, makes it simpler for users?
Thanks for writing this up! Overall it looks like a great roadmap. I have a few questions here:
- Could you provide an overview of the quant API?
- Subtask A.1: "I plan to add an API ti.bit_struct_wrapper(number_of_bits, list_of_fields, with_shared_exponent) to solve the inconsistency problem" - I wonder if with_shared_exponent is only meaningful for vector/matrix types?
- Subtask A.2: "type_factory will remain as an internal API at ti.types.quantized_types.type_factory" - nit: I feel like we don't have to have both ti.types.quant and ti.types.quantized_types. Maybe just ti.types.quant.type_factory?
minor nit: for subtask b.2, I wonder if ti.types.vector(n, dtype) where ti.types.quant.int/float/fixed are added to the whitelist of dtype makes it simpler for users?
ti.types.vector and ti.types.quant.vector are different in many ways. ti.types.quant.vector is actually stored as a primitive type, has limitations on number of bits, and can accept quant-only configurations like with_shared_exponent.
I wonder if with_shared_exponent is only meaningful for vector/matrix types?
For struct it can make sense as well..
nit: I feel like we don't have to have both ti.types.quant and ti.types.quantized_types. Maybe just ti.types.quant.type_factory?
ti.types.quant is the actual API we want to expose. As type_factory is hidden, we have to visit the whole module path ti.types.quantized_types for internal usage.
we have to visit the whole module path ti.types.quantized_types for internal usage.
I think there are different ways to handle this: use __all__ to control the public symbols, use quant._type_factory, etc.
I think there are different ways to handle this: use __all__ to control the public symbols, use quant._type_factory, etc.
Ah yes. I was stuck on the assumption that we could not break the two same-level classes, quant and type_factory. However, it is now a chance to refine things more aggressively.

Now I have a new design: we get rid of the legacy type_factory and directly provide the following APIs - ti.types.quant.int/fixed/float/_custom_int/_custom_float. WDYT @k-ye
BTW which one seems better, quant.int or quant_int?
Cool! I prefer quant.int more, as they can be scoped in the same namespace quant. WDYT? (cc @ailzhang @jim19930609)
+1 on quant.int!
I wonder if with_shared_exponent is only meaningful for vector/matrix types?
For struct it can make sense as well..
I feel like in real use cases shared exponents are typically used only in vectors. Do you have an example where you need that in a struct? :-) @strongoier
Another question of mine: if I'd split the 64 bits into x: fixed21, y: fixed22, z: fixed21, can it be expressed as a quantized vector3? See also the RGB565 format in OpenGL etc.: https://www.khronos.org/opengl/wiki/Image_Format
I feel like in real use cases shared exponents are typically used only in vectors. Do you have an example where you need that in a struct? :-)
Not really. My point here is just that we don't have to throw an error if those fields are not grouped as a vector.
Another question of mine: if I'd split the 64 bits into x: fixed21, y: fixed22, z: fixed21, can it be expressed as a quantized vector3?
In fact we hope that elements of a vector have the same type. A quantized struct is needed for this purpose.
In fact we hope that elements of a vector have the same type. A quantized struct is needed for this purpose.
I see. Thanks for the clarification!
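To make the distinction concrete, here is a purely hypothetical sketch of expressing the x/y/z split above as a quantized struct; ti.types.quant.struct is only the optional Subtask B.3 at this point, and its signature is assumed to mirror the ti.types.bit_struct example earlier in the thread:

fx21 = ti.types.quant.fixed(frac=21, range=2.0)
fx22 = ti.types.quant.fixed(frac=22, range=2.0)
pos_ty = ti.types.quant.struct(64, {'x': fx21, 'y': fx22, 'z': fx21})  # 21 + 22 + 21 = 64 bits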
I feel like the user may want to access the components via [] - for example, color = (fixed5, fixed6, fixed5) and the user writes luminance = a[0] + a[1] + a[2]. Do we plan to support that? :-)
I feel like the user may want to access the components via [] - for example, color = (fixed5, fixed6, fixed5) and the user writes luminance = a[0] + a[1] + a[2]. Do we plan to support that? :-)
Yep. It is fine to support that as syntax sugar.
Yep. It is fine to support that as syntax sugar.
I'm thinking about this: for ti.types.quant.vector(n, dtype), can dtype be a list of quantized types? For example, we may want to allow something like rgb565 = ti.types.quant.vector(3, [fixed5, fixed6, fixed5]) :-) Then it's not simply syntax sugar, but a real vector type. (Are we worrying about dynamic indexing here?)
I'm thinking about this: for ti.types.quant.vector(n, dtype), can dtype be a list of quantized types? For example, we may want to allow something like rgb565 = ti.types.quant.vector(3, [fixed5, fixed6, fixed5]) :-) Then it's not simply a syntax sugar, but a real vector type. (Are we worrying about dynamic indexing here?)
I understand your point here. TBH this touches some underlying design philosophy of Taichi, which I get a bit confused about from time to time.

As far as I understand, in earlier Taichi a vector is a pure math concept. It promises math operations, but nothing about storage. Because of this, it has great flexibility, allowing components to be non-contiguous and to have different types. Also because of this, it cannot be directly mapped to native vector types, and cannot support dynamic indexing perfectly.
As time goes by, different voices arise in the community. Many users consider vectors as containers of contiguous same-typed values. As a result, many recent or planned efforts go in this direction - dynamic indexing, native types, etc.
However, these two directions are inherently conflicting - giving more support to one of them means giving less support to the other. To avoid getting design choices back and forth, IMHO we need to have a consistent and clear underlying principle. Then we can easily determine whether a quantized vector can have components with different types.
BTW I have another question: why do we have a struct type in the presence of a vector type which can have components with different types?
To avoid getting design choices back and forth, IMHO we need to have a consistent and clear underlying principle.
I also agree on this. We have spent a great amount of time debating this, and concluded that vector/matrix should behave just like how most users would expect: they are containers holding homogeneous elements, dynamically indexable, and providing linalg methods. Most of the time, using a Taichi vector/matrix should feel no different from using a GLM/GLSL one. It simplifies the user experience, the API design and the implementation.

If it comes to a point where a non-trivial amount of usage for heterogeneous vectors shows up, 1) from a storage point of view, this could supposedly be implemented via quant structs; and 2) we should consider how to offer a proxy/adaptor to help convert between this quant struct and vectors (in the mathematical sense). WDYT?
However, these two directions are inherently conflicting
Sorry about the confusion. I don't think the two directions are actually conflicting - let me write down a bit more detail.
As far as I understand, in earlier Taichi a vector is a pure math concept. It promises math operations, but nothing about storage. Because of this, it has great flexibility, allowing components to be non-contiguous, and to have different types. Also because of this, it cannot be directly mapped to native vector types, and cannot support dynamic indexing perfectly.
I feel like you are mixing global (field) and local vectors. Local vectors are indeed a purely mathematical concept, and they say nothing about storage/data layout. In fact, they are always stored on the stack/register file. Local vectors can easily support dynamic indexing.
Global "vectors" are used to specify storage/quantization. For most of the computation, you convert global vectors to local vectors - the conversion involves loading/storing, as well as decoding/encoding for quantized types.
Then we can easily determine whether a quantized vector can have components with different types.
Perhaps the point is that the components can have different (quantized) storage types, but they must share the same compute type? This ensures that when loading them you get a formal float32x3/float64x3 etc.
BTW I have another question: why do we have a struct type in the presence of a vector type which can have components with different types?
You still need struct since you may have quantized int and quantized float in the same quant struct :-) It's more about "compute_type" in the QuanTaichi paper.
I also agree on this. We have spent some great amount of time debating on this, and concluded that vector/matrix should behave just like how most users would expect: They are containers holding homogeneous elements, dynamically-indexable, and providing linalg methods. Most of the time, using a Taichi vector/matrix should feel no different from using a GLM/GLSL one. It simplifies the user experience, the API design and the implementation.
I totally agree with this. In fact, in the future, we should simply reuse the GLSL vector/matrix operators in the codegen :-)
The rgb565 type should be decoded into a vec3 (float32x3, if the compute_type for the three quantized components is float32) for computation, similar to the imageLoad function in GLSL.
If it comes to a point where a non-trivial amount of usage for heterogeneous-vector show up, 1) From a storage point of view, this could supposedly be implemented via quant structs; and 2) we should consider how to offer a proxy/adaptor to help them convert between this quant struct and vectors (in the mathematical sense). WDYT?
Just to clarify: I don't think we should support a "heterogeneous vector" that contains both float and int, or float32 and float64. That's against most users' common practice and against our recent attempts to support dynamic indexing and native types. I can't come up with a typical use case where you need a vector composed of both int and float.
But I do feel like we should allow different components of a homogeneous vector to be stored as different quantized types, since such usage is common in graphics (e.g., the RGB565 format). The price you have to pay though, is when you load/store such vectors, you always have to load/store them as a whole, instead of loading a single component. I believe paying such price is no big deal in practice :-)
(Sorry about joining this discussion late. Most of the thread makes a lot of sense to me. The only thing that I hold a different opinion is that "we should consider allowing different components of a homogeneous vector to be stored as different quantized types")
The rgb565 type should be decoded into a vec3
Yep. To support my argument, rgb565 is-a quantized struct, rather than a quantized vector. And +1 that it will be decoded/converted to a regular vec3.
I believe what @strongoier meant in "these two directions are inherently conflicting... we need to have a consistent and clear underlying principle." is also this point... It is conflicting in the sense that rgb565 itself is only a storage type, and shouldn't be used for computing directly. Before participating in any kind of computation, it will need to first go through this decoding stage into a mathematically legit vector. I think this principle is where we don't have a consensus yet, i.e., vectors should be treated in the purely mathematical way, and should not take much responsibility for fancy storage patterns. To make the quantized type work like a vector, Taichi or the users will need to convert it first.
It is conflicting in the sense that rgb565 itself is only a storage type, and shouldn't be used for computing directly.
Exactly. (The only exception is when you want to perform rgb565 + rgb565 using the u16 operator +. I assume that is a rare use case.)
You need to either associate a compute_type (e.g., ti.types.vec3) with rgb565, or explicitly let the user do rgb565_array[I, j, k].decode(ti.types.vec3).
I think this principle is where we don't have a consensus yet, i.e., vectors should be treated in the purely mathematical way, and should not take much responsibility in fancy storage patterns. To make the quantized type work like a vector, Taichi or the users will need to convert them first.
I agree with this. Perhaps ti.types.vector is for both computation & storage (since you need AOS/SOA/...), and ti.types.quant.vector is only for (AOS) storage? I can't easily come up with a use case where ti.types.quant.vector needs SOA, so I assume it's AOS only.
Yep. To support my argument, rgb565 is-a quantized struct, rather than a quantized vector. And +1 that it will be decoded/converted to a regular vec3.
What confuses me here: if this holds true, isn't ti.types.quant.vector a special case of ti.types.quant.struct? And it sounds like we will need two code paths for ti.types.quant.struct with homogeneous and inhomogeneous components, where the former is automatically/optionally converted into a vector but the latter always stays a struct.
Let me do a quick summary (which contains some personal ideas, though).
- ti.types.quant.vector should be loaded into a local vector before doing any calculation. The result should also be stored as a whole.
- ti.types.quant.vector can accept components with different quant types (e.g. dtype=[fixed5, fixed6, fixed5]), but they should have the same compute type. This will be checked upon type definition.
- ti.types.quant.vector can also take an optional physical_type if users don't want the automatically inferred one. This is mainly useful for ti.types.quant.matrix, which may not fit into one primitive type. In this case, users may want to manually specify whether they want 32 bits or 64 bits as a storage unit.

This is mainly useful for ti.types.quant.matrix, which may not fit into one primitive type. In this case, users may want to manually specify if they want 32 bits or 64 bits as a storage unit.
I wonder if physical_type has to fit in 64 bits?
I wonder if physical_type has to fit in 64 bits?
I'm not sure if physical_type is a good name. It simply allows users to choose between 8 / 16 / 32 / 64 bits. For ti.types.quant.matrix(3, 3, [[f11, f11, f10], [f11, f11, f10], [f11, f11, f10]]), users can choose to use 2 units of 64 bits or 3 units of 32 bits to store it.
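For illustration, one hypothetical way this choice might be spelled; both the physical_type keyword and the quantized matrix signature are still under discussion and are assumptions here:

f11 = ti.types.quant.float(exp=5, frac=6)  # 11 bits per component
f10 = ti.types.quant.float(exp=5, frac=5)  # 10 bits per component
rows = [[f11, f11, f10]] * 3               # 32 bits per row, 96 bits total
m32 = ti.types.quant.matrix(3, 3, rows, physical_type=ti.u32)  # 3 units of 32 bits
m64 = ti.types.quant.matrix(3, 3, rows, physical_type=ti.u64)  # 2 units of 64 bits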
There are a few more things to decide about the public APIs of quantized type definitions.
- Currently, for ti.types.quant.int, the default value of signed is False. However, for ti.types.quant.fixed/float, that value is True. The inconsistency may confuse users. I suggest making all of them True, considering that in Python numerical types are signed by default.
- There are some usages of the internal API ti.types.quant._custom_float aiming to directly specify scale, instead of calculating scale = range / 2**frac with the public API ti.types.quant.fixed. I guess the intention is to avoid floating point errors in the division. Shall we also add a scale parameter to ti.types.quant.fixed? Then users no longer have to use the internal API for this purpose.

WDYT @yuanming-hu @k-ye @ailzhang
Currently, for ti.types.quant.int, the default value of signed is False. However, for ti.types.quant.fixed/float, that value is True. The inconsistency may confuse users. I suggest making all of them True, considering that in Python numerical types are signed by default.
Sounds good to me!
There are some usages of the internal API ti.types.quant._custom_float, aiming for directly appointing scale, instead of calculating scale = range / 2**frac with the public API ti.types.quant.fixed. I guess the intention is to avoid floating point errors on the division. Shall we also add a scale parameter to ti.types.quant.fixed? Then users no longer have to use the internal API for the mentioned purpose.
Could you point me to a specific use case of using scale for quantized floating-point? That's probably not to avoid floating point errors. The intention could be to shift the dynamic range, e.g., from 1e-5~1e5 to 1e-3~1e7.
Could you point me to a specific use case of using scale for quantized floating-point?
Well, I didn't find it in the quantaichi or taichi_elements repos. These repos only contain usage like https://github.com/taichi-dev/quantaichi/blob/809f7c6e6cc3c9d446a184e94f1ff733d5bcd7b4/mls_mpm/benchmark/quan_mpm_benchmark.py#L235-L237, which can clearly be replaced with ti.types.quant.fixed.
However, there are some interesting use cases in tests, e.g., https://github.com/taichi-dev/taichi/blob/a0a805972d1b73f28cebcf45591e6514c8b5989e/tests/python/test_custom_type_atomics.py#L69-L91
I'm not sure if this is only for testing purposes :-)
I see. For fixed points, it's fine to leave both range and scale in the API :-)

(The original _custom_float API is misleading. It's actually a fixed-point number.)
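For reference, the relation between the two parameterizations discussed above; ti.types.quant.fixed(frac, range) is the existing public API, while a scale keyword on it is only the proposed addition:

frac = 21
value_range = 2.0
scale = value_range / 2 ** frac  # what _custom_float takes directly
fx_a = ti.types.quant.fixed(frac=frac, range=value_range)
# fx_b = ti.types.quant.fixed(frac=frac, scale=scale)  # proposed equivalent, not yet available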
While fixing #5009, I found a tricky use case that we previously missed - a BitStruct SNode can actually be read out as a whole, with its physical type:
https://github.com/taichi-dev/taichi/blob/fba92cf76f93668033678e91eb219ba9c9f4a1ef/misc/visualize_quant_types.py#L48-L50
With the new proposal, where bit_struct_wrapper is used to group quant fields together, such a use case may not be allowed. TBH I feel like this use case is mainly for debugging the internal implementation, and it should be OK to disallow it. WDYT @yuanming-hu @k-ye @ailzhang
We can simply disallow it :-)
After yet another discussion with @Hanke98 @k-ye @ailzhang, I would like to share some updates.
In Taichi v1.1, we hope to first release a refined version of quantized types used with SNodes, in order to get this feature officially announced and tried by users. The TODO list of this whole issue is thus shuffled by priority, and I'll track v1.1 blockers here:
- bit_struct SNode refinement.
- bit_array SNode refinement.
- We only present three basic quantized types, ti.types.quant.int/fixed/float, to users.
bit_struct SNode refinement plan

Let me illustrate the API change with the following example. It is indeed an altered version of the ti.bit_struct_wrapper() proposed in https://github.com/taichi-dev/taichi/issues/4857#issuecomment-1122134432, with a more natural way to express shared exponents.
Common part:
u4 = ti.types.quant.int(bits=4, signed=False)
f15 = ti.types.quant.float(exp=5, frac=10)
f18 = ti.types.quant.float(exp=5, frac=13)
p = ti.field(dtype=u4)
q = ti.field(dtype=f15)
r = ti.field(dtype=f18)
Old API:
blk = ti.root.dense(ti.i, 4).bit_struct(num_bits=32)
blk.place(p)
blk.place(q, r, shared_exponent=True)
Previous proposal:
ti.root.dense(ti.i, 4).place(ti.bit_struct_wrapper(32, [p, [q, r]]))
New API:
bitpack = ti.BitpackedFields(max_num_bits=32)
bitpack.place(p)
bitpack.place(q, r, shared_exponent=True)
ti.root.dense(ti.i, 4).place(bitpack)
bit_array SNode refinement plan

- bit_array will remain an SNode, but will be renamed to quant_array, considering that bit_array is usually used to refer to 0/1 arrays.
- bit_vectorize is currently an ad-hoc configuration function specifying how many bits are vectorized together, which leads to confusing API explanations like bit_vectorize(1) means off while bit_vectorize(32) means on. As the vectorization unit is always the physical type, bit_vectorize will be turned into an on/off switch inside a loop config (see the sketch after this list).
- bit_vectorize only applies to 0/1 arrays, so it should be turned off by default. Struct fors on quant_arrays with bit_vectorize off should work properly.
- ti.types.quant.fixed should be supported as elements in quant_arrays.

We need to write a tutorial about using quantized types, based on what we already have in taichi_elements and quantaichi. These repos should also be updated with the latest APIs.
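A hedged sketch of what the refined quant_array plus the loop-config switch might look like; the quant_array signature and the exact loop_config keyword are assumptions based on the plan above:

u1 = ti.types.quant.int(bits=1, signed=False)
x = ti.field(dtype=u1)
ti.root.dense(ti.i, 4).quant_array(ti.i, 32, max_num_bits=32).place(x)  # 32 x 1-bit cells per 32-bit word

@ti.kernel
def set_all():
    # bit_vectorize becomes an on/off switch instead of bit_vectorize(32)
    ti.loop_config(bit_vectorize=True)
    for i in x:
        x[i] = 1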
Just one comment, I wonder if we can put BitpackedFields under ti.quant as well :-)
Just one comment, I wonder if we can put BitpackedFields under ti.quant as well :-)
Unfortunately, we only have a ti.types.quant module for type definitions, which may not be suitable for BitpackedFields...
Let me share some updates here. All the refinements planned in https://github.com/taichi-dev/taichi/issues/4857#issuecomment-1173647894 have been implemented and announced to users in Taichi v1.1. Furthermore, the current codegen for quantized types is now independent of SNodes, which allows flexible extensions in the future. A potential future direction is to allow quantized types in Ndarrays (similar to task B in https://github.com/taichi-dev/taichi/issues/4857#issuecomment-1122134432), which will get implemented when more real requirements arise.
Quantized types are an experimental feature introduced in the QuanTaichi paper. With this useful feature, users can significantly reduce the memory usage of their Taichi programs. The feature can also enable acceleration of atomic operations on mobile phones.
However, the feature has been neither officially announced nor extensively maintained. As Taichi has reached its 1.0 version, I think it is time to polish the feature and make it available to users. My plan is to refine the API and implementation so that it fits into current Taichi better, is more user-friendly, and becomes deployable with Taichi AOT. I would like to write an RFC for it.