Open rainwoodman opened 8 years ago
About earlier on, here's a couple of ideas I had based on our conversation. It is kind of very sparse.
Array<T>
with GType
Array<T?>
Slice<T>
)ArrayIterator<Array<T>>
Array.foreach
Array.fill
, Array.zero
, Array.eye
(see class
visibility)
Array<T>
Array<S>
to Array<T>
)Use both delegate and interface. The first for day-to-day operations and the latter for structuring computation tree.
// morphism
public delegate Array<T> MorphismFunc<S, T> (S @in);
public abstract class Morphism<S, T> : Object {
public abstract Array<T> apply (Array<S> @in);
public abstract Morphism<S, T> gradiant (Array<S> around);
}
// special case, automorphisms
public abstract class Automorphism<S> : Morphism<S, S> {
}
// unefficient, but so useful helper
var vectorized_sin = Vast.vectorize<double, double> (Math.sin);
Array.fill<double> (10, 10, Math.PI / 2)
.map<double> (Vast.vectorize<double, double> (Math.sin))
.slice (0, 1);
var sin_array = array.map<double> (x => sin (x));
var sum = array.reduce<double> ((x, c) => x + c, 0);
public class Array<T> {
public abstract Array<S> map (Morphism<T, S> morphism, uint dimension = 0);
}
Reuse parent memory in situation of single-ownership.
arr.map<double> (Vast.Math.sin) // copy 1
.map<double> (Vast.Math.sin) // copy 2, but into copy 1 memory
.map<double> (Vast.Math.sin); // copy 3, but into copy 2 memory (which is copy 1)
I'll probably edit this post later with more details.
Oh, yeah, sparse arrays would be helpful. If we make Array<T>
abstract with various built-in implementations and a ArrayModule
to provide custom ones, that would be just neat.
With generics (which is essentially carrying a GType around), I don't see where GValue fits. Maybe in functions, but even there we're better using generics as shown in above examples.
You could look into GLib.TypeQuery which would basically allow us to allocate instances in contiguous memory (e.g. know size of record type beforehand).
For record type, a struct declaration suffice as it will be registered with GType. We just need to store it by value to avoid expensive pointer indirections.
The way I see things is that vectors are single-dimension arrays. It could as well be boxed in a scalar, but at this point we cannot provide efficient storage. Basically, anything that is not registered with GType will be boxed and will incur indirections.
In the future, we should register large precision types (e.g. float128
, ...) as well as basic type transformation. I don't think GLib will do that for portability reason, but we perfectly can!
I thought that Gee could be interesting here, but then I realized that we need dimension-wise operands, which is far from there. However, it would still be a nice thing to look at to design the API.
A lot of things to comment here.
map
. This is not the best way to comment on the code .. but I hope you see my concerns. I am very afraid of <>
and over abstraction.
map
and reduce
are members of a 'operation' -- they are defined differently for uniry and binary operators/functions.cast
is special map because it may happen automaticly and changes the scalartype.For dealing with GObject as scalars, it's going to be a bit harder because they typically deal with their own allocation (slice, malloc, etc...).
We might want to just have a generic argument on Array<T>
since the type is uniform on all elements and let the transformations deal with GValues. However, we would have to wrap/unwrap when applying transformations.
Basically, we only need unary and binary morphisms (and possible vectorized ones). I think it should work on scalars as you proposed, but we should still have a way of transforming any existing unary/binary function into a Vast-friendly function. Maybe the term morphism is a bit too mathematical, but it's really the most general class of transformations.
I think it's better to work on scalar in the end because we want to let the array implementation decide for the destination.
The following definitions should be fine:
public interface Morphism : Object {}
public interface UnaryMorphism : Morphism {
public abstract Value apply (Value a);
public abstract UnaryMorphism gradiant (Value around);
}
public interface BinaryMorphism : Morphism {
public abstract Value apply (Value a, Value b);
public abstract BinaryMorphism gradiant (Value around_a, Value around_b);
}
The whole thing is really about whether to store the type along with the scalar or carry it around. For Array
, it's definately better to have a single Type
for all elements, but for transformations we need some sort of friendly baked-in conversion. However, this will induce wrapping and unwrapping from and to GValue
whenever we will apply a transformation.
Sure, to start, let's just focus on a fully-working native Array
implementation. I just say that in the future, it will be possible to have static factories (e.g. Array.fill
) that are aware of the concrete class (e.g. NArray.fill
).
I tried to code up Morphism but realized a problem. A generic morphism function has nin and nout variables. But if we do it via vala generics then we are demanded to use a predetermined set of nin and nout.
Another issues is GTypeQuery returns 0 instance_size for non-object types.
All structs are boxed types.
We can either bookkeep common ones or ask for a size at construct-time because I doubt there's a reliable way to determine that via GType
.
For morphisms, we can have a single definition that work with GValue[]
and provide input and output dimensions.
I'm writing a VAPI to provide additional numerical types with GCC extensions. It will cover half-precision and quad-precision floats, decimals, complex and 128 bit integers as well as their related math functions. I'll add it to https://github.com/nemequ/vala-extra-vapis once tested.
Actually, I'll write a small package around the binding to register boxed types and transformations for these types so that we can use it more conveniently. The idea is that one just link against libnumeric-glib
to be able to use it in Vast.
The nicest thing we could do about boxed types in general is to keep an index of unboxed sizes and maybe some extra metadata to stringify the type:
Vast.register_type_size (typeof (float128), 16, float128.FORMAT);
assert (Vast.unboxed_sizeof (typeof (float128)) == 16); //or '-1' if unknown and the type remains boxed
Then simply unwrapping types for which we know the size and keep boxed others.
We should work as much as possible with GLib.Value
, but still keep the Array<T>
generic for uniformity. For example, the setter in Array
could perform transformation and copy the memory of the boxed type:
public set (Index index, owned Value src_value) {
var dest_value = new Value (typeof (T));
Value.transform (src_value, ref dest_value);
Memory.copy (_data + index.offset, dest_value.get_boxed (), unboxed_sizeof (typeof (T)));
}
There's more hope via GI to obtain details (size, fields, ...) about structs: http://www.valadate.org:8000/#!api=gobject-introspection-1.0/GI.StructInfo
Thus, we can obtain the size for structures that are properly registered and fallback on boxed types for the rest. We'll see for printing stuff later I guess. In short, let's not split type information and rely more on GObject Introspection for that.
There will be serious overhead if the transformation are not type-compatible. Maybe the morphism could be prepared (e.g. told in advance the type) and then applied directly on the raw data.
For example, a sin
implementation would probably want to use specific optimized versions if it has to work on float128
. This could be a good reason to use generics.
public delegate T PrecomputedMorphismFunc<S, T> (S);
public delegate U PrecomputedBinaryMorphismFunc<S, T, U> (S, T);
interface Morphism {
public abstract Value apply (Value @in);
public abstract PrecomputedMorphismFunc<S, T> prepare<S, T> ();
}
public class Sin {
public override PrecomputedMorphismFunc<S, T> () {
if (typeof (S) == typeof (double)) {
return Math.sin;
}
if (typeof (S) == typeof (float128)) {
return Math.sinq;
}
}
}
All this would represent a big load of code. There should be better approach to make this more declarative.
`Thus, we can obtain the size for structures that are properly registered and fallback on boxed types for the rest. We'll see for printing stuff later I guess. In short, let's not split type information and rely more on GObject Introspection for that.
I think we shall insist on managing the memory of the array members. (that's what 'dense' means isn't it?) So we always need the instance_size
. I wonder if GI knows about the instance size of fundamental types ( I suspect not).
If not we still need to have our own type dictionary -- what we'll do is if something is not registered with our dictionary, then we look it up via GI. If still not found, we do an assertion and die?
I am concerned about the location of the if statement in your morphism example.
'PrecompuatedMorphismFunction' implements a uniary element-wise logic that is going to be invoked in an array iteration loop. The if
condition can grow quite big if there are many type signatures. The compiler may be unable to lift the if outside of the loop.
That was why I was thinking of having a delegate that can be looked up before the iteration starts. The compiler may be able to inline the function call.
The concern over varying the number of arguments to a Morphism still remains if Generics are used. MorphismBinary<S1, S2, T>
?
big load of code
, after a few examples are done, we can always write a quick code generator with python (e.g. some template system for webframeworks?).
numpy uses a customized template for this sort of things, too.
Yes! Meson will support generators for Vala coming from 0.36! We can even write generators in Vala using libvala :)
We might need another real-time conversation to wrap all these ideas up.
Just finished the basic definitions for arteymix/numeric-glib. I will probably write macros to perform all the type casting for Value.register_transform_func
when I'll have an hour to kill ;)
It's not meant to be a dependency, but we could use it for some examples.
Here's a transcript:
Ask for the type at construct-type. Move the responsibility of allocating dense arrays to the caller as it consistently know what the type size is.
Fallback on boxed types if the size is unknown (e.g. represented by -1
?).
Operate on GValue
to manipulate scalars? Or simple generics?
Declare functions taking arbitrary Array
as input and output. Use ref
to
force the caller to pass pre-allocated Array
.
/**
* Compute element-wise sinus of 'x' to 'z'.
*
* @partial_derivative: cos
*/
public void sin (Array x, ref Array z);
Use a YAML file with the following details to generate the function code in Vala:
sin
or sinq
for float128
Use libvala or some sort of templating language to generate the functions code with Meson generator.
Use GObject introspection features to obtain meta-data (arity, shape constraints, ...) at runtime. This will be used to generate computation grpahs.
Use comments and a specific language to express partial derivatives that can be extracter after. Possibly using YAML.
Reuse Array
when evaluating the function graph instead of copying or
buffering. Focus parallelism in a pre-array fashion.
A computation graph could simply be a graph of GI.FunctionInfo
objects with
all the GI.BaseInfo
.
In order, we should focus on the following:
Array
spec (deal with scalar size properly, get/set scalars)
Points taken from phone call on Oct 28th: