Closed kitchoi closed 8 years ago
then we need to find another, more streamlined way of specifying this information, if possible without introducing more keywords. For example, can we let yaml treat [:10] as a text string?
If this is not possible, how does the following look:
shape:
  dimension_1:
    min: 3
    max: 3
  dimension_2:
    min: 3
    max: 3
which would mean 2 dimensions, the first must have 3 items, and the second also 3 items exactly.
shape:
  dimension_1:
    min: 2
    max:
This is a one-dimensional "list" with at least two items. Perhaps other existing CUBA keys can be used instead of CUBA.DIMENSION_3, CUBA.DIMENSION_1, and CUBA.DIMENSION_3, assuming that, by convention, an attribute in the metadata is itself a CUBA; the lower-case ones differ from the ones prefixed with CUBA only because they are specified on the metadata level.
would this be easier?
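For what it's worth, a minimal Python sketch of how a consumer might check an array's sizes against the min/max form above. The function name `matches` and the dict layout are hypothetical, and treating an empty or missing bound as "unbounded" is an assumption, not something decided in this thread.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical in-memory form of the "dimension_N: {min, max}" proposal.
# Assumption: a missing or None bound means "unbounded".
ShapeSpec = Mapping[str, Optional[Mapping[str, Optional[int]]]]


def matches(spec: ShapeSpec, actual: Sequence[int]) -> bool:
    """Return True if the actual sizes satisfy every dimension's bounds."""
    # Order the dimensions by numeric suffix: dimension_1, dimension_2, ...
    names = sorted(spec, key=lambda name: int(name.rsplit("_", 1)[1]))
    if len(names) != len(actual):
        return False
    for name, size in zip(names, actual):
        bounds = spec[name] or {}
        low = bounds.get("min")
        high = bounds.get("max")
        if low is not None and size < low:
            return False
        if high is not None and size > high:
            return False
    return True


three_by_three = {
    "dimension_1": {"min": 3, "max": 3},
    "dimension_2": {"min": 3, "max": 3},
}
at_least_two = {"dimension_1": {"min": 2, "max": None}}

print(matches(three_by_three, (3, 3)))
print(matches(at_least_two, (1,)))
```

The nested form is explicit, but it needs three lines per dimension, which is the verbosity the question above is about.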
How about this:
# yaml
shape: [-1] # 1D, any shape
shape: [[2, 10]] # 1D, length >=2 and length <=10
shape: [[-1, 10], [1, -1]] # 2D, first dimension has max size 10, second dimension has min size 1
shape: [-1, -1] # -1 is just a space holder, in this case it is a 2D array with arbitrary shape
shape: [3, 3] # 2D array of a shape 3x3
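A rough Python sketch of how the list syntax above could be interpreted once yaml has loaded it into plain lists. The names `normalise` and `matches` are hypothetical, and reading both ends of a `[min, max]` pair as inclusive follows the "length >=2 and length <=10" comment above.

```python
from typing import List, Optional, Sequence, Tuple, Union

Entry = Union[int, Sequence[int]]
Bounds = Tuple[Optional[int], Optional[int]]


def normalise(shape: Sequence[Entry]) -> List[Bounds]:
    """Turn each entry into a (min, max) pair, with None meaning unbounded."""

    def bound(value: int) -> Optional[int]:
        # -1 is the placeholder for "no constraint".
        return None if value == -1 else value

    result: List[Bounds] = []
    for entry in shape:
        if isinstance(entry, int):
            # A bare integer fixes the size exactly; -1 leaves it free.
            result.append((bound(entry), bound(entry)))
        else:
            low, high = entry
            result.append((bound(low), bound(high)))
    return result


def matches(shape: Sequence[Entry], actual: Sequence[int]) -> bool:
    """Check the actual sizes against the normalised bounds (inclusive)."""
    bounds = normalise(shape)
    if len(bounds) != len(actual):
        return False
    return all(
        (low is None or size >= low) and (high is None or size <= high)
        for (low, high), size in zip(bounds, actual)
    )


print(normalise([[-1, 10], [1, -1]]))
print(matches([[2, 10]], (11,)))
```

Normalising everything to (min, max) pairs first keeps the actual check to a single expression, whichever surface syntax ends up in the yaml file.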
this would be nice too:
# yaml
shape: [] # 1D, any shape
No, it should be
# yaml
shape: [] # any dimension and any shape
Any one of these will do the job and is general enough. While we are at it and changing the metadata schema of cuba.yml, why not rename shape to dimension, which is more logical?
Yes, dimension is the term that should be used when describing "shape" of a vector. I also think that we should try to keep it as a one-liner picking one of what @kitchoi proposed. I can add this issue to #7.
I think the name "dimension" makes sense for numerical data such as vectors and tensors, but its meaning becomes less obvious when one is dealing with e.g. CUBA.MATERIAL (or CUBA.MATERIAL_LIST), which is a sequence of objects.
CUBA.MATERIAL
I don't think CUBA.MATERIAL has a dimension/shape defined anymore. In simphony_metadata.yml Material is derived from CUDS_COMPONENT and so on.
But, of course, there might be other counter examples that I'm missing, where you're making a good point, @kitchoi.
@tuopuu you are right that MATERIAL is to be derived from CUDS_COMPONENT. So this (and similar lines) should be modified: https://github.com/simphony/simphony-metadata/blob/master/simphony_metadata.yml#L64
So this (and similar lines) should be modified
I wonder if we should have a dimension for vectors and matrices, and a shape for lists, like in the example you linked. Dimension and shape could be used as synonyms in the Python class generation script, as they both describe the size of an array under the hood. So it would be left to the user to decide which one to use in the yaml files, with a preference for dimension for mathematical objects and shape for list-like objects. Comments?
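To make the synonym idea concrete, here is a small Python sketch of how a generation script might accept either key. The function `read_shape` is hypothetical (it is not taken from the actual script), and rejecting entries that give both keys is an assumption.

```python
from typing import Any, Mapping, Optional


def read_shape(entry: Mapping[str, Any]) -> Optional[Any]:
    """Return the size constraint stored under 'shape' or 'dimension'."""
    has_shape = "shape" in entry
    has_dimension = "dimension" in entry
    if has_shape and has_dimension:
        # Assumption: giving both keys is a mistake rather than a merge.
        raise ValueError("use either 'shape' or 'dimension', not both")
    if has_shape:
        return entry["shape"]
    if has_dimension:
        return entry["dimension"]
    return None


print(read_shape({"dimension": [3, 3]}))
print(read_shape({"shape": [2]}))
```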
Because I use numpy a lot and numpy uses shape, I am used to understanding it as the "dimensions" of multidimensional arrays (with the emphasis on the plural form of "dimensions"). Therefore, if you ask me, I would be happy to keep shape for both arrays and lists. However, if we are to distinguish the two, then I would use shape/dimensions for arrays and length for lists (e.g. length: 3, which is no longer a list of integers).
Note that I am biased because of numpy.
Sorry, I should take back length, since you probably want to set the minimum and maximum size of the list.
I am ok with the numpy way, makes sense to use the same nomenclature.
So this (and similar lines) should be modified: https://github.com/simphony/simphony-metadata/blob/master/simphony_metadata.yml#L64
Actually, here material is meant to have a shape. Being on the attributes side (the CUBA.MATERIAL) does not mean that the value is necessarily a material; it can be a list of uuids, for example. This is implementation dependent, which is not very good from the metadata point of view. The reason for initially proposing it this way was to avoid creating new keywords for lists. However, defining Materials as a Material with a shape of [1:], or using a new list type, would be better:
MATERIALS:
  parent: CUBA.LIST # would be nice to have list, tensor, vector, matrix, etc. with shape following numpy nomenclature.
  shape: [2,]
or
MATERIALS:
  parent: CUBA.MATERIAL
  shape: [2,]
and then:
MATERIAL_RELATION:
  definition: Material Relation
  parent: CUBA.MODEL_EQUATION
  CUBA.MATERIALS:
    scope: CUBA.USER
Can we please change the syntax to https://github.com/simphony/simphony-metadata/issues/9#issuecomment-196817460 and keep shape for now, then this issue is done!!
@ahashibon, I'll make the syntax change and leave the shape name as is.
I think the problem here arises from the yaml parser. If we use parentheses instead of square brackets, we can happily do
shape: (1: , 2:)
shape: (:2, :)
shape: ()
shape: (:)
and won't break yaml, as yaml simply won't try to decode them and will keep them as str.
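If the value does arrive as a plain string, the parsing has to happen on the Python side. A minimal sketch of that step follows; `parse_shape` is a hypothetical helper, and reading `a:b` as "min a, max b" with an empty side meaning unbounded is an assumption based on the [1:] and [:10] examples earlier in this thread. Whether the upper bound is inclusive is left to the caller.

```python
from typing import List, Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]


def parse_shape(text: str) -> List[Bounds]:
    """Parse a string such as "(1:, :2)" into a list of (min, max) pairs."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"expected a parenthesised shape, got {text!r}")
    inner = text[1:-1].strip()
    if not inner:
        return []  # "()" carries no per-dimension constraints
    bounds: List[Bounds] = []
    for part in inner.split(","):
        part = part.strip()
        if ":" in part:
            low, _, high = part.partition(":")
            low, high = low.strip(), high.strip()
            bounds.append((int(low) if low else None,
                           int(high) if high else None))
        else:
            # Assumption: a bare integer fixes the size exactly.
            size = int(part)
            bounds.append((size, size))
    return bounds


for example in ["(1: , 2:)", "(:2, :)", "()", "(:)"]:
    print(example, "->", parse_shape(example))
```

This only covers the string handling; it does not check what a given yaml parser actually returns for each of the four examples, which is worth testing separately.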
sounds good, i will do it tomorrow.
merged branch with round brackets and closed this issue.
yaml can't handle these:
For
shape: [1:10]
you need a space after colon:will be parsed as