simphony / simphony-metadata

[LEGACY] This repository contains the metadata definitions used in the SimPhoNy project.
BSD 2-Clause "Simplified" License

Some shape syntaxes do not work with YAML #9

Closed kitchoi closed 8 years ago

kitchoi commented 8 years ago

yaml can't handle these:

# yaml
shape: [:10]
# or
shape: [: 10]
# or
shape: [1:10]

For shape: [1:10] you need a space after the colon:

# yaml
shape: [1: 10, 2: ]

will be parsed as

# python:
[{1: 10}, {2: None}]

ahashibon commented 8 years ago

Then we need to find another, more streamlined way of specifying this information, if possible without introducing more keywords. For example, can we let YAML treat [:10] as a text string?

If this is not possible, how does the following look:

shape:
  dimension_1:
    min: 3
    max: 3
  dimension_2:
    min: 3
    max: 3

This would mean two dimensions: the first must have exactly 3 items, and the second also exactly 3 items.


shape:
  dimension_1:
    min: 2
    max:

This is a one-dimensional "list" with at least two items. Perhaps other existing CUBA keys can be used instead of CUBA.DIMENSION_3, CUBA.DIMENSION_1, and CUBA.DIMENSION_3 (assuming, by convention, that an attribute in the metadata is itself a CUBA; the lower-case ones differ from the ones with the CUBA prefix only because they are specified at the metadata level, etc.).

Would this be easier?
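To make the nested min/max layout concrete, here is a minimal Python sketch of how a shape could be checked against it. The helper name `matches_bounds` is hypothetical and not part of SimPhoNy. It assumes the YAML has already been loaded into a plain dict, that an empty `max:` arrives as `None`, that bounds are inclusive, and that dimensions are ordered by the numeric suffix of their key.

```python
# Hypothetical helper, not SimPhoNy code: check a shape against the
# nested min/max mapping (as a YAML loader would return it).
def matches_bounds(value_shape, spec):
    """Return True if every dimension length lies within its bounds.

    A missing or None bound is treated as "unbounded" (assumption).
    """
    # Order the dimensions by the integer suffix, e.g. dimension_2 -> 2.
    keys = sorted(spec, key=lambda k: int(k.rsplit("_", 1)[1]))
    if len(value_shape) != len(keys):
        return False
    for length, key in zip(value_shape, keys):
        bounds = spec[key] or {}
        lo = bounds.get("min")
        hi = bounds.get("max")
        if lo is not None and length < lo:
            return False
        if hi is not None and length > hi:
            return False
    return True


# The two examples from the comment above, written as Python dicts.
exact_3x3 = {
    "dimension_1": {"min": 3, "max": 3},
    "dimension_2": {"min": 3, "max": 3},
}
at_least_two = {"dimension_1": {"min": 2, "max": None}}
```

With these definitions, `matches_bounds((3, 3), exact_3x3)` holds, while a one-element list fails `at_least_two`.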

kitchoi commented 8 years ago

How about this:

# yaml
shape: [-1]  # 1D, any length
shape: [[2, 10]]  # 1D, length >= 2 and length <= 10
shape: [[-1, 10], [1, -1]]  # 2D, first dimension has max size 10, second dimension has min size 1
shape: [-1, -1]  # -1 is just a placeholder; in this case it is a 2D array with arbitrary shape
shape: [3, 3]  # 2D array of shape 3x3
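One way to read the list-based proposal above is as per-dimension bounds. The sketch below is a hypothetical helper (the name `matches_shape` and the inclusive-bounds interpretation are assumptions, not SimPhoNy code) that checks a concrete shape tuple against such a spec after YAML has parsed it into nested Python lists.

```python
# Hypothetical helper, not SimPhoNy code: validate a shape tuple against
# the list-based spec, where -1 stands for "no constraint".
def matches_shape(value_shape, spec):
    """Return True if value_shape satisfies spec.

    Each spec entry is either an int (exact size, or -1 for any size)
    or a [min, max] pair where -1 means that bound is open.
    Bounds are assumed to be inclusive.
    """
    if len(value_shape) != len(spec):
        return False
    for length, entry in zip(value_shape, spec):
        if isinstance(entry, int):
            lo = hi = entry  # exact size (or -1 for both bounds)
        else:
            lo, hi = entry
        if lo != -1 and length < lo:
            return False
        if hi != -1 and length > hi:
            return False
    return True
```

For instance, `matches_shape((10, 1), [[-1, 10], [1, -1]])` is true, while `matches_shape((3, 4), [3, 3])` is false.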
kitchoi commented 8 years ago

this would be nice too:

# yaml
shape: [] # 1D, any shape

kitchoi commented 8 years ago

No, it should be

# yaml
shape: [] # any dimension and any shape

ahashibon commented 8 years ago

Any one of these will do the job and is general enough. While we are at it and changing the metadata schema of cuba.yml, why not change shape to dimension, which is more logical?

tuopuu commented 8 years ago

Any one of these will do the job and is general enough. While we are at it and changing the metadata schema of cuba.yml, why not change shape to dimension, which is more logical?

Yes, dimension is the term that should be used when describing the "shape" of a vector. I also think that we should try to keep it as a one-liner, picking one of the options @kitchoi proposed. I can add this issue to #7.

kitchoi commented 8 years ago

I think the name "dimension" makes sense for numerical data such as vectors and tensors, but its meaning becomes less obvious when one is dealing with e.g. CUBA.MATERIAL (or CUBA.MATERIAL_LIST), which is a sequence of objects.

tuopuu commented 8 years ago

CUBA.MATERIAL

I don't think CUBA.MATERIAL has a dimension/shape defined anymore. In simphony_metadata.yml Material is derived from CUDS_COMPONENT and so on.

tuopuu commented 8 years ago

But, of course, there might be other counter examples that I'm missing, where you're making a good point, @kitchoi.

kitchoi commented 8 years ago

@tuopuu you are right that MATERIAL is to be derived from CUDS_COMPONENT. So this (and similar lines) should be modified: https://github.com/simphony/simphony-metadata/blob/master/simphony_metadata.yml#L64

tuopuu commented 8 years ago

So this (and similar lines) should be modified

I wonder if we should have a dimension for vectors and matrices, and a shape for lists, like in the example you linked. Dimension and shape could be treated as synonyms in the Python class generation script, as they both describe the size of an array under the hood. So it would be left to the user to decide which one to use in the yaml files, with a preference for using dimension for mathematical objects and shape for list-like objects. Comments?

kitchoi commented 8 years ago

Because I use numpy a lot and numpy uses shape, I am used to understanding it as the "dimensions" of multidimensional arrays (with the emphasis on the plural form of "dimensions"). Therefore, if you ask me, I would be happy to keep shape for both arrays and lists. However, if we are to distinguish the two, then I would use shape/dimensions for arrays, and length for lists (e.g. length: 3, which is no longer a list of integers). Note that I am biased because of numpy.

kitchoi commented 8 years ago

Sorry, I should take back length, since you probably want to set the minimum and maximum size of the list.

ahashibon commented 8 years ago

I am ok with the numpy way, makes sense to use the same nomenclature.

ahashibon commented 8 years ago

So this (and similar lines) should be modified: https://github.com/simphony/simphony-metadata/blob/master/simphony_metadata.yml#L64

Actually, here material is meant to have a shape. Being on the attribute side (the CUBA.MATERIAL) does not mean that the value is necessarily a material; it can be a list of UUIDs, for example. This is implementation dependent, which is not very good from the metadata point of view. The reason for initially proposing it this way was to avoid creating new keywords for lists. However, defining Materials as a Material with a shape of [1:], or using a new list type, would be better:

MATERIALS: 
  parent: CUBA.LIST   # would be nice to have list, tensor, vector, matrix, etc. with shape following numpy nomenclature. 
  shape: [2,]

or

MATERIALS: 
  parent: CUBA.MATERIAL
  shape: [2,]

and then:

MATERIAL_RELATION:
    definition:  Material Relation
    parent: CUBA.MODEL_EQUATION
    CUBA.MATERIALS:
      scope: CUBA.USER

ahashibon commented 8 years ago

Can we please change the syntax to https://github.com/simphony/simphony-metadata/issues/9#issuecomment-196817460 and keep shape for now, then this issue is done!!

tuopuu commented 8 years ago

@ahashibon, I'll make the syntax change and leave the shape name as is.

kitchoi commented 8 years ago

I think the problem here arises from the yaml parser. If we use parentheses instead of square brackets, we can happily do

shape: (1: , 2:)
shape: (:2, :)
shape: ()
shape: (:)

This won't break YAML, since YAML simply won't try to decode them and will keep them as str.
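If the parenthesized form reaches Python as a plain string, it still has to be decoded somewhere. The following is a minimal sketch of such a decoder. The function name `parse_shape` is hypothetical, and the reading of `a:b` as an inclusive (min, max) pair with an empty side meaning "unbounded" is an assumption, not something this thread specifies.

```python
# Hypothetical decoder, not SimPhoNy code: turn a string such as
# "(1: , 2:)" into a list of (min, max) tuples, one per dimension.
def parse_shape(text):
    """Parse a parenthesized shape string.

    "a:b" -> (a, b), an empty side becomes None (unbounded, assumption);
    a bare integer "n" -> (n, n); "()" -> [] (no constraint listed).
    """
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("shape must be wrapped in parentheses: %r" % text)
    inner = text[1:-1].strip()
    if not inner:
        return []
    bounds = []
    for item in inner.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma such as "(2,)"
        if ":" in item:
            lo, hi = (part.strip() for part in item.split(":", 1))
            bounds.append((int(lo) if lo else None, int(hi) if hi else None))
        else:
            size = int(item)
            bounds.append((size, size))
    return bounds
```

The four strings above would decode to `[(1, None), (2, None)]`, `[(None, 2), (None, None)]`, `[]` and `[(None, None)]` respectively. One caveat worth checking against the actual loader: in YAML a colon followed by a space normally acts as a key/value separator even in unquoted text, so a form such as `(1: , 2:)` may need to be quoted or written without the space.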

ahashibon commented 8 years ago

Sounds good, I will do it tomorrow.

ahashibon commented 8 years ago

merged branch with round brackets and closed this issue.