mathworks-ref-arch / matlab-avro

MATLAB interface for Apache Avro files.

Serializing object with empty fields #3

Open line-solver opened 4 years ago

line-solver commented 4 years ago

Hi.

Great project. I am using MATLAB 2020a. I am trying to serialize the object in the attached zip.

The code seems to have a problem in serializing empty cell arrays:

mySchema = matlabavro.Schema.createSchemaForData(qn)
Index in position 1 exceeds array bounds.
Error in matlabavro.getSchemaStructure>get2DArrayStructure (line 34)
tmpSt.items = matlabavro.getAvroType(class(data(1,1)));
Error in matlabavro.getSchemaStructure>getFields (line 133)
fields(pCount).type = get2DArrayStructure(data.(props{pCount}));
Error in matlabavro.getSchemaStructure>parseObjectProperties (line 45)
fields = getFields(data,props);
Error in matlabavro.getSchemaStructure (line 9)
schemaStructure = parseObjectProperties(data);
Error in matlabavro.Schema.createSchemaForData (line 249)
schemaStructure = matlabavro.getSchemaStructure(data);

If I then replace .lst, .space and .rtfun with NaNs, I still get an error:

mySchema = matlabavro.Schema.createSchemaForData(qn)
Error using containers.Map/subsref
The specified key is not present in this container.
Error in matlabavro.getAvroType (line 51)
avroStr = tMap(inputStr);
Error in matlabavro.getSchemaStructure>get2DArrayStructure (line 34)
tmpSt.items = matlabavro.getAvroType(class(data(1,1)));
Error in matlabavro.getSchemaStructure>getFields (line 133)
fields(pCount).type = get2DArrayStructure(data.(props{pCount}));
Error in matlabavro.getSchemaStructure>parseObjectProperties (line 45)
fields = getFields(data,props);
Error in matlabavro.getSchemaStructure (line 9)
schemaStructure = parseObjectProperties(data);
Error in matlabavro.Schema.createSchemaForData (line 249)
schemaStructure = matlabavro.getSchemaStructure(data);

qnstruct.zip

Thanks, Giuliano

vveerapp commented 4 years ago

Giuliano, Thank you for submitting an issue. We are looking into this.

line-solver commented 4 years ago

@vveerapp I realized you may not be able to reproduce the issue without the class hierarchy. If you clone this repo and do addpath(genpath(pwd)) in the root you can load that mat. https://github.com/line-solver/line

vveerapp commented 4 years ago

@line-solver the automatic schema generation is available to make it easy to generate a schema for MATLAB structures. For empty cell arrays or simple cells, it should be easy to generate the schema as below:

qn = {};
%mySchema = matlabavro.Schema.createSchemaForData(qn)
mySchema = matlabavro.Schema.parse('{"type":"array","items":"double"}')

Replace the items value with your expected data type; the line above expects it to be a double.

The automatic schema generation could be extended to handle all MATLAB data types, including empty cells. For empty cells, however, there is no way of knowing what the expected data type needs to be. In such cases, it is better to use the matlabavro.Schema.parse method and specify the data type in items as above. Let me know if this works for your use case. Thanks.
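
As a rough sketch of how the same idea might carry over to a structure with an empty cell field, the schema JSON can be assembled with base MATLAB and then handed to matlabavro.Schema.parse. The record name ExampleRecord and the choice of fields (lst, nchains) are assumptions made for illustration, and whether the writer accepts a record described this way has not been confirmed in this thread.

```matlab
% Describe each field by hand; the item type of the empty cell is chosen explicitly.
lstField     = struct('name', 'lst', 'type', struct('type', 'array', 'items', 'double'));
nchainsField = struct('name', 'nchains', 'type', 'double');

% Avro record schema; 'ExampleRecord' is a hypothetical name.
schemaStruct = struct('type', 'record', 'name', 'ExampleRecord');
schemaStruct.fields = {lstField, nchainsField};

% jsonencode turns the cell array of structs into a JSON array of objects.
schemaJson = jsonencode(schemaStruct);
disp(schemaJson)

% Parse only if the matlabavro package is on the path.
if exist('matlabavro.Schema', 'class') == 8
    mySchema = matlabavro.Schema.parse(schemaJson);
end
```

Building the JSON from structs rather than typing it as one long string keeps the quoting manageable when there are many fields.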

vveerapp commented 4 years ago

@line-solver I looked into this further with your actual test data and was able to read and write to avro without issues. Can you confirm if you are using version 0.8.2?

Below is my sample code

qn = [3707764736;2;1;1;1;2];
mySchema = matlabavro.Schema.createSchemaForData(qn);
avroreadwrite(qn, mySchema);

qn = {}
mySchema = matlabavro.Schema.create(matlabavro.SchemaType.NULL);
avroreadwrite(qn, mySchema);

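function avroreadwrite(qn,mySchema)
% Write qn to an Avro file using mySchema, then read it back.

% Create the file and append a single datum.
myWriter = matlabavro.DataFileWriter();
myWriter = myWriter.createAvroFile(mySchema,'myFile.avro');
myWriter.append(qn);
myWriter.close();

% Read the datum back; no semicolon, so the result is displayed.
myReader = matlabavro.DataFileReader('myFile.avro');
qnfromFile = myReader.next()
end

line-solver commented 4 years ago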

@vveerapp Thanks. Yes, the example you pasted works. As for the version: when I run system('mvn dependency:copy'), it downloads avro-1.9.2.jar.

However, the data structure I originally attached still doesn't work:

load qnstruct.mat;
mySchema = matlabavro.Schema.createSchemaForData(qn);

gives:

Index in position 1 exceeds array bounds.
Error in matlabavro.getSchemaStructure>get2DArrayStructure (line 34)
tmpSt.items = matlabavro.getAvroType(class(data(1,1)));
Error in matlabavro.getSchemaStructure>getFields (line 133)
fields(pCount).type = get2DArrayStructure(data.(props{pCount}));
Error in matlabavro.getSchemaStructure>parseObjectProperties (line 45)
fields = getFields(data,props);
Error in matlabavro.getSchemaStructure (line 9)
schemaStructure = parseObjectProperties(data);
Error in matlabavro.Schema.createSchemaForData (line 249)
schemaStructure = matlabavro.getSchemaStructure(data);

This data structure is a MATLAB object of a class with the following properties, which are basically just arrays and cells, sometimes nested, so I don't see why it doesn't work. All the contents of these arrays and cells are numerical data.

Could there be a problem with passing a MATLAB object as an argument instead of a struct?

qn = NetworkStruct with properties:

              cap: [2×1 double]
           chains: [1 1]
         classcap: [2×2 double]
       classnames: {2×1 cell}
        classprio: [0 0]
           csmask: [2×2 logical]
       isstatedep: [4×3 logical]
        isstation: [4×1 logical]
       isstateful: [4×1 logical]
            isslc: 0
              lst: {}
               mu: {2×2 cell}
          nchains: 1
         nclasses: 2
      nclosedjobs: 4
            njobs: [2 2]
           nnodes: 4
         nservers: [2×1 double]
        nstations: 2
        nstateful: 2
            nvars: [4×1 double]
        nodenames: {4×1 cell}
       nodevisits: {[4×2 double]}
         nodetype: [4×1 double]
           phases: [2×2 double]
         phasessz: [2×2 double]
       phaseshift: [2×3 double]
              pie: {2×2 cell}
              phi: {2×2 cell}
             proc: {2×2 cell}
            rates: [2×2 double]
          refstat: [2×1 double]
          routing: [4×2 double]
               rt: [4×4 double]
          rtnodes: [8×8 double]
            rtfun: {}
            sched: {2×1 cell}
          schedid: [2×1 double]
       schedparam: [2×2 double]
             sync: {10×1 cell}
            space: {}
            state: {2×1 cell}
              scv: [2×2 double]
           visits: {[2×2 double]}
        varsparam: {4×1 cell}
   nodeToStateful: [1 2 NaN NaN]
    nodeToStation: [1 2 NaN NaN]
    stationToNode: [1 2]
stationToStateful: [1 2]
   statefulToNode: [1 2]

vveerapp commented 4 years ago

@line-solver I didn't realize that the https://github.com/line-solver/line utility created the structure of the data in the zip file attached. Simply loading the mat file gave me a 6x1 column.

I have now cloned the line-solver utility and tested this. The error is happening because the empty data set in the MATLAB object is not handled by the schema generator. I will put in a fix for this and deploy. Thanks for your patience.
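
For illustration only (this is not the fix itself), here is a minimal sketch of the kind of guard that avoids the failing data(1,1) lookup on an empty property. The names samples, defaultType and itemTypes are hypothetical, and the fallback to 'double' is an assumption, since an empty cell carries no item type of its own. The sketch also looks inside cells with brace indexing, because data(1,1) on a cell array returns another cell rather than its contents.

```matlab
% Hypothetical property values: an empty cell (like lst), an empty numeric
% array, a cell of numbers, and a logical matrix.
samples = {{}, zeros(0,1), {1, 2}, true(2,2)};
defaultType = 'double';   % assumption: fallback item type for empty data

itemTypes = cell(size(samples));
for k = 1:numel(samples)
    data = samples{k};
    if isempty(data)
        % data(1,1) would fail here, so fall back to the default type.
        itemTypes{k} = defaultType;
    elseif iscell(data)
        % Brace indexing returns the content; data(1,1) would return a cell.
        itemTypes{k} = class(data{1,1});
    else
        itemTypes{k} = class(data(1,1));
    end
end
disp(itemTypes)
```

The result is a MATLAB class name per property, which would still need to be mapped to an Avro type.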

gcasale commented 4 years ago

@vveerapp Many thanks for this - great news. Looking forward to trying out the new version. Thanks for the help!

gcasale commented 4 years ago

Hi, is it possible that you forgot to push the change? I still get the same problem.

vveerapp commented 4 years ago

Hi, not at all. This is very much on our radar. We are making your data part of our unit tests and are in the process of rewriting some parts of the code to handle your use case. We hope to push this out soon, although I apologize that I cannot provide a narrower timeline.

gcasale commented 4 years ago

Sounds great - thank you!

DaveForstot-MathWorks commented 2 years ago

Hi @line-solver @gcasale,

This issue should be resolved in the current release. Can you try pulling/downloading the latest version and verifying the nested empty cell array is preserved?

Thanks!