swl10 / pyslet

Python for Standards in Learning Education & Training
Other
73 stars 34 forks source link

Possible bug in parsing AssociationSet names #18

Closed chebizarro closed 7 years ago

chebizarro commented 10 years ago

Hi again, I am attempting to parse a rather large metadata file (2.4Mb) from SAP and the parser keeps choking when attempting to declare an AssociationSet in the EntityContainer namespace.

I should state at the outset that I am unable to independently verify that the metadata being returned is valid, but I would assume that it is, coming from the SAP HANA service. I would put a copy of the file online but I'm not sure if that would be violating the terms and conditions of the service agreement. It's free to register if you're so inclined.

The error the parser throws is a DuplicateName error and the AssociationSet that it throws this error on has the associationName attribute of "SFOData.Background_Picklist-One_AddressType_picklist"

Inspecting the code in pudb, I saw that this was happening because the RegEx expression in csdl.py excludes the dash character by only looking for "Pc". This causes the entity's name to be left as "Default" and the above error to be thrown.

AFAIK, the dash is a legal character in entity names, but I could be wrong. I'm trying to find a reference but RFCs make my head hurt.

In any case, I amended the RegEx to include "Pd" and it continues parsing until... it throws a ModelIncomplete error.

This is about as far as I have gotten before my eyes starting bleeding. I will look into it further and add to the above.

swl10 commented 10 years ago

On 22 Oct 2014, at 00:24, Chris Daley notifications@github.com wrote:

Hi again, I am attempting to parse a rather large metadata file https://sfsfbizxtrial.hana.ondemand.com/odata/v2/%24metadata (2.4Mb) from SAP and the parser keeps choking when attempting to declare an AssociationSet in the EntityContainer namespace.

I should state at the outset that I am unable to independently verify that the metadata being returned is valid, but I would assume that it is, coming from the SAP HANA service. I would put a copy of the file online but I'm not sure if that would be violating the terms and conditions of the service agreement. It's free to register if you're so inclined.

The error the parser throws is a DuplicateName error https://github.com/swl10/pyslet/blob/master/pyslet/odata2/csdl.py#L458 and the AssociationSet that it throws this error on has the associationName attribute of "SFOData.Background_Picklist-One_AddressType_picklist”

Inspecting the code in pudb, I saw that this was happening because the RegEx expression https://github.com/swl10/pyslet/blob/master/pyslet/odata2/csdl.py#L47 in csdl.py excludes the dash character by only looking for "Pc". This causes the entity's name to be left as "Default" and the above error to be thrown.

AFAIK, the dash is a legal character in entity names, but I could be wrong. I'm trying to find a reference but RFCs make my head hurt.

According to my version of CSDL (OData v2/3):

AssociationSet MUST have a Name attribute defined. Name attributes are of type SimpleIdentifier.

2.2.6 SimpleIdentifier SimpleIdentifier is a string-based representation. The maximum length of the identifier MUST be less than 480. The following pattern represents the allowed identifiers in the ECMA specification, [ECMA-334] sections 9.4.2 and A.1.6. value="[\p{L}\p{Nl}][\p{L}\p{Nl}\p{Nd}\p{Mn}\p{Mc}\p{Pc}\p{Cf}]{0,}”

That’s the origin of the regular expression in Pyslet. So it looks like SAP’s service is wrong unfortunately.

In any case, I amended the RegEx to include "Pd" and it continues parsing until... it throws a ModelIncomplete https://github.com/swl10/pyslet/blob/master/pyslet/odata2/csdl.py#L4299 error.

Relaxing these constraints to support a broken implementation feels like the right path I guess. I can see why ECMA would exclude hyphen but I guess OData always escapes names in JSON so the definition probably is over-restrictive. (CSDL was borrowed from .Net if my understanding is correct so I guess identifiers that can also be used as variable names was the driver here.)

Anyway, I did download the metadata file (I already have an SAP account) and hit the error you found too:

pyslet.odata2.csdl.ModelIncomplete: Navigation property ethnicityNav in EntitySet User is not bound to an association set

This is about as far as I have gotten before my eyes starting bleeding. I will look into it further and add to the above.

So now my eyes are bleeding too! This model uses a general concept of PickLists and PickListOptions, which seems to be explained here: http://scn.sap.com/community/erp/hcm/blog/2013/07/09/rules-and-picklists-in-the-successfactors-metadata-framework http://scn.sap.com/community/erp/hcm/blog/2013/07/09/rules-and-picklists-in-the-successfactors-metadata-framework

In the metadata model there is a single Association defined for the relationship between a User and PickListOption record, however that relationship is used over and over for lots of navigation properties in the User entity.

I was thinking that a PickListOption is a special type of application-wide enumeration really and the model is using navigation properties rather than simple values in a strange sort of normalisation.

The problem is likely that I have assumed that each Association is only used once to relate the same two entity sets and hence there should be only one matching AssociationSet (otherwise the model would be ambiguous.)

This error needs a closer look, something is going to have to relax I can see. Are you using the model from the point of view of a client? If so, the simplest thing is probably to relax some constraints during this post-check phase of the model loading. I doubt you’ll hit a problem with missing association sets when consuming the model as they are really hints to providers about required index storage.

On the other hand, if you intend to use the model to create a similar service, e.g., in a local database, then you will get serious issues and the best thing to do would be fix the model. It may also be necessary to change Pyslet to auto-generate missing AssociationSets (e.g., from a mangled form of the Entity Set+Navigation property names) - for 1-1 relationships they get turned in to separate tables.

— Reply to this email directly or view it on GitHub https://github.com/swl10/pyslet/issues/18.

Steve

swl10 commented 10 years ago

OK, the latest check-in should at least allow you to load the metadata document. I've added code to automatically generate association sets when they are missing (or ambiguous) and I've softened the rather cryptic "unbound principal" error, I now just log the error and silently modify the multiplicity of the offending association. This works around models which appear to declare 'contained' entity types that must be contained by more than one entity. Hopefully the model will be usable from the client perspective now but if you try and generate a SQL database from it I don't think it will generate a very sensible schema!

I'll leave this issue open until I hear if this resolves the problem or not.

swl10 commented 7 years ago

I note that OData v4 contains some provisions to help people who are modelling this type of schema. Given the age of this issue I'm closing this one now as I don't see this being fully resolved in the OData v2 stack.