scrapinghub / js2xml

Convert Javascript code to an XML document
MIT License

Simplify XML schema for objects #4

Closed: redapple closed this 10 years ago

redapple commented 10 years ago

Old schema:

t = {a: "3", "b": 3, "3": 3.0};
<program>
  <assign>
    <left>
      <identifier>t</identifier>
    </left>
    <operator>=</operator>
    <right>
      <object>
        <assign>
          <left>
            <identifier>a</identifier>
          </left>
          <operator>:</operator>
          <right>
            <string>3</string>
          </right>
        </assign>
        <assign>
          <left>
            <string>b</string>
          </left>
          <operator>:</operator>
          <right>
            <number>3</number>
          </right>
        </assign>
        <assign>
          <left>
            <string>3</string>
          </left>
          <operator>:</operator>
          <right>
            <number>3.0</number>
          </right>
        </assign>
      </object>
    </right>
  </assign>
</program>

New schema:

<program>
  <assign>
    <left>
      <identifier>t</identifier>
    </left>
    <operator>=</operator>
    <right>
      <object>
        <property name="a">
          <string>3</string>
        </property>
        <property name="b">
          <number>3</number>
        </property>
        <property name="3">
          <number>3.0</number>
        </property>
      </object>
    </right>
  </assign>
</program>
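
To make the difference concrete, here is a minimal sketch of looking up the value stored under key "b" in both layouts. It uses Python's standard-library `xml.etree.ElementTree` rather than js2xml itself, the XML is a trimmed copy of the `<object>` subtrees above, and the helper names `lookup_old` and `lookup_new` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the <object> subtrees shown above.
OLD = """
<object>
  <assign>
    <left><identifier>a</identifier></left>
    <operator>:</operator>
    <right><string>3</string></right>
  </assign>
  <assign>
    <left><string>b</string></left>
    <operator>:</operator>
    <right><number>3</number></right>
  </assign>
</object>
"""

NEW = """
<object>
  <property name="a"><string>3</string></property>
  <property name="b"><number>3</number></property>
</object>
"""


def lookup_old(obj, key):
    """Old schema: scan every <assign> and compare the key found under <left>."""
    for assign in obj.findall("assign"):
        left = assign.find("left")[0]  # either <identifier> or <string>
        if left.text == key:
            return assign.find("right")[0]
    return None


def lookup_new(obj, key):
    """New schema: the key is an attribute, so a single path expression suffices."""
    return obj.find(f"property[@name='{key}']/*")


old_value = lookup_old(ET.fromstring(OLD), "b")
new_value = lookup_new(ET.fromstring(NEW), "b")
print(old_value.tag, old_value.text)
print(new_value.tag, new_value.text)
```

With the new layout the query no longer has to care whether the key was written as an identifier (`a`) or as a string (`"b"`).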

redapple commented 10 years ago

@rocioar, @dangra, what do you think?

I thought about adding a type attribute and having only text nodes under <property> for string, number, boolean and the other basic types.

The problem is with the other cases, where values are complex, e.g.:

t = {a: "3", "b": 3.14, "3": Math.random()};
<program>
  <assign>
    <left>
      <identifier>t</identifier>
    </left>
    <operator>=</operator>
    <right>
      <object>
        <property name="a">
          <string>3</string>
        </property>
        <property name="b">
          <number>3.14</number>
        </property>
        <property name="3">
          <functioncall>
            <identifier>
              <dotaccessor>
                <object>
                  <identifier>Math</identifier>
                </object>
                <property>
                  <identifier>random</identifier>
                </property>
              </dotaccessor>
            </identifier>
            <arguments/>
          </functioncall>
        </property>
      </object>
    </right>
  </assign>
</program>
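
A rough sketch of the type-attribute idea, and of where it stops working, might look like the following. It is only an illustration using the standard-library `xml.etree.ElementTree`; the function `collapse_basic` and the `BASIC_TYPES` set are hypothetical, and only `string` and `number` are element names actually shown in this thread (the others are assumed).

```python
import xml.etree.ElementTree as ET

# Only "string" and "number" appear in this thread; the rest are assumptions.
BASIC_TYPES = {"string", "number", "boolean", "null", "undefined"}

SAMPLE = """
<object>
  <property name="a"><string>3</string></property>
  <property name="b"><number>3.14</number></property>
  <property name="3">
    <functioncall>
      <identifier>
        <dotaccessor>
          <object><identifier>Math</identifier></object>
          <property><identifier>random</identifier></property>
        </dotaccessor>
      </identifier>
      <arguments/>
    </functioncall>
  </property>
</object>
"""


def collapse_basic(prop):
    """Rewrite <property><string>3</string></property> in place as
    <property type="string">3</property>. Return False for complex values."""
    children = list(prop)
    if len(children) != 1:
        return False
    value = children[0]
    if value.tag not in BASIC_TYPES or len(value) > 0:
        return False  # complex value: it still needs child elements
    prop.remove(value)
    prop.set("type", value.tag)
    prop.text = value.text
    return True


obj = ET.fromstring(SAMPLE)
# findall("property") only visits direct children, so the unnamed <property>
# inside <dotaccessor> is not touched.
collapsed = {p.get("name"): collapse_basic(p) for p in obj.findall("property")}
print(collapsed)
```

The first two properties collapse to text nodes, while the `Math.random()` value cannot, so a consumer would have to handle two different shapes of `<property>`.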

dangra commented 10 years ago

I would prefer the left side and the operator to be attributes of the tag. That way, everything within the tag is the right side (which can be complex).

redapple commented 10 years ago

@dangra, isn't that already the case (the operator for an object is always ":")? Do you have an example?

dangra commented 10 years ago

Now I realize the issue is about "objects". I was talking about assignments like t = object expanding to:

<assign left="t" operator="=">object</assign>

Of course, the assignment operator doesn't make sense there.

dangra commented 10 years ago

In your example, after applying this PR and with my proposed changes, it would be:

<program>
  <assign left="t">
      <object>
        <property name="a">
          <string>3</string>
        </property>
        <property name="b">
          <number>3</number>
        </property>
        <property name="3">
          <number>3.0</number>
        </property>
      </object>
  </assign>
</program>
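
As an illustration only, the rewrite from the verbose form to the attribute form could be sketched as a small post-processing step. This uses the standard-library `xml.etree.ElementTree`; the function name `compact_assign` is hypothetical, and restricting the rewrite to a plain "=" operator with a single identifier on the left is an assumption of this sketch, not something settled in the thread.

```python
import xml.etree.ElementTree as ET

VERBOSE = """
<program>
  <assign>
    <left><identifier>t</identifier></left>
    <operator>=</operator>
    <right>
      <object>
        <property name="a"><string>3</string></property>
      </object>
    </right>
  </assign>
</program>
"""


def compact_assign(assign):
    """Rewrite <assign> in place as <assign left="name">value</assign>.

    Assumption: only a plain "=" with a single bare identifier on the left
    is compacted; anything else keeps the verbose form.
    """
    left = assign.find("left")
    operator = assign.find("operator")
    right = assign.find("right")
    if left is None or operator is None or right is None:
        return False
    targets = list(left)
    if operator.text != "=" or len(targets) != 1:
        return False
    target = targets[0]
    if target.tag != "identifier" or len(target) > 0:
        return False  # e.g. an accessor on the left side
    values = list(right)
    for child in list(assign):
        assign.remove(child)
    assign.set("left", target.text)
    assign.extend(values)
    return True


program = ET.fromstring(VERBOSE)
# Materialize the list first because the tree is modified while looping.
for node in list(program.iter("assign")):
    compact_assign(node)
print(ET.tostring(program, encoding="unicode"))
```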

dangra commented 10 years ago

How complex can the left side be, apart from indexed arrays?

redapple commented 10 years ago

This needs another PR to discuss, but assignments could be complex:

var myObj = {};
myObj["a"] = 1;
myObj.a = 32;
myObj.a = {"begin": 0, "end": 32};
myObj.a["end"] = 64;
var b = "a";
myObj[b].begin = -64;

dangra commented 10 years ago

For the purpose of querying for data extraction, you don't care about the difference between myObj["a"] and myObj.a. In fact, it is better if a single query can match both: the developer behind the JS code can switch between them at any time, but the query path won't need to change.

I think the really complex case happens when the indexed key is a complex expression: myObj[get_key_name(foo)] = {}
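
One way to picture this is a helper that reduces both accessor styles to the same key path and gives up on computed keys. The sketch below is hedged: the `<dotaccessor>` shape is taken from the example earlier in this thread, while the `<bracketaccessor>` element and its layout are assumed by analogy, and `access_path` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# myObj.a  (shape as shown earlier in this thread)
DOT = """
<dotaccessor>
  <object><identifier>myObj</identifier></object>
  <property><identifier>a</identifier></property>
</dotaccessor>
"""

# myObj["a"]  (element name and layout are assumptions)
BRACKET = """
<bracketaccessor>
  <object><identifier>myObj</identifier></object>
  <property><string>a</string></property>
</bracketaccessor>
"""

# myObj[get_key_name(foo)]  (same assumed layout, computed key)
COMPUTED = """
<bracketaccessor>
  <object><identifier>myObj</identifier></object>
  <property>
    <functioncall>
      <identifier>get_key_name</identifier>
      <arguments><identifier>foo</identifier></arguments>
    </functioncall>
  </property>
</bracketaccessor>
"""


def access_path(node):
    """Return a tuple of keys such as ("myObj", "a"), or None when any part
    of the path is computed at runtime."""
    if node.tag == "identifier" and len(node) == 0:
        return (node.text,)
    if node.tag in ("dotaccessor", "bracketaccessor"):
        base = access_path(node.find("object")[0])
        key = node.find("property")[0]
        if base is None:
            return None
        if node.tag == "dotaccessor" and key.tag == "identifier":
            return base + (key.text,)
        # In brackets only literals are static; an identifier is a variable.
        if node.tag == "bracketaccessor" and key.tag in ("string", "number"):
            return base + (key.text,)
    return None


dot_path = access_path(ET.fromstring(DOT))
bracket_path = access_path(ET.fromstring(BRACKET))
computed_path = access_path(ET.fromstring(COMPUTED))
print(dot_path, bracket_path, computed_path)
```

Both static forms yield the same path, so a query written against the path would survive a switch between the two notations; the computed key is reported as unresolvable rather than guessed.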

dangra commented 10 years ago

Clearly, I don't care much about the purity of the translated output; I focus instead on output that makes data extraction easy.

redapple commented 10 years ago

Then it probably means a layer of interpretation on top of the parser verbose output

On Mon, May 19, 2014 at 4:49 PM, Daniel Graña <notifications@github.com> wrote:

Clearly, I don't care much about the purity of the translated output; I focus instead on output that makes data extraction easy.


dangra commented 10 years ago

Then it probably means a layer of interpretation on top of the parser verbose output

Yes, that makes sense: an XSL stylesheet focused on data extraction, applied on top of the verbose output. It could be part of js2xml or of other projects like Scrapy.
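
For illustration, such an interpretation layer can also be sketched in plain Python instead of XSL (a swapped-in technique, not what the thread proposes verbatim). The function `to_python` below is hypothetical and works on the simplified `<object>`/`<property name="...">` layout from this PR, using only the standard-library `xml.etree.ElementTree`; values it does not understand are returned as raw elements.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<object>
  <property name="a"><string>3</string></property>
  <property name="b"><number>3</number></property>
  <property name="3"><number>3.0</number></property>
  <property name="r">
    <functioncall>
      <identifier>f</identifier>
      <arguments/>
    </functioncall>
  </property>
</object>
"""


def to_python(node):
    """Convert literal values to Python objects; leave anything else as XML."""
    if node.tag == "string":
        return node.text or ""
    if node.tag == "number":
        try:
            return int(node.text)
        except ValueError:
            return float(node.text)
    if node.tag == "object":
        # Direct <property> children only; each one wraps a single value.
        return {p.get("name"): to_python(p[0]) for p in node.findall("property")}
    return node  # complex expression: hand the element back untouched


data = to_python(ET.fromstring(SAMPLE))
print({key: value for key, value in data.items() if key != "r"})
```

Keeping the verbose tree as the parser's output and layering a converter like this on top means the extraction-oriented view can change without touching the parser.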