manoelcampos / xml2lua

XML Parser written entirely in Lua that works for Lua 5.1+. Convert XML to and from Lua Tables 🌖💱
MIT License
287 stars, 73 forks

Question: can you "reset" the parser? #29

Closed: sec23206 closed this issue 4 years ago

sec23206 commented 4 years ago

Full disclosure: I'm new to Lua, but I'm trying to write an NGINX plug-in that parses a response from an XML web service and converts it to JSON. I found the xml2lua library and it seems to work great when I run it interactively. But when running within NGINX, it appears to retain "state" from previous requests. By that, I mean that the Lua table this code generates seems to include the data from all previous requests rather than just that of the current request. This may be a function of how NGINX executes Lua code, but my question is whether the xml2lua library has an option to "clear" or reset the parser. This is how I'm trying to use the library:

local xml2lua = require("xml2lua")
local handler = require("xmlhandler.tree")
local rapidjson = require('rapidjson')

local _M = {}

local function read_xml_body(xml_body)
  if xml_body then
    local parser = xml2lua.parser(handler)
    parser:parse(xml_body)
    return handler.root
  end
end

function _M.transform_xml_body(conf, buffered_data)
  local xml_as_lua = read_xml_body(buffered_data)
  if xml_as_lua == nil then
    return
  end
  return rapidjson.encode(xml_as_lua)
end

return _M

On the first request, the contents of handler.root are what I would expect:

{
  string = {
    "example@foo.com",
    _attr = {
      xmlns = "http://foo.com.BARRSD/"
    }
  }
}

But on the next request, it seems to append to the contents of the table:

{
  string = {
    "example@foo.com",
    {
      "example@foo.com",
      _attr = {
        xmlns = "http://foo.com.BARRSD/"
      }
    },
    _attr = {
      xmlns = "http://foo.com.BARRSD/"
    }
  }
}

And this just continues until NGINX is restarted. Thanks!

sec23206 commented 4 years ago

I thought I had found a workaround, but now for some reason it's not working. Before creating the parser, I inserted:

handler = handler.new()

This seemed to give me a "clean slate" for subsequent NGINX requests invoking this code. But on further testing, it instead gives me an error about an incomplete XML document:

/usr/local/share/lua/5.1/xml2lua.lua:92: Incomplete XML Document [char=111]

stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/xml2lua.lua:92: in function 'errorHandler'
    /usr/local/share/lua/5.1/XmlParser.lua:133: in function 'err'
    /usr/local/share/lua/5.1/XmlParser.lua:371: in function 'getNextTag'
    /usr/local/share/lua/5.1/XmlParser.lua:418: in function 'parse'
    ...ugins/response-transformer-xml2json/body_transformer.lua:15: in function 'read_xml_body'
    ...ugins/response-transformer-xml2json/body_transformer.lua:24: in function 'transform_xml_body'
    ...m/kong/plugins/response-transformer-xml2json/handler.lua:48: in function <...m/kong/plugins/response-transformer-xml2json/handler.lua:35>
    /usr/local/share/lua/5.1/kong/init.lua:209: in function 'execute_plugins_iterator'
    /usr/local/share/lua/5.1/kong/init.lua:982: in function 'body_filter'

I've confirmed that the string it's trying to parse is fully-formed XML. Taking out the call to handler.new() removes this error, but then it goes back to appending content to the Lua table. This is confusing, since the comments in the source code for xmlhandler.tree make it seem that .new() is exactly the right thing to use:

---Instantiates a new handler object.
--Each instance can handle a single XML.
--By using such a constructor, you can parse
--multiple XML files in the same application.
--@return the handler instance
function tree:new()
    local obj = init()

    obj.__index = self
    setmetatable(obj, self)

    return obj
end

manoelcampos commented 4 years ago

Try handler = handler:new(). If it doesn't work, define the handler as a local variable, moving local handler = require("xmlhandler.tree") inside the place you want to use it.
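
A minimal sketch of the first suggestion applied to the read_xml_body function from the original post, assuming xml2lua is installed and on the Lua path. The local name `tree` for the module table is an assumption made here so that the module returned by `require` is never overwritten; the rest follows the original code.

```lua
local xml2lua = require("xml2lua")
local tree = require("xmlhandler.tree")   -- module table, shared by every caller

-- Parse one XML string and return the resulting Lua table.
-- A fresh handler instance is created on every call, so nothing
-- parsed by an earlier call ends up in the result.
local function read_xml_body(xml_body)
  if xml_body then
    local handler = tree:new()            -- colon call: passes `tree` as self
    local parser = xml2lua.parser(handler)
    parser:parse(xml_body)
    return handler.root
  end
end
```

The colon matters. Going by the `tree:new()` source quoted earlier in the thread, calling it as `handler.new()` leaves `self` as nil, so `setmetatable(obj, self)` gives the new object no metatable and it cannot inherit any of the handler's methods. That is a plausible reason the dot form did not behave like a working handler, though the thread does not confirm the exact cause of the "Incomplete XML Document" error.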

sec23206 commented 4 years ago

I had earlier tried your second suggestion of moving the declaration of handler into the scope of the function, but that didn't resolve it. I found some references to Lua running in NGINX that seemed to suggest that variable scope is somewhat trickier there, particularly when using 'require', but I wasn't entirely sure if that was the situation here.

But your first suggestion seemed to do the trick! That seems to lend more credence to the idea that Lua-in-NGINX requires a different approach: you have to reset local variables to a known value, and not assume they aren't hanging around from processing a previous request.
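
The behaviour described here is consistent with how `require` works in standard Lua, not only under NGINX: the value a module returns is cached in `package.loaded`, so every later `require` of the same name returns the same table, wherever the call appears. The sketch below shows this with a hypothetical module named `fake_handler`, registered through `package.preload` so that the example needs no file on disk. Its `root` field only imitates module-level state and is not xml2lua's actual implementation.

```lua
-- Register a hypothetical module without needing a file on disk.
package.preload["fake_handler"] = function()
  return { root = {} }   -- module-level state, created once
end

local function handle_request(value)
  -- Calling require inside the function does not create a new table:
  -- after the first load, require returns the cached module.
  local handler = require("fake_handler")
  table.insert(handler.root, value)
  return #handler.root
end

print(handle_request("a"))  --> 1
print(handle_request("b"))  --> 2 (state from the first call is still there)
print(require("fake_handler") == package.loaded["fake_handler"])  --> true
```

This is why moving the `require` call into the function did not help, while creating a new instance on each call did.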

I assume (again, being a Lua newbie) that the old instance of handler will be garbage collected because it's been dereferenced and I won't run into any memory leak issues. But looks good so far, and thanks so much for your response! Your xml2lua solution really saved the day! :-)

manoelcampos commented 4 years ago

The old reference will be garbage collected, so you don't need to worry. You're welcome.
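
One way to observe this in plain Lua is with a weak-valued table, which can refer to an object without keeping it alive. This is only an illustrative sketch: `new_handler` is a hypothetical stand-in for a handler instance, not part of xml2lua.

```lua
-- A weak-valued table lets us observe an object without keeping it alive.
local observer = setmetatable({}, { __mode = "v" })

-- Hypothetical stand-in for creating a handler instance.
local function new_handler()
  return { root = {} }
end

local handler = new_handler()
observer.old = handler       -- weak reference only

handler = new_handler()      -- the first instance is now unreachable
collectgarbage("collect")    -- force a full collection cycle

print(observer.old)          -- nil once the old instance has been collected
```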