oeed / CraftOS-Standards

Community standard file formats, communication systems, etc. for ComputerCraft and CraftOS 2.0
Other
20 stars 14 forks source link

Lua table format #41

Open oeed opened 8 years ago

oeed commented 8 years ago

As discussed in #26 and #18, the Lua format is being split in to separate types.

This format is exclusively for files that are loadable via textutils.unserialise.

Also see: #39 and #40.

viluon commented 8 years ago

It would be good to point out that the only thing textutils.unserialise does is add a return statement to the beginning of the string and loadstring that. This means that strings like ""some text"", "6/3" or "function() end" yield no error.

lyqyd commented 8 years ago

I think that the removal of the unnecessary emphasis on "entire" and "parseable", along with changing "parseable" to "able to be parsed" would be sufficient for an initial standard acceptance. Does anyone see any other changes that would be necessary to clearly convey the requirements for the file format?

viluon commented 8 years ago

I am not sure @lyqyd. As I pointed out earlier, textutils.unserialise can be used for much more than just loading tables. The question is, does this standard apply to tables only, or anything that compiles with return in front in an empty environment?

SquidDev commented 8 years ago

It might be worth saying that function calls or definitions are invalid, or alternatively the only valid expressions are tables, strings, numbers, boolean values and nil.

viluon commented 8 years ago

@SquidDev in what sense?

-- Is this allowed?
{
  [ 1 ] = function( x ) return x + 1 end;
  [ 2 ] = function( x ) return x + 2 end;
}
-- What about
{ true, false, true, true }
-- Or simply
true
SquidDev commented 8 years ago

I'd say only the middle is legal: the first is more of an API rather than data and the last isn't a table.

viluon commented 8 years ago

I'd say only the middle is legal: the first is more of an API rather than data and the last isn't a table.

Disagreed @SquidDev. This needs proper, well defined rules. Just because you can't serialise it doesn't mean it's not a valid table, and this is exactly the case. Plus, these are by no means the only edge cases. See for yourself:

-- Linters don't like a Lua file starting with {, so this is what I do to make my serialised tables lintable
(function() return
{
  normal = "table follows"
}
end)()
{
  -- Maths! Sometimes you want to keep fractions easy to read
  width = 2/5;
  -- But what about
  height = 0/0;
}
-- The global environment may be empty, but strings still have their metatables
{
  ("asd"):byte();
}
viluon commented 8 years ago

The standard proposal as it is now is broad, incomplete, doesn't discuss edge cases and defines a format that can still be exploited while still differing from valid Lua syntax (therefore Lua source code tools can not be used on it (efficiently)).

lyqyd commented 8 years ago

I would say that any file that results in textutils.unserialize returning a table when it is fed the file contents would be valid, regardless of how it achieves that result. So, out of your six examples, only the third one would fall outside of this format.

This does leave us open to arbitrary code execution when attempting to load the file contents. I think we need to decide whether or not that's actually something that this standard should attempt to mitigate. I don't personally see a need to disallow it, but I'm interested in hearing other thoughts on the matter.

oeed commented 8 years ago

The point of the file format is to essentially have an easy way of storing data in a file using textutuils.

As, I would imagine, most people would read the file like this, as long as it can be understood by textutils.unserialise, it should be valid. Yes, this means it doesn't have to be a table.

local h = fs.open("file/path", "r")
if h then
    local data = textutils.unserialise(h.readAll())
    if data then
        ...
    end
end
ghost commented 8 years ago

My oppinion is that the lua format should just be spread into executeable and data.

CrazyPyroEagle commented 8 years ago

As Lyqyd said, this is vulnerable to arbitrary code execution. Let me give one such example.

Imagine a Computercraft server on which an in-game public server can be reconfigured by sending it a new .tbl file. Now, this is fine, until someone manages to bypass enough security measures to update its configuration. From there, they could do whatever they wanted on that server, such as write a script that deletes all files: (function() for _, file in ipairs(fs.list("")) do if not fs.isReadOnly(file) then fs.delete(file) end end return "Goodbye" end)(). When textutils.unserialize prefixes it with return and loads it, all the server files will be gone. This is even worse if it's a program distribution server as they could edit the actual code that transmits the programs across the network to create an in-game botnet or infect everyone's computers with viruses. All because of this one flaw in the standard.

There are three solutions to this.

  1. You ensure that load loads the function with a specially sandboxed environment. Per implementation, this could be abused, therefore it may not be a good idea as people who don't read that note saying "don't use textutils.serialize" will still open up the security flaw.
  2. You either write a function that checks if the contents of the file are valid (tables only?) or you require a different loading method, such as a custom function that uses a tree-like table structure to read and write files. Even with this in place, people could be lazy and just use textutils.serialize anyway. You could prevent the use of textutils.serialize by forcing a certain part of the table file to include text that does not compile in order for it to be valid for the format. For example, format version details (to allow you to expand the format if required). People could still cut this section off and use load, but they definitely won't use textutils.
  3. Define a new non-executable format. People may not like this as this requires custom read / write methods.
viluon commented 8 years ago

Incorrect, @CrazyPyroEagle. textutils.serialise sets the function's environment to an empty table.