karamaan / karamaan-opaleye

Other
11 stars 7 forks source link

Incorrect escaping of newlines #53

Open dbp opened 10 years ago

dbp commented 10 years ago

I'm getting incorrect escape behavior on newlines (so a newline character is saved as the literal string "\n", and if you have a textarea posting, hitting save multiple times results in "\n", "\n", etc),

As far as I can tell, the default behavior is to send Text, via ShowConstant, to a StringLit (from HaskellDB). This then gets escaped, which involves turning '\n' (a newline character) into "\n" (the string consisting of a backslash and then the letter n). I can only assume that postgresql-simple, which is being used to run the queries, is saving those strings literally (as it should!), by way of parametrized queries or something, resulting in the bug.

I imagine that the way that HaskellDB is used elsewhere necessitates this behavior? I'm nervous about turning Text into OtherLit, as that makes it seem like it will just be spliced into the query (which would be bad, obviously), so I'm wondering what the solution is, and how other people have dealt with this.

tomjaguarpaw commented 10 years ago

That's an interesting observation. I've never come across this as I've never used ShowConstant with a string containing "special" characters. If you have the time could you post an example of a very simple query (say the Opaleye equivalent of "SELECT 'mystring' from TABLE') where 'mystring' contains a newline and other special characters, and the SQL it generates? It would also be interesting if you could look into how HaskellDB treats strings under SQL generation. I think it will be in here somewhere: https://github.com/tomjaguarpaw/haskelldb/tree/master/src/Database/HaskellDB/Sql -- I will also investigate further myself, but it's very late here now!

(By the way, I don't think HaskellDB's SQL generation actually uses parametrised queries. We should look further into this in case there are security issues.)

dbp commented 10 years ago

Here's a query:

testQuery = proc () -> constant ("hello \n world" :: Text) -< ()

And printing it with the help of the following helper:

showQuery = putStrLn . show . showSqlForPostgresDefault

Results in the following on the terminal (the extra show is to get the string literal, vs that rendered by putStrLn. String escaping is hard :)):

"SELECT 'hello \\n world' as constant_1"

This is in contrast to printing the string as follows:

putStrLn (show ("hello \n world" :: Text))

Which results in:

"hello \n world"

As expected (and yes, this isn't a funny rendering thing. If I get rid of the extra shows, the first case has the literal "\" and "n" in it, whereas the latter prints on two separate lines).

dbp commented 10 years ago

oh, and @tomjaguarpaw I did look at what it does. What appears to be the problem is the call to quote here: https://github.com/tomjaguarpaw/haskelldb/blob/11668fa3b2ad568b429bbbbcc235addf080540cb/src/Database/HaskellDB/Sql/Default.hs#L518 , which calls escape (just below), which has the behavior of turning special characters into strings representing them. Now I'm assuming that this was supposed to counteract a de-escaping (by way concatenating strings as raw SQL) later on, but I don't understand the whole flow well enough yet to say that for sure.

tomjaguarpaw commented 10 years ago

Thanks for doing that research. HaskellDB has a lot of rough edges. It's not unlikely that it's just doing completely the wrong thing. I'll take a look at it either tomorrow or during the week.

tomjaguarpaw commented 10 years ago

Doesn't the example you provided actually show correct behaviour on the part of HaskellDB? The newline should indeed be escaped in the final SQL string. It doesn't seem that it is being escaped twice.

You talked about writing to the database, so I wonder if it comes from one of the manipulation functions. I shall investigate further.

dbp commented 10 years ago

Well, it depends what happens to the string next... If it is passed as-is, then yes, it seems fine. But I assume that postgresql-simple (or another stage along the way) is escaping again... Because what ends up in the database is wrong.

tomjaguarpaw commented 10 years ago

Here's a manipulation example.

> let t = tableOfTableSpecDef (TableSpec "foocolumn" "footable") :: Table (Wire String)
> putStrLn $ show $ arrangeInsertSqlDef t (fmap Just (constant "hello \n world"))
"INSERT INTO footable (foocolumn)\nVALUES ('hello \\n world')"

In the resulting SQL string the newline character has been replaced by backslash (encoded as double backslash) followed by an 'n' character (encoded just as an 'n' character). This is exactly what should happen before the SQL string is sent to Postgres.

So is there something else also escaping? I'm not sure, but will continue looking.

tomjaguarpaw commented 10 years ago

I think this is due to the standand_conforming_strings parameter. If standard_conforming_strings is on then backslashes are interpreted literally. You have to use "escape strings", i.e. strings prefixed by an E, e.g E'hello \n world', to get special characters.

Can you check whether standand_conforming_strings is on or off on your server? (Strangely for me it is on if I issue a request through postgresql-simple, but off if I use pgsql. I don't know how that can happen, but it does seems to be the case.)

dbp commented 10 years ago

It's on for me both via psql and via postgresql-simple (this is a 9.3 server).

So that makes it seem like the problem is that HaskellDB is putting in the characters, expecting them to become special characters, whereas in normal operation with postgresql-simple, the actual newline characters are passed to postgresql, and thus things work fine. To confirm, both of the following result in proper newlines in the DB (and no problems):

execute "insert into table (s) values (?)" (Only "Hello\nWorld" :: Only Text)
execute_ "insert into table (s) values ('Hello\nWorld')"
tomjaguarpaw commented 10 years ago

That sounds like a correct description of the underlying cause. Perhaps you would like to create a branch with a modified escape function and see how well that works. I am wary of making that change to the main repo as it may have unintended consquences.

dbp commented 10 years ago

So I can confirm that by not performing escapes on newline characters (in Database.HaskellDB.Sql.escape, commenting the line that handles newlines) that newlines are going in and coming out of the database properly, when using postgres 9.3 (which has that parameter enabled).

So if that were the only target, I would say that the escaping is a bug that should be fixed. But, I'm not sure if other configurations (or other databases) will be like that, so I wonder if this is the right place to be addressing the problem.

bergmark commented 10 years ago

If we fork/inline haskelldb into opaleye we don't need to care about other backends.

bergmark commented 10 years ago

Is there a branch with a solution for this somewhere?

I just ran across this with quotation marks (inserting json into a text column):

λ> showQuery (constant "[\"astring\"]")
"SELECT '[\\\"astring\\\"]' as constant_1"
tomjaguarpaw commented 10 years ago

Currently there is no patch for this. I think what I will do, unless anyone can see an obvious problem, is to make Opaleye generate only "escape strings", i.e. only strings of the form E'This sentence contains a backslash \\ and a quote \'. The end.'.

Unless someone really needs it I will only fix this in Opaleye 1 which is being released publically on December 1st. You can get earlier access to Opaleye 1 by sending me an email. It's very nice and removes the need for ExprArr :)

tomjaguarpaw commented 10 years ago

Actually since this is HaskellDB functionality it makes sense just to patch it there now. Can someone please confirm that this works: https://github.com/tomjaguarpaw/haskelldb/commit/caf1eac24bbccd67d2603d5788ea2bfa686f96c0

bergmark commented 10 years ago

Yay it works, thanks!

tomjaguarpaw commented 10 years ago

Smallest fix ever!