larsks / iniparse

Automatically exported from code.google.com/p/iniparse
Other
0 stars 0 forks source link

Inaccurate parsing compared to proper Windows functions #24

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
There are a number of issues in parsing which produce inaccurate results when 
compared to normal INI parsing methods (i.e. Windows' *PrivateProfile* 
functions).

Here are a few which I've noticed which are inconsistent with how they "should" 
be parsed for INI files.

Quoted values: if the read value is wrapped in single or double quotes, the 
quotes should be removed, permitting leading and trailing whitespace which 
would otherwise not work. And it is not a "valid string" check, it is purely a 
first and last character check - so Key="Val"ue" should be Val"ue, and 
Key="Value1" "Value2" should be Value1" "Value2.

Comments:
* The semicolon is the only accepted comment character: 
http://msdn.microsoft.com/en-us/library/ms724348%28VS.85%29.aspx, "Comments 
(any line that starts with a semicolon) ..."
* There is no support for inline comments in option lines. I'm not certain 
about section lines though; I didn't test that when I was working directly with 
the PrivateProfile functions, but I expect that they're not allowed.

In short, the only acceptable comment form in INI files is a line starting with 
a semicolon (/^;/, no leading whitespace).

Combining the quotes and comments things, it would simplify parsing immensely 
if you removed support for the inline comments altogether - just 
string.strip(); string[0] == string[-1] and string[0] in ('"', "'") and 
string[1:-1] or string. But I quite understand that you won't want to do it 
that way seeing as people use this for non-INI files with other comment 
characters and inline comments and things like that. A better solution will be 
attributes like "comment_chars = ';'", "inline_comments = False" and 
"unquote_values = True" in your INIDialect you talked of in issue 23.

Examples of erroneous INI parsing:

  [Section]
  A="value"
  B = ' value'
  C=a ;b

+-----+----------+----------+
| Key | Expected | Actual   |
+-----+----------+----------+
|  A  | value    | "value"  |
|  B  |  value   | " value" |
|  C  | a ;b     | a        |
+-----+----------+----------+

Thanks for this library, it's handy in the ways it's better than ConfigParser 
:-)

Original issue reported on code.google.com by chris.morganiser on 23 Jun 2010 at 11:51

GoogleCodeExporter commented 9 years ago
Just another thing I'll add to this now; INIConfig's __init__ has one of its 
parameters optionxformvalue=lower - which means that when fp is given, it 
converts all the keys to lowercase. This can be overridden with 
INIConfig(thingummy, optionxformvalue=None), but for an INI file it's just 
plain bad - combining this with case sensitivity was just breaking my use 
utterly where most values start with an uppercase letter and I'd written all my 
validation like that too. For real INI files on Windows it's quite normal to 
use mixed case, normally camel case. Whereas with .cfg files it's more normal 
to use lowercase sections (I note that sections aren't converted to 
lowercase..?) and keys, all snake case.

This time though I don't think it's really an INIDialect matter. Converting 
values to lowercase when read, but not when working with it (e.g. setting 
ini.Section.Key sets Key, not key) is counter-productive. And when it comes to 
writing, it would break things in my application and in any others which happen 
to have any upper case key names - they'd lose their upper case. 
optionxformvalue=None would be a far better default.

With all of this (including trying making a system for dialects) I'm willing to 
help - if you want to email me to discuss it, you should be able to work out my 
email address from my account name.

Original comment by chris.morganiser on 26 Jun 2010 at 6:46

GoogleCodeExporter commented 9 years ago
Patches are most welcome.  I have a three month old at home, and so the time I 
can spend on iniparse these days is very limited.  Use the 
iniparse-discuss@googlegroups.com mailing list to hash out the design or to ask 
questions.

Regarding the case insensitivity for option names: I wasn't able to reproduce 
the problem... It may be that I haven't understood exactly what you meant.  Can 
you post a short code snippet demonstrating the issue?

Original comment by psobe...@gmail.com on 27 Jun 2010 at 9:41

GoogleCodeExporter commented 9 years ago
I just went over the case insensitivity thing again and couldn't find it for a 
while, using it directly, but I finally worked out what it was. Accessing 
[Section]:Key in the INIConfig worked, but due to the optionxformvalue=lower, 
when accessing the INISection as an iterable - e.g. set(ini.Section) - returned 
set(['key']) rather than set(['Key']), which was breaking it for me as I was 
then doing set differences for validation purposes.

Test case:

>>> import iniparse, StringIO
>>> list(iniparse.INIConfig(StringIO.StringIO('[Section]\nKey=value')).Section)
['Key']
# Error: actual value will be ['key'] rather than ['Key']

Sorry I got you to go to the trouble of trying to reproduce it before... I 
think /I/ didn't understand exactly what I meant, because I was using it that 
way. I just knew that my validator was reporting "required value Foo is 
missing" and "extra invalid value foo" and whatnot.

(Since writing the patch below, one thing has occurred to me; I think this will 
remove case sensitivity in accessing keys; currently the section is case 
sensitive but the key is insensitive. A better fix may be needed if you want to 
retain case sensitivity but still get the iterator reporting the correct thing.)

Patch to fix it:

--- a/iniparse/ini.py
+++ b/iniparse/ini.py
@@ -439,10 +439,6 @@
         yield line

-def lower(x):
-    return x.lower()
-
-
 class INIConfig(config.ConfigNamespace):
     _data = None
     _sections = None
@@ -454,7 +450,7 @@
     _parse_exc = None
     _bom = False
     def __init__(self, fp=None, defaults=None, parse_exc=True,
-                 optionxformvalue=lower, optionxformsource=None,
+                 optionxformvalue=None, optionxformsource=None,
                  sectionxformvalue=None, sectionxformsource=None):
         self._data = LineContainer()
         self._parse_exc = parse_exc

Original comment by chris.morganiser on 28 Jun 2010 at 11:32

GoogleCodeExporter commented 9 years ago
Ah, I see.  I think iteration should preserve case, so this is a bug.

The fix is more complicated than your patch though, because case insensitive 
lookup needs to work for ConfigParser compatibility.

Original comment by psobe...@gmail.com on 30 Jun 2010 at 3:08