skarim / vobject

A full-featured Python package for parsing and creating iCalendar and vCard files
https://vobject.sameenkarim.com

Failed to parse line #152

Open jkirk opened 5 years ago

jkirk commented 5 years ago

My Android phone exports my contact list (address book) as a .vcf file. Some entries contain a quoted-printable, UTF-8 encoded value that is wrapped so that part of it sits on a line of its own, and vobject fails to parse that extra line:

% pip list | grep vobject
vobject                               0.9.6.1

% python
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import vobject
>>> t = """
... BEGIN:VCARD
... VERSION:2.1
... N:Muster;Max;;;
... FN:Max Muster
... ADR;WORK;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:;;=65=66=
... =31=37=30=31=;;;;
... END:VCARD
... """
>>> v = vobject.readOne(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jkirk/software/venv3/lib/python3.5/site-packages/vobject/base.py", line 1156, in readOne
    allowQP))
  File "/home/jkirk/software/venv3/lib/python3.5/site-packages/vobject/base.py", line 1101, in readComponents
    vline = textLineToContentLine(line, n)
  File "/home/jkirk/software/venv3/lib/python3.5/site-packages/vobject/base.py", line 925, in textLineToContentLine
    return ContentLine(*parseLine(text, n), **{'encoded': True,
  File "/home/jkirk/software/venv3/lib/python3.5/site-packages/vobject/base.py", line 813, in parseLine
    raise ParseError("Failed to parse line: {0!s}".format(line), lineNumber)
vobject.base.ParseError: At line 7: Failed to parse line: =31=37=30=31=;;;;

Please note that the (source) address field contains no newline (if it did, it would have been encoded as =0A=). Also, this example is simplified: the actual line is broken after about 23 quoted-printable escapes (some characters take two escapes, e.g. ß is encoded as =C3=9F).
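The wrapping described above follows the MIME quoted-printable rules (RFC 2045), which vCard 2.1 borrows: each encoded byte is written as = plus two hex digits, and an = at the very end of a line is a "soft line break" that a decoder removes, joining the next line onto the current one. Python's standard quopri module applies those rules, so it can be used to check the sample data. A minimal sketch (it only illustrates the encoding and does not involve vobject):

```python
import quopri

# '=' followed by two hex digits encodes one byte. An '=' at the very end
# of a line is a soft line break: the decoder drops it together with the
# line ending, so the two physical lines form one value.
wrapped = b"=65=66=\r\n=31=37=30=31"
print(quopri.decodestring(wrapped))  # b'ef1701'

# A non-ASCII character needs one escape per UTF-8 byte:
# "ß" is the two bytes C3 9F.
print(quopri.encodestring("ß".encode("utf-8")))  # b'=C3=9F'
print(quopri.decodestring(b"=C3=9F").decode("utf-8"))  # ß
```

So the line vobject rejects is not a property of its own but the second half of the ADR value.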

"Merging" the lines (I removed the leading and the last = of the extra line) like this lead to the following:

>>> m = """
... BEGIN:VCARD
... VERSION:2.1
... N:Muster;Max;;;
... FN:Max Muster
... ADR;WORK;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:;;=65=66=31=37=30=31;;;;
... END:VCARD
... """
>>> v = vobject.readOne(m)
>>> v.prettyPrint()
 VCARD
    N:  Max  Muster
    FN: Max Muster
    VERSION: 2.1
    ADR: ef1701
,
    params for  ADR:
       CHARSET ['UTF-8']

Please note the , on the extra line. Why is that?

I copied, pasted, and anonymized the content from the .vcf file, which has DOS (CRLF) line endings. The same happens when I read the file directly. I am using Debian stretch and running vobject in a pip-managed virtual environment.
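The manual merge above could be automated as a pre-processing step before the text reaches vobject. The following is a minimal sketch under one assumption: any property line whose parameters mention QUOTED-PRINTABLE and that ends in = continues on the next physical line (the MIME soft-line-break rule). join_qp_soft_breaks is a hypothetical helper, not part of vobject, and it deliberately leaves the = that the sample has in front of ;;;; untouched, since that one is part of the value.

```python
def join_qp_soft_breaks(vcf_text: str) -> str:
    """Join quoted-printable soft line breaks in vCard 2.1 text.

    Hypothetical pre-processing helper, not part of vobject. A property
    line whose parameters mention QUOTED-PRINTABLE and that ends in '='
    is treated as continuing on the next physical line: the trailing '='
    (the soft line break) is dropped and the next line is appended.
    """
    out = []
    current = None  # logical line still waiting for its continuation
    for line in vcf_text.splitlines():  # handles both LF and CRLF
        if current is not None:
            current = current[:-1] + line  # drop the soft-break '='
        else:
            current = line
        params = current.split(":", 1)[0].upper()
        if "QUOTED-PRINTABLE" in params and current.endswith("="):
            continue  # soft break: keep collecting
        out.append(current)
        current = None
    if current is not None:
        out.append(current)  # input ended in the middle of a soft break
    return "\n".join(out)


sample = (
    "BEGIN:VCARD\r\n"
    "VERSION:2.1\r\n"
    "N:Muster;Max;;;\r\n"
    "FN:Max Muster\r\n"
    "ADR;WORK;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:;;=65=66=\r\n"
    "=31=37=30=31=;;;;\r\n"
    "END:VCARD\r\n"
)
print(join_qp_soft_breaks(sample))
```

The joined string can then be passed to vobject.readOne() like any other vCard text; the ADR property now occupies a single line.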

pudo commented 5 years ago

We're seeing the same issue with a lot of VCards in the wild.

olafhering commented 4 years ago

Does allowQP=True fix the parse error?

jkirk commented 4 years ago

@olafhering Well, kind of 'better':

% python
Python 3.7.3 (default, Dec 20 2019, 18:57:59) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import vobject
>>> t = """
... BEGIN:VCARD
... VERSION:2.1
... N:Muster;Max;;;
... FN:Max Muster
... ADR;WORK;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:;;=65=66=
... =31=37=30=31=;;;;
... END:VCARD
... """
>>> v = vobject.readOne(t, allowQP=True)
>>> v.prettyPrint()
 VCARD
    VERSION: 2.1
    N:  Max  Muster
    FN: Max Muster
    ADR: ef1701=
,
    params for  ADR:
       CHARSET ['UTF-8']

The output is quite similar to what I got when I merged the lines manually, but here an = sign is appended to the address line, and the , still appears on a new line.
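The extra = after ef1701 probably comes from the source data rather than from the line joining: once the soft break is removed, the ADR value still contains an = directly before ;;;;, and =; is not a valid quoted-printable escape. Lenient decoders copy such an = through unchanged. Python's quopri module does, as this sketch shows; it illustrates the decoding rule only and does not trace vobject's own code path.

```python
import quopri

# ADR value with the soft line break already removed (from the sample
# above). Note the stray '=' directly before ';;;;'.
joined_value = b";;=65=66=31=37=30=31=;;;;"

# '=;' is not a valid escape, so the decoder copies the '=' through.
decoded = quopri.decodestring(joined_value).decode("utf-8")
print(decoded)  # ;;ef1701=;;;;

# ADR has seven ';'-separated components; the street is the third.
components = decoded.split(";")
print(len(components), repr(components[2]))  # 7 'ef1701='
```

Without that stray = (as in the manually merged version, where it was removed), the street decodes to plain ef1701.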

(Somewhat unrelated to this issue, but worth mentioning: the N: line in the output above contains two spaces between N: and the first name and two between the first name and the surname. The rendered comment may collapse them into one, so check the raw output above.)
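Both oddities, the lone , under the address and the doubled spaces in N:, look like display formatting rather than parse errors. My reading of vobject/vcard.py (not confirmed anywhere in this thread) is that Address.__str__ always appends a "city, region code" line and Name.__str__ joins all five name parts with single spaces, even when parts are empty. Assuming that, the output can be reproduced with this stand-alone sketch; format_name and format_address are hypothetical names and nothing here is imported from vobject.

```python
# Hypothetical re-creation of how the display strings appear to be built.
# Mirrors my reading of vobject's Name.__str__ / Address.__str__;
# it is not vobject code.

def format_name(n_value: str) -> str:
    # vCard N order: family;given;additional;prefix;suffix
    family, given, additional, prefix, suffix = (n_value.split(";") + [""] * 5)[:5]
    # All five parts are joined with one space, including the empty ones.
    return " ".join([prefix, given, additional, family, suffix])


def format_address(adr_value: str) -> str:
    # vCard ADR order: box;extended;street;city;region;code;country
    box, extended, street, city, region, code, country = (
        adr_value.split(";") + [""] * 7
    )[:7]
    # Only non-empty box/extended/street parts get a line of their own...
    text = "\n".join(part for part in (box, extended, street) if part)
    # ...but the "city, region code" line is always appended.
    text += "\n{}, {} {}".format(city, region, code)
    if country:
        text += "\n" + country
    return text


print(repr(format_name("Muster;Max;;;")))    # ' Max  Muster '
print(repr(format_address(";;ef1701;;;;")))  # 'ef1701\n,  '
```

With empty prefix and additional names, the separators alone produce the extra spaces; with empty city, region, and postal code, only the comma (plus invisible trailing spaces) is left on the second line.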

p30arena commented 2 years ago

Quoting @olafhering: "Does allowQP=True fix the parse error?"

I had a similar problem:

At line 45: Failed to parse line: =DB=8C

and this parameter fixed it. Thanks!