zimmerst / phoshare

Automatically exported from code.google.com/p/phoshare

iPhoto 11 Library XML parsing makes Phoshare crash #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Launch Phoshare
2. Use AlbumData.xml (attached file to this issue)
3. It crashes

What is the expected output? What do you see instead?

This is the log file :

Reading iPhoto database from /Users/czj/Pictures/iPhoto Library...
Error: /Users/czj/Pictures/iPhoto Library/AlbumData.xml:245600:28: not well-formed (invalid token)

Traceback (most recent call last):
  File "/Users/czj/Downloads/Phoshare.app/Contents/Resources/lib/python2.7/phoshare/phoshare_ui.py", line 664, in export_thread
  File "/Users/czj/Downloads/Phoshare.app/Contents/Resources/lib/python2.7/appledata/iphotodata.py", line 509, in get_iphoto_data
  File "/Users/czj/Downloads/Phoshare.app/Contents/Resources/lib/python2.7/appledata/applexml.py", line 133, in read_applexml
  File "xml/sax/expatreader.pyc", line 107, in parse
  File "xml/sax/xmlreader.pyc", line 123, in parse
  File "xml/sax/expatreader.pyc", line 211, in feed
  File "xml/sax/handler.pyc", line 38, in fatalError
SAXParseException: /Users/czj/Pictures/iPhoto Library/AlbumData.xml:245600:28: not well-formed (invalid token)

What version of the product are you using? On what operating system?
- Phoshare 1.3.5
- OSX 10.6.5
- iPhoto 11 (with a reconstructed library via ALT + CMD when launching iPhoto)

I have some JPEG 2000 files in my library, but I doubt that is what makes 
Phoshare crash.

Thanks for your help!

Original issue reported on code.google.com by clement.joubert on 22 Nov 2010 at 11:26

GoogleCodeExporter commented 9 years ago
Thanks for including the AlbumData.xml file - that makes debugging this very 
easy. I took the liberty of deleting the file after looking at it, because it 
contains some personal data.

What is happening here (and something similar has happened to another Phoshare 
user before) is that iPhoto actually writes invalid XML. In your case, it is 
for the image
   2003/Clément & Pierre-Antoine vont à Ibiza !/096.jpg
and about 6 others. The line with the problem looks like this:
   <key>ImageType</key><string></string>
What you cannot see are four invisible bytes with the value 0 between <string> 
and </string>, where it should say JPEG. Phoshare doesn't need that field, but 
the 0 bytes make the XML invalid, so the parser stops reading the file at that 
point. I've attached a file that shows all the places where this occurs. Maybe 
you can see a pattern, for example whether these images have something in 
common.

There is no easy way to work around this in Phoshare. When I came across this 
with another user before, it was about unprintable characters in event 
descriptions. The user was able to fix it by editing the descriptions to erase 
the bad characters. In your case, the bad characters are in data owned by 
iPhoto. Short of deleting those images from your library, there is nothing you 
can do on your end.

I will check if it is possible to add a pre-processing stage for the 
AlbumData.xml file into Phoshare that simply removes any unprintable 
characters. Stay tuned.

Original comment by tsporkert on 22 Nov 2010 at 6:18
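
The pre-processing stage proposed in the comment above could look roughly like the following. This is a minimal Python 3 sketch, not Phoshare's actual code (Phoshare at the time ran on Python 2.7); the names `strip_invalid_xml_bytes` and `parse_cleaned` are hypothetical. It assumes the file is UTF-8 or ASCII, where the bytes 0x00-0x1F never occur inside a multi-byte character, so removing them at the byte level cannot damage legitimate text such as "Clément".

```python
import re
import xml.sax

# XML 1.0 forbids every C0 control character except tab (0x09),
# line feed (0x0A) and carriage return (0x0D).
_INVALID_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_bytes(data: bytes) -> bytes:
    """Return a copy of data with the forbidden control bytes removed."""
    return _INVALID_XML_BYTES.sub(b"", data)


def parse_cleaned(data: bytes, handler: xml.sax.ContentHandler) -> None:
    """Clean the raw bytes first, then hand them to the SAX parser."""
    xml.sax.parseString(strip_invalid_xml_bytes(data), handler)


if __name__ == "__main__":
    # Same shape as the bad line from the report: four 0 bytes in <string>.
    bad = b"<dict><key>ImageType</key><string>\x00\x00\x00\x00</string></dict>"
    parse_cleaned(bad, xml.sax.ContentHandler())  # no SAXParseException
```

Dropping the bytes leaves an empty `ImageType` value rather than restoring `JPEG`, which is acceptable here only because Phoshare does not need that field.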

GoogleCodeExporter commented 9 years ago
Thanks for the diagnosis. I've removed the pictures from my library and 
re-added them without bad data.
Phoshare now works very well.

Original comment by clement.joubert on 23 Nov 2010 at 9:30

GoogleCodeExporter commented 9 years ago
Great! 

Original comment by tsporkert on 23 Nov 2010 at 5:20

GoogleCodeExporter commented 9 years ago
Thanks again :-)

Original comment by clement.joubert on 23 Nov 2010 at 5:24

GoogleCodeExporter commented 9 years ago
I had this problem with about 25 files. I was finding them one by one by 
rerunning Phoshare, which became tedious. Opening AlbumData.xml in emacs I 
could see the bad characters displayed as ^@^@^@^@. I copied and searched for 
this string to fix them all at once, by looking for the corresponding photo and 
exporting then deleting it. (I have yet to try reimporting the offending 
photos.)

Original comment by william....@gmail.com on 16 Dec 2010 at 5:48
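
Instead of rerunning Phoshare to find the bad entries one at a time, every offending position can be listed in a single pass. The sketch below is one way to do that in Python 3; `find_control_bytes` is a hypothetical helper, not part of Phoshare, and it reports 1-based line numbers with 0-based byte columns.

```python
import re

# Control bytes that are not allowed in XML 1.0 (tab, LF and CR are fine).
_CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_bytes(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line_number, column, byte_value) for each forbidden byte."""
    hits = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        for match in _CONTROL_BYTES.finditer(line):
            # Indexing a bytes object yields the integer value of the byte.
            hits.append((line_number, match.start(), line[match.start()]))
    return hits


if __name__ == "__main__":
    # Assumes AlbumData.xml is in the current directory.
    with open("AlbumData.xml", "rb") as album_file:
        for line_number, column, value in find_control_bytes(album_file.read()):
            print(f"line {line_number}, column {column}: byte {value:#04x}")
```

The reported line numbers can then be used to look up the surrounding entry in AlbumData.xml and identify the affected photo.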

GoogleCodeExporter commented 9 years ago
I guess this is more common than I thought. I'll consider some options to work 
around this.

Original comment by tsporkert on 16 Dec 2010 at 8:15

GoogleCodeExporter commented 9 years ago
I have the same problem: the AlbumData.xml generated by my iPhoto 11 contains 
unreadable hidden characters. (I'm using Phoshare 1.3.6, as 1.4.2 crashes on my 
MacBookPro 1,1 (32-bit Core Duo) running 10.6.6.)

My problem is the same as the one in Tilman's response, i.e. four invisible 
characters between <string> and </string>:
<key>ImageType</key><string></string>

Editing AlbumData.xml to find and replace the hidden characters fixes the problem.

This is the error message:

Reading iPhoto database from /Users/username/Pictures/iPhoto Library...
Error: /Users/username/Pictures/iPhoto Library/AlbumData.xml:1922783:28: not well-formed (invalid token)

Traceback (most recent call last):
  File "/Applications/Phoshare.app/Contents/Resources/lib/python2.7/phoshare/phoshare_ui.py", line 665, in export_thread
  File "/Applications/Phoshare.app/Contents/Resources/lib/python2.7/appledata/iphotodata.py", line 544, in get_iphoto_data
  File "/Applications/Phoshare.app/Contents/Resources/lib/python2.7/appledata/applexml.py", line 142, in read_applexml
  File "xml/sax/expatreader.pyc", line 107, in parse
  File "xml/sax/xmlreader.pyc", line 123, in parse
  File "xml/sax/expatreader.pyc", line 211, in feed
  File "xml/sax/handler.pyc", line 38, in fatalError
SAXParseException: /Users/username/Pictures/iPhoto Library/AlbumData.xml:1922783:28: not well-formed (invalid token)

Original comment by sichunlam on 9 Jan 2011 at 1:16

GoogleCodeExporter commented 9 years ago
Could you please try Phoshare 1.4.3 and let me know if that works for you? It 
contains some code that removes the invisible characters in the AlbumData.xml 
file before trying to parse it.

http://code.google.com/p/phoshare/downloads/detail?name=Phoshare-1.4.3.zip

Original comment by tsporkert on 9 Jan 2011 at 7:38

GoogleCodeExporter commented 9 years ago
Phoshare 1.4.3 fixes the problem, thank you very much. It removes the 
invisible characters for me and enables a full export.

By the way, is there any reason why iPhoto and Phoshare report different 
numbers of photos?  Phoshare says there are 52,809 photos detected in 
AlbumData.xml, compared to 53,136 in iPhoto 9.1.1.  How might I go about 
diagnosing and resolving this?  (I wonder if this is an iPhoto AlbumData.xml 
problem rather than a Phoshare problem?)

NB -- This isn't critical for my current usage - which is to use the link mode 
to modify the iPhoto source images, but I would imagine I and others would want 
to know if the resulting export would be missing photographs.  Thank you for 
any help.

Original comment by sichunlam on 9 Jan 2011 at 2:57

GoogleCodeExporter commented 9 years ago
Thanks for testing.

Interesting observation about the photo count difference. In my case, Phoshare 
shows about 154 images *more* than iPhoto, but I've never noticed an actual 
discrepancy in the exported images in years of using Phoshare. 

If you make an album from all your images, and export that, using links and no 
metadata (to make it go really fast and not use much space), how many images do 
you get in the album, and how many are exported?

There is another place in the AlbumData.xml file that contains the list of 
images that you see when you click on "Photos". In my case, that list actually 
agrees with iPhoto. I'll put a test in to compare that list to the master image 
list if you run in verbose mode.

Original comment by tsporkert on 9 Jan 2011 at 6:08

GoogleCodeExporter commented 9 years ago
I've created a new issue to track the photo count difference (link below). 
Please add yourself to the CC list of that issue if you are affected and would 
like to receive updates.

http://code.google.com/p/phoshare/issues/detail?id=17

Original comment by tsporkert on 9 Jan 2011 at 6:12

GoogleCodeExporter commented 9 years ago
Issue looks to be back in 1.45

Reading iPhoto database from /Users/andrewpym/Pictures/iPhoto Library...
Error: <unknown>:39128:8: not well-formed (invalid token)

Traceback (most recent call last):
  File "/Applications/Phoshare.app/Contents/Resources/lib/python2.7/phoshare/phoshare_ui.py", line 888, in export_thread
  File "/Applications/Phoshare.app/Contents/Resources/lib/python2.7/appledata/iphotodata.py", line 548, in get_iphoto_data
  File "/Applications/Phoshare.app/Contents/Resources/lib/python2.7/appledata/applexml.py", line 152, in read_applexml_fixed
  File "/Applications/Phoshare.app/Contents/Resources/lib/python2.7/appledata/applexml.py", line 160, in read_applexml_string
  File "xml/sax/__init__.pyc", line 49, in parseString
  File "xml/sax/expatreader.pyc", line 107, in parse
  File "xml/sax/xmlreader.pyc", line 123, in parse
  File "xml/sax/expatreader.pyc", line 211, in feed
  File "xml/sax/handler.pyc", line 38, in fatalError
SAXParseException: <unknown>:39128:8: not well-formed (invalid token)

Ran  : sed -n '39128p' AlbumData.xml
Result  : <key>Roll</key>

Restarted iPhoto 11 (with a reconstructed library via ALT + CMD)

Error: <unknown>:39949:8: not well-formed (invalid token)
Ran  : sed -n '39949p' AlbumData.xml 
Result  : <string>4.2</string>

Can anyone suggest a Unix shell command that can fix this?

Original comment by aje...@gmail.com on 3 Apr 2011 at 1:13

GoogleCodeExporter commented 9 years ago
You won't actually be able to see the bad characters that way, because they 
aren't printable. This looks perfectly OK:

$ sed -n '3497p' AlbumData.xml 
<string>17804</string>

But there is a non-printable character after the 8:

$ sed -n '3497p' AlbumData.xml | od -c
0000000    <   s   t   r   i   n   g   >   1   7   8 001   0   4   <   /
0000020    s   t   r   i   n   g   >  \n     

The "001" is a character with a numeric value of 1, and that trips up any XML 
parser. I had a couple of reports of iPhoto inserting stray 000 characters. 
I've added a filter in Phoshare to remove those before passing the data to the 
XML parser.

Original comment by tsporkert on 3 Apr 2011 at 7:20
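
For anyone who wants to repair the file by hand, as asked earlier in the thread, the following Python 3 sketch writes a cleaned copy of AlbumData.xml and leaves the original untouched. The function name `write_cleaned_copy` and the output file name are hypothetical. The pattern removes every control byte that XML 1.0 disallows, which covers both the 000 and the 001 bytes shown above; whether Phoshare's own filter covers the same range is not stated in this thread.

```python
import re
from pathlib import Path

# Control bytes that XML 1.0 does not allow (tab, LF and CR are kept).
_CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def write_cleaned_copy(source: Path, destination: Path) -> int:
    """Write source minus forbidden bytes to destination.

    Returns the number of bytes that were removed.
    """
    cleaned, removed = _CONTROL_BYTES.subn(b"", source.read_bytes())
    destination.write_bytes(cleaned)
    return removed


if __name__ == "__main__":
    # Assumes AlbumData.xml is in the current directory.
    count = write_cleaned_copy(Path("AlbumData.xml"),
                               Path("AlbumData.cleaned.xml"))
    print(f"removed {count} control byte(s)")
```

iPhoto regenerates AlbumData.xml itself, so a cleaned copy is useful for inspection or as parser input but is unlikely to be a lasting fix for the library.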