I just took a look at the names of the sets that caused problems, and I am fairly sure the problem is the diacritics: all of these names seem to contain an ë or é. Perhaps it is an encoding issue, with the set names returned by Flickr being in a different encoding than the ones kept by the script?
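A minimal Python 3 sketch of the encoding-mismatch idea: the same set name (a made-up "Noël") produces different bytes under UTF-8 and Latin-1, so a byte-level comparison fails even though the text is identical. Latin-1 is only an illustrative second encoding; the thread does not say which encodings, if any, were mixed.

```python
# Hypothetical set name containing an ë.
title = "Noël"

utf8_bytes = title.encode("utf-8")      # b'No\xc3\xabl'
latin1_bytes = title.encode("latin-1")  # b'No\xebl'

# Compared as raw bytes, the "same" name does not match.
print(utf8_bytes == latin1_bytes)  # False

# Decoding each with the codec it was written in gives equal text again.
print(utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1"))  # True
```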
That's a good clue; I'll try to reproduce this. I would hope everything is UTF-8 (on disk, internally in the Python strings, and then on Flickr), but it doesn't look that way.
Are the sets on Flickr correct? That is, do their names contain the é, or do they get mis-encoded before that?
The encoding is fine: the é looks OK both on Flickr and on disk. As an extra clue, I just remembered that the original photos are on an NTFS partition mounted on my Ubuntu system. The file names are reported properly. I did some testing in PHP, and the directory names seem to be reported in UTF-8. I would guess the same goes for Python.
(In reply to Richard Quirk, Tue, Jan 21, 2014 at 8:01 AM.)
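For comparison, a small Python 3 sketch of how os.listdir reports names: given a str path it decodes names with the filesystem encoding, and given a bytes path it returns the raw on-disk bytes. The "Été" directory is hypothetical. Python 2.7, which the script ran under, behaves similarly in spirit: os.listdir returns byte strings for a byte-string path and unicode objects only for a unicode path, so "reported in UTF-8" may still mean undecoded bytes inside the script.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical set directory with an accented name.
    os.mkdir(os.path.join(root, "Été"))

    # str path: names come back as str, decoded with the filesystem encoding.
    text_names = os.listdir(root)
    # bytes path: names come back as the raw bytes stored on disk.
    raw_names = os.listdir(os.fsencode(root))

print(text_names)
print(raw_names)
# os.fsdecode() applies the same decoding, so the two views agree.
print(os.fsdecode(raw_names[0]) == text_names[0])  # True
```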
That's a good clue. I'll see if I can reproduce this soon.
Could you send me some snippets from the generated debug.log the next time this goes wrong?
These are the interesting log calls in tags2set.py:
logging.debug('tags2set: Found existing set %s' % setName)
logging.debug("tags2set: create set %s with photo %s\n\n" % (setName, photos[0]))
In fact, all of the tags2set output would be useful when this goes wrong. Redact the set names as you see fit so as not to send any personal information, though character-encoding gaffes are the prime suspect here.
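When collecting these logs, formatting the set name with %r rather than %s can make an encoding problem visible, because the repr shows whether the value is text or raw bytes. This is a suggestion, not what tags2set.py does; the logger name below is hypothetical, and the example uses Python 3 (under Python 2.7, %r would similarly distinguish u'No\xebl' from 'No\xc3\xabl', which %s prints identically in a UTF-8 terminal).

```python
import io
import logging

# Send debug output to an in-memory buffer so it can be inspected.
buffer = io.StringIO()
log = logging.getLogger("tags2set_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler(buffer))
log.propagate = False  # keep the demo output out of any root handlers

text_name = "Noël"                 # set name as text (str)
raw_name = "Noël".encode("utf-8")  # same name as undecoded bytes

# %r logs the repr, so bytes show up with a b'' prefix and escaped bytes.
log.debug("Found existing set %r", text_name)
log.debug("Found existing set %r", raw_name)

print(buffer.getvalue())
# Found existing set 'Noël'
# Found existing set b'No\xc3\xabl'
```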
Note to self: I think I know what the problem might be. To get the existing sets, we call flickr.photosets.getList. When a title contains an ñ or similar, the API returns it encoded somehow over HTTP; I bet it comes back UTF-8 URL-encoded, like "%C3%B1". Then, when tags2set.py compares the titles, it effectively does "ñ" == "%C3%B1", which is obviously false, so it creates a new set.
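The hypothesis can be checked with the standard library's urllib.parse, sketched here in Python 3. It assumes the API would return percent-encoded UTF-8, which is exactly the guess being made; the warning reported later in the thread points to a bytes-versus-Unicode comparison instead.

```python
from urllib.parse import quote, unquote

local_title = "ñ"
# What the hypothesis says might come back: the UTF-8 bytes, percent-encoded.
remote_title = quote(local_title)
print(remote_title)  # %C3%B1

# Comparing the two directly fails, so a duplicate set would be created.
print(local_title == remote_title)  # False

# Decoding the percent-escapes first makes the titles match.
print(local_title == unquote(remote_title))  # True
```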
Your note sounds plausible.
(In reply to Richard Quirk, Thu, Jan 23, 2014 at 11:10 AM.)
In fact this was an "easy" bug: I created a directory with a UTF-8 name and got this wonderful warning from Python 2.7:
f2flickr/tags2set.py:32: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
I should have a fix soon.
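The warning means Python 2.7 compared a byte string containing non-ASCII bytes with a unicode string, failed to decode the bytes as ASCII, and treated the two as unequal, so an existing set looked new. A minimal Python 3 sketch of the usual remedy: decode byte strings to text (assumed UTF-8 here) before comparing. The helpers to_text and find_existing_set are hypothetical and are not the project's actual fix.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with the given encoding (assumed UTF-8)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def find_existing_set(set_name, existing_titles):
    """Hypothetical helper: return the matching title, or None if no set matches."""
    wanted = to_text(set_name)
    for title in existing_titles:
        if to_text(title) == wanted:
            return title
    return None


dir_name = "Noël".encode("utf-8")     # e.g. a directory name read as raw bytes
flickr_titles = ["Holidays", "Noël"]  # e.g. set titles returned as text

# Naive comparison: bytes never equal str in Python 3
# (Python 2.7 warned and treated them as unequal).
print(dir_name in flickr_titles)  # False
# Normalizing both sides to text first finds the existing set.
print(find_existing_set(dir_name, flickr_titles))  # Noël
```

Decoding once where names enter the program, rather than at every comparison, is usually the tidier design; the helper above only keeps the example self-contained.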
Nice work!
On Sat, Jan 25, 2014 at 7:45 PM, Richard Quirk wrote:
Closed #10 (https://github.com/richq/folders2flickr/issues/10) via f5bedb5 (https://github.com/richq/folders2flickr/commit/f5bedb5e6e40f4ee72be3b85760190b5e5dc8c39).
From comments on issue #9:
It works fine for a while, and then it suddenly stops recognizing previously created sets and makes a new one every 21 photos. After restarting the script, everything is back to normal again. I'm not quite sure what causes it.