richq / folders2flickr

Upload files to flickr

Existing set not always recognized, new one created instead #10

Closed richq closed 10 years ago

richq commented 10 years ago

From comments on issue #9:

It works fine for a while, and then it suddenly stops recognizing previously created sets and makes a new one every 21 photos. After restarting the script, everything is back to normal again. Not quite sure what causes it.

michieldewit commented 10 years ago

I just took a look at the names of the sets that caused problems: I am fairly sure the problem is in the diacritics. All of these names seem to contain an ë or é. Perhaps an encoding issue: the set names returned by Flickr being in a different encoding than the ones kept by the script?

richq commented 10 years ago

That's a good clue, I'll try to reproduce this. I would hope everything is UTF-8 (on disk, internally in the Python strings, then on Flickr), but it doesn't look that way.

Are the sets on Flickr correct? I mean, do they have the é in them, or do they get mis-encoded before that?

michieldewit commented 10 years ago

The encoding is fine. Both on Flickr and on disk the é looks OK. As an extra clue, I just remembered that the original photos are on an NTFS partition mounted on my Ubuntu system. The file names get reported properly. I did some testing in PHP and the directory names seem to be reported in UTF-8. I guess the same would go for Python.

On Tue, Jan 21, 2014 at 8:01 AM, Richard Quirk <notifications@github.com> wrote: [quoted text of the previous comment]


richq commented 10 years ago

That's a good clue. I'll see if I can reproduce this soon.

richq commented 10 years ago

Do you think you could send me some snippets from the debug.log that is generated next time this goes wrong?

These log statements in tags2set.py are the interesting ones:

logging.debug('tags2set: Found existing set %s' % setName)
logging.debug("tags2set: create set %s with photo %s\n\n" % (setName, photos[0]))

In fact, all of the tags2set output would be useful when it gets this wrong. Redact the set names as you see fit so as not to send any personal information, though character encoding gaffes are the prime culprit here.

richq commented 10 years ago

Note to self: I think I know what the problem might be. To get the existing sets, we call flickr.photosets.getList. When there's an ñ or whatever, it will return that encoded somehow via HTTP. I bet it returns it UTF-8 URL-encoded, like "%C3%B1". Then when tags2set.py compares the titles it does "ñ" == "%C3%B1", which is obviously false, so it creates a new set.
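The hypothesis in this note can be illustrated with a few lines of Python. This is only a sketch of the suspected mismatch, written for Python 3 with the standard library's urllib.parse; the variable names are made up for illustration, and nothing in this thread confirms that flickr.photosets.getList actually returns titles in percent-encoded form.

```python
from urllib.parse import quote, unquote

# Hypothetical set name taken from a local folder.
folder_name = "\u00f1"  # the character "ñ"

# The same name as percent-encoded UTF-8, as it would appear in a URL.
encoded = quote(folder_name)

print(encoded)                          # %C3%B1
print(folder_name == encoded)           # False: the raw comparison fails
print(folder_name == unquote(encoded))  # True: equal once decoded
```

If titles really did arrive in that form, decoding them before the comparison would be enough to make the existing set match.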

michieldewit commented 10 years ago

Your note sounds plausible.

On Thu, Jan 23, 2014 at 11:10 AM, Richard Quirk <notifications@github.com> wrote: [quoted text of the previous comment]


richq commented 10 years ago

In fact this was an "easy" bug - I created a directory with a utf-8 name and got this wonderful warning from Python 2.7:

f2flickr/tags2set.py:32: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

I should have a fix soon.
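For context: in Python 2.7 this warning appears when a byte string (str) containing non-ASCII bytes is compared with a unicode object. The implicit ASCII decode fails, so the two values are treated as unequal. The sketch below shows one way to avoid that kind of mismatch by converting both sides to text before comparing. It is written for Python 3, where bytes and str never compare equal, and the helper names to_text and find_existing_set are hypothetical. It is not the fix that went into the repository.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def find_existing_set(set_name, existing_titles):
    """Return the first title that matches set_name as text, else None."""
    wanted = to_text(set_name)
    for title in existing_titles:
        if to_text(title) == wanted:
            return title
    return None


# Assumed inputs: a folder name read as UTF-8 bytes, and titles held as text.
local_name = "Caf\u00e9".encode("utf-8")
remote_titles = ["Holiday", "Caf\u00e9"]

print(local_name == remote_titles[1])                # False: bytes vs. text
print(find_existing_set(local_name, remote_titles))  # Café
```

Decoding once at the boundary (when names are read from disk or from an API response) keeps every later comparison between values of the same type.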

michieldewit commented 10 years ago

Nice work!

On Sat, Jan 25, 2014 at 7:45 PM, Richard Quirk <notifications@github.com> wrote:

Closed #10 (https://github.com/richq/folders2flickr/issues/10) via f5bedb5 (https://github.com/richq/folders2flickr/commit/f5bedb5e6e40f4ee72be3b85760190b5e5dc8c39).
