johanneszab / TumblThree

A Tumblr Blog Backup Application
https://www.jzab.de/content/tumblthree
MIT License
922 stars 133 forks

Image metadata from side blogs are downloaded incorrectly #124

Taranchuk closed 7 years ago

Taranchuk commented 7 years ago

I just found that on some blogs (apparently side blogs), the image metadata is downloaded incorrectly. Here is a sample of the metadata:

Post ID: 164210474784, Date: 2017-08-15 10:39:27 GMT Url with slug: Reblog key: hrE2mmxK Reblog Url: Reblog Name: Photo Url: h, t, t, p, s, :, /, /, 6, 8, ., m, e, d, i, a, ., t, u, m, b, l, r, ., c, o, m, /, 7, 8, f, f, d, 7, 8, 9, 2, d, 8, e, 8, f, b, 0, 3, 1, a, 4, b, 7, 4, d, 5, 2, 7, 2, 9, 3, 8, 2, /, t, u, m, b, l, r, , o, u, q, 2, x, r, M, A, L, L, 1, t, v, i, k, c, 4, o, 1, , 5, 0, 0, ., j, p, g Photo Caption: Tags:

Post ID: 164210450554, Date: 2017-08-15 10:37:58 GMT Url with slug: Reblog key: CACaWVEz Reblog Url: https://mrsammalone.tumblr.com/post/164210113774 Reblog Name: mrsammalone Photo Url: h, t, t, p, s, :, /, /, 6, 8, ., m, e, d, i, a, ., t, u, m, b, l, r, ., c, o, m, /, 7, 8, 6, e, 4, 5, 8, c, 3, 7, 7, 8, 5, c, 2, 7, c, a, 5, a, f, 6, 3, a, 7, 9, 1, 0, 5, f, e, e, /, t, u, m, b, l, r, , o, u, q, 1, z, j, c, A, N, O, 1, v, g, t, 4, h, y, o, 1, , 5, 0, 0, ., j, p, g Photo Caption: Tags:

As the samples show, the characters of each photo URL are separated by commas and spaces, and the "Url with slug" entries are empty. I'm using version 1.0.8.8. I also saw the same bug in the SVC version, but there it affects all blogs, not just side blogs. Here are a few side blogs for testing: http://11binfantry03.tumblr.com http://13solodog.tumblr.com http://kinginlv.tumblr.com
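For illustration, output like the Photo Url above is what you typically get when a join helper receives a single string where it expects a list of strings. The Python sketch below only reproduces that symptom. It is an assumption about the kind of mistake involved, not TumblThree's actual code; `format_field` is a hypothetical helper and the URL is made up.

```python
from typing import Iterable, Union


def format_field(values: Union[str, Iterable[str]]) -> str:
    """Join metadata values with ', ' (hypothetical helper).

    A bare string is wrapped in a list first; without that guard,
    join() would iterate over the string character by character.
    """
    if isinstance(values, str):
        values = [values]
    return ", ".join(values)


url = "https://example.com/a.jpg"  # made-up URL for illustration

# Buggy pattern: joining a plain string splits it into single characters.
buggy = ", ".join(url)

# Guarded pattern: the single URL is kept intact.
fixed = format_field(url)

print(buggy)
print(fixed)
```

The guard treats a lone string as a one-element list, so a field with one URL and a field with several URLs go through the same code path.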

Update: I just realized that the photo captions and tags are also empty in the metadata files, even for posts that do have captions and tags.

johanneszab commented 7 years ago

Thanks, fixed.

Taranchuk commented 7 years ago

I tried the latest SVC version; unfortunately, the metadata files still contain no post captions. I would like to switch to the SVC version because it downloads metadata files fastest, but its metadata files are less useful than those from the default version.

johanneszab commented 7 years ago

Which field in this example data is the post caption you're looking for? It seems to be different, but as I wrote in the release notes, I haven't looked deeply into any of the data there. I just noticed the larger amount of data during the implementation, but I personally don't really care.

Is "source_title" the correct one? What else would you need or think is useful?

Taranchuk commented 7 years ago

There are different types of post captions here: "body", "caption", and "post_html". That's all I found. I think you can download them all, because they never occur together in the same post; each post has only one of these.
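As a rough sketch of that idea: if each post carries only one of the three fields, the metadata writer could take whichever is present. The field names come from the comment above; the flat dictionary layout of a post, the lookup order, and the `extract_caption` helper are assumptions made for illustration, not TumblThree's implementation.

```python
from typing import Any, Mapping, Optional

# Field names taken from the comment above; the order is an arbitrary choice.
CAPTION_FIELDS = ("caption", "body", "post_html")


def extract_caption(post: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty caption-like field of a post, or None."""
    for field in CAPTION_FIELDS:
        value = post.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None


# Made-up posts for illustration only.
posts = [
    {"id": 1, "caption": "<p>A photo caption</p>"},
    {"id": 2, "body": "<p>A text post body</p>"},
    {"id": 3, "post_html": "<p>Rendered HTML</p>"},
    {"id": 4},  # no caption-like field at all
]

captions = [extract_caption(p) for p in posts]
print(captions)
```

Returning None for posts without any of the fields lets the caller decide whether to write an empty "Photo Caption:" entry or skip it.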

Taranchuk commented 7 years ago

In the example data, there are only 100 posts. And the blog seems to consist of text posts, but not images. Could you provide example data from another blog that publishes only images and preferably has more than a thousand posts? For example, http://primalbehaviors.tumblr.com or http://420powell.tumblr.com. These blogs have a lot of post captions, so it would be easiest to find the caption types there and make sure nothing is missed; if there are other types, I could point them out.

johanneszab commented 7 years ago

In the example data, there are only 100 posts. And the blog seems to consist of text posts, but not images.

Not true. You could simply have searched for "photo" ...