johanneszab opened this issue 6 years ago.
Metapost for #188, #144, #119, #85, #126.
I really like this idea! I currently rename all my files manually, so I'd love for the program to automate it.
A few thoughts. Combined, the original name (tumblr_*), post captions, tags and reblog names can exceed 259 characters. Some programs can't open such files, although my image viewer opens them without problems. I'd like users to be able to enable or disable the limit on the number of characters in the file name, so that everyone is comfortable: not everyone has programs that can open files with long names, and some of us (especially me) want the extra information in the file names.
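An optional length limit could look roughly like the Python sketch below (TumblThree itself is C#; the function name `limit_file_name` and the default of 259 are assumptions taken from this comment, not existing TumblThree code). Passing `None` disables the limit, and the extension is never cut off. Note that on Windows the classic MAX_PATH limit applies to the whole path, not just the file name, so a real implementation would also have to account for the download folder's length.

```python
import os
from typing import Optional


def limit_file_name(name: str, max_length: Optional[int] = 259) -> str:
    """Truncate a file name to max_length characters, keeping its extension.

    Hypothetical helper: max_length=None turns the limit off entirely.
    Assumes the extension itself is shorter than max_length.
    """
    if max_length is None or len(name) <= max_length:
        return name
    stem, ext = os.path.splitext(name)
    # Trim only the descriptive part, then drop trailing spaces/dots,
    # because Windows does not allow a file name to end with either.
    keep = max(max_length - len(ext), 0)
    return stem[:keep].rstrip(" .") + ext
```

With the limit on, a 300-character caption is cut down to fit; with `max_length=None` the name passes through unchanged for users whose tools handle long names.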
Post captions and tags can also contain characters that are not allowed in file names, such as \ / : * ? " < > |. In such cases I'd like them to be replaced with a space. If they are simply dropped instead, the words separated by those characters will merge into a single word.
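The "replace with a space, don't just drop" rule can be sketched in a few lines of Python (the function name `sanitize` is hypothetical). It covers the characters Windows reserves in file names plus control characters such as newlines in multi-line captions, and then collapses the resulting runs of whitespace so names don't fill up with double spaces.

```python
import re

# Characters Windows reserves in file names, plus ASCII control characters
# (e.g. newlines inside multi-line captions), which are not allowed either.
FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize(text: str) -> str:
    """Replace forbidden characters with a space so adjacent words stay apart."""
    text = FORBIDDEN.sub(" ", text)
    # Collapse whitespace runs left behind by the replacement and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

For example, a tag like `cats/dogs:birds` becomes `cats dogs birds` rather than `catsdogsbirds`.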
A post may also have several reblogs: many blogs reblog each other, forming whole reblog chains. In such cases I'd like the %r parameter to list all of them, not just one. In the metadata, the post captions contain the addresses of all the reblogged blogs.
In addition, I would like to have another parameter %o (owner), which indicates the name of the downloaded blog.
That's all. I'm eagerly waiting for this feature; it would mean a lot to me! Thank you very much!
A few more suggestions. If possible, files from http://www.tumblr.com/search/keywords and http://www.tumblr.com/tagged/keywords should be renamed during download as well, and files from liked/by too, for users who want it. Also, many users have already downloaded blogs, so many files already sit on disk without descriptive names. It would be great if the program could rename them without users having to delete the blogs and redownload them.
Any progress on this? I actually worked on this a ton last night, without even realizing it had already been requested or discussed. I had started passing all the needed data along so that I could implement it in the "On Download" event.
This is definitely the feature I want most, so I started helping out where I could.
Hi!
Thanks for showing interest in this project and thinking about participating! It's really much welcomed.
No, not really any progress. I remember starting an implementation once, but I stopped after realizing that I'd have to implement a separate rename pattern for each crawler/downloader, because some of them lack data (e.g. tags) that others provide. That also meant wiring up all the different cases to the GUI, which was too much work to do over a weekend.
I'll attach my progress (source code from an older commit), but feel free to come up with a different solution, even if it only covers some of the downloaders/crawlers (e.g. regular Tumblr blogs).
Thanks again!
Cool, sounds good! Thanks!
I'll keep you posted on my progress and let you know what I come up with!
Good idea! Thank you for your continued support for the app @johanneszab
Hi guys, my name is Chris, from Germany.
I think the main goal is to make image file names more readable, right?
If you agree, let's use the blog https://bestcelebritylegs.tumblr.com/ as a demo: its posts all have titles and tags, and it also has posts with multiple images, each with a title and tags.
I would suggest an option to rename the image file(s) using the title or tag(s), but the name should also keep the image ID, i.e. the [imageID] part of "tumblr_[imageID]_raw".
Users should be able to choose what gets changed to suit their particular needs.
By "title" I mean the photo caption. It is also important not to use the URL slug, because it is often cut off.
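Keeping the image ID while adding the caption could be sketched like this in Python. The file name shape `tumblr_<imageID>_<size>.<ext>` (with `raw` or a pixel width as the size) is an assumption based on the example above, and `caption_name` and the " - " separator are hypothetical. Names that don't match the pattern, such as external photos, fall back to their whole original stem, and posts without a caption keep their original name.

```python
import os
import re
from typing import Optional

# Assumed shape of an original Tumblr file name (without extension):
# tumblr_<imageID>_raw  or  tumblr_<imageID>_<width>
TUMBLR_STEM = re.compile(r"^tumblr_([A-Za-z0-9]+)_(?:raw|\d+)$")


def caption_name(original: str, caption: Optional[str]) -> str:
    """Build '<caption> - <imageID>.<ext>' from an original file name.

    The caption is expected to be sanitized already (no reserved characters).
    """
    if not caption:
        return original  # nothing to add: keep the original name untouched
    stem, ext = os.path.splitext(original)
    match = TUMBLR_STEM.match(stem)
    # Fall back to the whole stem when the name doesn't look like a Tumblr one.
    image_id = match.group(1) if match else stem
    return f"{caption} - {image_id}{ext}"
```

Because the image ID stays in the name, two photos from the same multi-image post with the same caption still get different names.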
I tried to implement this in my free time, but kept running into issues. I guess I'm not the developer I thought I was... Or, since it's not my original code, it's hard to wrap my head around it without investing the time to learn it as if it were my own.
Maybe a better option would be to log the media UUID and then have a script rename the files after the sync is complete. That way it doesn't slow down scraping and downloading, and it uses the CPU after the program disconnects from the network.
(Replying to krizzzzzz's comment of Jul 31, 2018, 8:15 AM.)
Well, if anyone wants to rename files right now and has the program TextPipe, here is a pattern file for converting the image metadata into a file list (with tags, post captions and reblogs). After conversion, each line will still start with the link and ID; remove those and the file list for renaming is ready. You then need to reconcile the file list with the files in the folder to weed out missing files, after which you can rename the files with Total Commander using this file list. Here is the archive. If you don't have TextPipe, you can also open the file inside the archive in Notepad and pull out the regular expressions to apply them with another program. TextPipe imagemetadata regex replaces.zip
I can also try to move all the regular expressions into sed and make a script, probably adding commands for dropping unnecessary and missing files, and possibly the renaming command itself, using only utilities such as sed, awk, sort and others (on Windows you can get them by installing Cygwin, after which they are available from the standard command line). If all goes well, I'll attach the batch script. I can't promise it will be soon, though: I don't have much free time, and I'm not an expert in command-line tools.
Well, it sounds easy to do, and in theory it certainly is easy to implement, but the problem lies in the inconsistency of the available data, which differs for each post type (photo vs. video post). And there are already 5 different Tumblr crawlers implemented, each with different available data.
What you're suggesting in your 3 sentences is already 16 possible options. Then, what happens if a photo has no title, but the user wants $title.png? Or if there are no tags, the user's pattern is $date_$tag.png, and suddenly two photos are renamed to the same file because two photo posts on the same date have no tags? External photos, for instance, don't even have the Tumblr ID/hash. Photo posts also offer more options than videos. And the Tumblr search crawler currently doesn't grab any information except the photo/video URL.
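One way to keep the "two photos, same date, no tags" case from overwriting files is to disambiguate names as they are assigned. The Python sketch below is only an illustration (the name `unique_name` and the " (2)" suffix style are assumptions); including the Tumblr hash in the pattern avoids most collisions too, but external photos don't have one. In a real downloader, `taken` would also have to include files already on disk and be safe to use from concurrent downloads.

```python
import os
from typing import Set


def unique_name(candidate: str, taken: Set[str]) -> str:
    """Return candidate, or candidate with ' (2)', ' (3)', ... inserted before
    the extension if that name is already used. Records the chosen name."""
    stem, ext = os.path.splitext(candidate)
    name, n = candidate, 1
    # Compare case-insensitively, since Windows file names are case-insensitive.
    while name.lower() in taken:
        n += 1
        name = f"{stem} ({n}){ext}"
    taken.add(name.lower())
    return name
```

Called three times with `2018-07-31_.png`, it yields `2018-07-31_.png`, `2018-07-31_ (2).png` and `2018-07-31_ (3).png`.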
In essence, you end up having to wire all of that up to the GUI: for each post type a different scheme with different options, and that for each crawler. And then you'll have to define defaults/fallbacks for things that shouldn't happen, like $title.png with no title, or colons in the file name.
Like I said, it's doable, but making this customizable and fail-safe, with possible notifications for the user, will take a lot of time. Then again, it's probably much easier if users do it themselves, like Turanchuk, with a script that grabs the information they need from the metadata/crawler data that TumblThree provides.
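Per-field fallbacks could be expressed as a plain mapping, as in this Python sketch using the `$name` placeholders from the examples above (the `render` function and the fallback values are hypothetical). Placeholder names are matched as letters only, so `$date_$tag` splits into `date` and `tag` rather than `date_`. Empty or missing fields use the fallback, and reserved characters such as colons are replaced with spaces.

```python
import re
from typing import Dict, Optional

# Letters only, so "$date_$tag" is read as $date, "_", $tag.
PLACEHOLDER = re.compile(r"\$([A-Za-z]+)")
RESERVED = re.compile(r'[\\/:*?"<>|]')


def render(pattern: str, fields: Dict[str, Optional[str]],
           fallbacks: Dict[str, str]) -> str:
    """Fill $placeholders from fields, using fallbacks for missing/empty values."""
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        value = fields.get(key) or fallbacks.get(key, "")
        # Colons (e.g. times in titles) would make the name invalid on Windows.
        return RESERVED.sub(" ", value)
    return PLACEHOLDER.sub(substitute, pattern)
```

A title of `10:30 am` renders as `10 30 am.png`, and an untitled photo with a fallback of `untitled` renders as `untitled.png` instead of a bare `.png`.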
@johanneszab Correct, I agree completely. Hence my suggestion above of a separate process that runs after crawling and downloading and then renames those files (the ones that have data in the metaX).
Another way to put it: I'm suggesting an in-house solution that walks through the folder tree of downloaded blogs and renames the files using the metadata that was ripped.
This way you won't have to keep track of or embed this process within the crawler/downloader.
I moved all the regular expressions into a script. Now it's enough to install the sed, grep, cat, paste and tr utilities on Windows from Cygwin or elsewhere, add them to the PATH environment variable, and run the script to convert the image metadata into a file list with post captions, tags and reblogs. You need to put the images.txt file in the root of the C:\ drive; after the script runs, this file will contain the file list for renaming. I haven't added the renaming command itself yet, since I still need to figure out how to clean superfluous entries out of the list: those that are repeated, or missing from either the file list or the folder. I'll add more commands as soon as I figure it out. rename.zip
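The remaining "clean out repeated and missing entries, then rename" step could be sketched in Python as below. The file-list format is an assumption: here it has already been parsed into (original name, new name) pairs, and `plan_renames`/`apply_renames` are hypothetical names. Entries whose original file isn't in the folder are dropped, the first occurrence wins for repeated originals or repeated targets, and existing files are never overwritten.

```python
import os
from typing import Dict, List, Tuple


def plan_renames(pairs: List[Tuple[str, str]], folder: str) -> Dict[str, str]:
    """Keep only usable (original, new) pairs for files present in folder."""
    present = set(os.listdir(folder))
    plan: Dict[str, str] = {}
    targets = set()
    for original, new in pairs:
        # Skip files missing on disk, repeated originals and repeated targets.
        if original not in present or original in plan or new in targets:
            continue
        plan[original] = new
        targets.add(new)
    return plan


def apply_renames(plan: Dict[str, str], folder: str) -> None:
    """Rename files according to plan, never overwriting an existing file."""
    for original, new in plan.items():
        destination = os.path.join(folder, new)
        if os.path.exists(destination):
            continue
        os.rename(os.path.join(folder, original), destination)
```

Splitting planning from renaming makes it easy to print the plan for a dry run before touching any files.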
Now, about nearly the most important thing: is making all the content (for example images) usable in the simplest possible way being skipped and ignored?
People who want to use this application want a complete solution that lets them work with the results/downloads. These people are not developers or advanced users.
The function I really need is an option to save all reblogged media in subfolders named after the user who originally uploaded it.
I follow many artists, and many of them add picture variants by reblogging their original posts or in a reblog chain when a follower makes a suggestion.
If such an option is not possible to implement at the moment, then renaming the files with [uploader name][separator character of choice][tumblr filename].[extension] is a decent enough workaround.
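Both variants, a subfolder per uploader or the flat `[uploader name][separator][tumblr filename].[extension]` workaround, come down to choosing a target path. A Python sketch under assumed names (`target_path`, `use_subfolders`, `separator` are hypothetical settings) might look like this. Posts without a known uploader stay in the blog's own folder.

```python
from pathlib import PurePath
from typing import Optional


def target_path(base: str, uploader: Optional[str], tumblr_name: str,
                use_subfolders: bool = True, separator: str = "_") -> PurePath:
    """Pick where a downloaded file goes, based on who originally uploaded it.

    uploader is assumed to be the (already sanitized) name of the blog that
    made the original post; None means the blog's own post or unknown.
    """
    if not uploader:
        return PurePath(base, tumblr_name)
    if use_subfolders:
        # <base>/<uploader>/<tumblr filename>
        return PurePath(base, uploader, tumblr_name)
    # Workaround: <base>/<uploader><separator><tumblr filename>
    return PurePath(base, f"{uploader}{separator}{tumblr_name}")
```

Keeping the original Tumblr file name inside the new path means variants reblogged along a chain still don't collide, even within the same uploader's folder.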
Okay, since so many people keep asking for this, the next thing I'll implement is a file renamer. Since I won't have much time over the next two months, I'll lay out how I think we should do it, so if anyone wants to implement something, feel free:
I think we should inject it into the AbstractDownloader and rename the files right before we download them. Or we implement it in the TumblrPosts classes and add a rename method to each of the different post classes. Then all we have to do is call post.rename().
Injecting/renaming right before we download, rather than during the crawl, has the benefit that we can still compare the originally found file name against the databases. Thus, all databases stay valid, and changing the rename pattern doesn't download the same file again.
We'll have to send the whole post, or all potentially useful information, down the IProducerConsumer queue, so we'll have to modify the TumblrPosts so that they contain reblog, tag, date, title, etc. information. Everything someone might need.
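The flow described above (carry the metadata with each post through the queue, check the database by the original file name, rename only at download time) can be modeled in Python, with `queue.Queue` standing in for the C# producer/consumer queue. `TumblrPost`'s fields, `consume` and the `None` end-of-crawl sentinel are hypothetical, not TumblThree's actual classes, and the actual download is left out.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class TumblrPost:
    # Hypothetical post model: everything a rename pattern might need
    # travels together with the media URL.
    url: str
    post_id: int
    date: str
    title: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    reblogged_from: List[str] = field(default_factory=list)

    @property
    def original_name(self) -> str:
        return self.url.rsplit("/", 1)[-1]


def consume(posts: "queue.Queue[Optional[TumblrPost]]", downloaded: Set[str],
            rename: Callable[[TumblrPost], str]) -> List[str]:
    """Return the file names that would be saved, skipping known files."""
    saved = []
    while True:
        post = posts.get()
        if post is None:  # sentinel: the crawler has finished
            break
        # The database stores ORIGINAL names, so changing the rename
        # pattern later doesn't cause the same file to be downloaded again.
        if post.original_name in downloaded:
            continue
        saved.append(rename(post))  # the new name is computed only here
        downloaded.add(post.original_name)
    return saved
```

Because the rename is just a function passed to the consumer, a different pattern per crawler or post type would only mean passing a different function.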
Then we could add a new text box into the details view where we allow some predefined pattern for file renaming. For example something like:
would create outputs like
so that %d stands for the postdate, %t for all tags, %h for the tumblr hash, %r for reblog, %p for the post title, %k for the keyblog key. Or similar.
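A small expander for these codes might look like the Python sketch below. The dictionary keys (`date`, `tags`, `hash`, `reblogs`, `title`, `blog`) are assumptions about what a post would carry. `%r` joins the whole reblog chain rather than just one blog, `%%` produces a literal percent sign, and unknown codes are left as-is. The expanded result would still need sanitizing and length-limiting before being used as a file name.

```python
from typing import Dict, List


def expand(pattern: str, post: Dict[str, object]) -> str:
    """Expand %d, %t, %h, %r, %p, %k in pattern using a post's metadata."""
    def joined(key: str) -> str:
        items: List[str] = post.get(key) or []  # type: ignore[assignment]
        return " ".join(items)

    tokens = {
        "d": str(post.get("date") or ""),   # post date
        "t": joined("tags"),                # all tags
        "h": str(post.get("hash") or ""),   # Tumblr hash
        "r": joined("reblogs"),             # every blog in the reblog chain
        "p": str(post.get("title") or ""),  # post title
        "k": str(post.get("blog") or ""),   # blog key
    }
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern):
            code = pattern[i + 1]
            if code in tokens or code == "%":
                out.append(tokens.get(code, "%"))
                i += 2
                continue
        out.append(pattern[i])  # ordinary character or unknown code
        i += 1
    return "".join(out)
```

Scanning character by character, rather than chaining `str.replace` calls, keeps a value like a title containing `%t` from being expanded a second time.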
I'm not sure it's possible to add a counter for all downloaded files that starts at zero with the blog's first post and increases from there. However, the Tumblr post ID is unique, increasing, and mapped to each post. So it won't start at zero but at some offset, and it will have gaps, but you can still sort the files by post ID. I think a counter that starts at zero is hard to do with all the async code, and if some file is inaccessible, or someone only wants to count pictures but not videos, it gets really messy to implement.
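If a zero-based counter is still wanted, one option is to derive it afterwards from the post IDs instead of during the async downloads. This Python sketch (the `dense_index` name and the (post ID, file name) input are assumptions) sorts by post ID, using the file name as a tie-breaker for multi-photo posts, and numbers the files 0, 1, 2, ..., so gaps in the IDs disappear. Counting only pictures is then just a matter of filtering the list before calling it.

```python
from typing import Dict, List, Tuple


def dense_index(files: List[Tuple[int, str]]) -> Dict[str, int]:
    """Map each file to a 0-based counter ordered by (post ID, file name).

    Meant to run after all downloads have finished, so the order in which
    async downloads completed, or skipped files, doesn't matter.
    """
    ordered = sorted(files)  # post IDs increase over time, so this is post order
    return {name: i for i, (_, name) in enumerate(ordered)}
```

Files that fail to download are simply absent from the input, so the counter stays contiguous.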
If anyone has any ideas of how to do this better, feel free to comment.