Open MaH-s9UFaXPEKdLafMHkSGgWUAupTA1xj78S9C7 opened 1 month ago
Replies inline. I don't have all the answers, but I tried to cover as much as I can.
- With RenameExistingFilesWhenCustomFormatIsSelected added and set to true (enabled), it deletes some of the files' extensions. While they were all viewable in my standard Windows photo app, I couldn't redate them. Redating was my temporary solution to my previous problem (Feature Request: Changing dates #515): the creation date and last accessed date (metadata on the files themselves) are not the same as the last modified date, which is the correct date the creator uploaded the post. Thanks also @melithine for adding the enhancement label so that it will be looked at, especially since I had so many questions and you were so gracious in helping me with all of them, so I got it up and running. For good measure, I also ran the files through an image program that can detect faults in files, and despite them not even having the .jpg extension, which usually trips up that program, none gave errors and all were viewable. Another problem with including everything is that it skipped some of the files. None of them were duplicates, both because I enabled DownloadDuplicatedMedia and because I also tried it with a completely new scrape.
I haven't seen that happen before regarding file extensions. If you can capture a debug log for that, we can look into it further.
- With RenameExistingFilesWhenCustomFormatIsSelected set to false but the Custom Filename Formats still filled in, it still scraped everything with the custom filename formats applied. Rereading my question, I now notice the word "rename", so I am guessing this is not a bug but something I should ask for clarification on. If I enable RenameExistingFilesWhenCustomFormatIsSelected, will all of my previously downloaded media be renamed, as it says? If so, I could see a workaround being to run the program twice: once to download everything, and a second time to rename what was just downloaded. Of course, it would be preferable if it downloaded everything without mistakes in the first place. In due course.
There is a current bug with the rename option (#345), but I also think you're misunderstanding what that option is for. It's intended to handle the case where you move from the default naming format to a custom one, or where you change from one custom format to another. If you do a fresh scrape with a custom format, the rename option won't come into play.
- There also seem to be some encoding problems with ". It wrote br after one, like "br. It also wrote br after some emoticons.
- For some reason, it also cuts off text from some posts. I can't really figure out why, as it happens with different amounts of text, regardless of whether there are emoticons.
- Like the bug above, it also seems like the program doesn't like line breaks. It is fine with spaces, but if there is a line break, it doesn't necessarily include the rest of the text. At times it does handle line breaks, but sadly it isn't consistent. It's possibly more complicated to sort out than that, but then again, I don't know coding. I hope you can fix it.
I'm not sure what you're referring to with these, but it sounds like you're using the text in the filenames, which can be problematic given what Windows allows. That's probably worth a separate discussion from the other issues.
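The stray "br" described above is consistent with one plausible mechanism: the post text contains HTML line breaks (`<br>`), and `<` and `>` are among the characters Windows forbids in file names, so removing them leaves a bare "br" glued to the neighboring words. The Python sketch below is a hypothetical sanitizer written only to illustrate that mechanism; it is an assumption, not OF-DL's actual code.

```python
import re

# Characters Windows does not allow in file names: < > : " / \ | ? *
# plus ASCII control characters 0-31, which include newline and tab.
WINDOWS_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def strip_windows_invalid(text: str) -> str:
    """Hypothetical sanitizer: drop forbidden characters, then trailing spaces and dots."""
    return WINDOWS_INVALID.sub("", text).rstrip(" .")


# Post text as it might arrive from the API, with an HTML line break after an emoji.
print(strip_windows_invalid("so cute 🥹<br>more text"))  # so cute 🥹brmore text
```

A sanitizer that first turned `<br>` into a space, instead of deleting only the brackets, would avoid the glued-together words.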
- I tried applying a custom filename format to paid posts, but it wouldn't insert anything at all, whether I had extra folders for each post or just had them all in one big folder.
Not sure exactly what you mean here.
- Would it be possible to let {postedAt} be the "last accessed date (including hour, minute and second)" in the files' own metadata, and {mediaCreatedAt} be the "created date (including hour, minute and second)" in the files' own metadata?
That sounds possible but would be a separate feature request.
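For context on that request: setting a file's last-accessed and last-modified times is something Python's standard library exposes directly through `os.utime`, which takes an `(atime, mtime)` pair of timestamps. Below is a minimal sketch with a hypothetical `apply_post_date` helper and a made-up date. The creation time is a different matter: `os.utime` cannot set it, and on Windows that needs the Win32 `SetFileTime` API, so the `{mediaCreatedAt}` half of the idea would likely take more work.

```python
import os
import tempfile
from datetime import datetime, timezone


def apply_post_date(path: str, posted_at: datetime) -> None:
    """Hypothetical helper: set last-accessed and last-modified times to `posted_at`."""
    ts = posted_at.timestamp()
    os.utime(path, (ts, ts))  # (atime, mtime) as seconds since the epoch


# Demo on a throwaway file with an illustrative post date.
fd, path = tempfile.mkstemp(suffix=".jpg")
os.close(fd)
posted = datetime(2024, 1, 1, 12, 30, 45, tzinfo=timezone.utc)
apply_post_date(path, posted)
print(datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc))
os.remove(path)
```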
- What is the difference between {postedAt}/{createdAt} and {mediaCreatedAt}? It might just be that I've been unlucky, so to speak, but I haven't found any difference in the scrapes I've tried so far. Could you show or send me an example, or tell me of one, preferably one where I can scrape with both dates and see the difference? I realize that there should be a difference; I just haven't seen one.
This primarily comes into play when a creator posts something multiple times. OF does deduplication, so they could upload and post something on Jan 1 2024 and then repost it on July 1 2024. In the first case the two dates will match, but in the second case they won't. They also differ if the creator uploads something in advance and schedules the post.
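The repost and scheduled-post cases can be made concrete with a small Python sketch. The dates are made up, and the two fields only mirror the `{postedAt}` and `{mediaCreatedAt}` placeholders; they are not OF's actual API field names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MediaInPost:
    # Illustrative fields mirroring the {postedAt} and {mediaCreatedAt} placeholders.
    posted_at: date         # when the post went live
    media_created_at: date  # when the file was first uploaded


first_post = MediaInPost(posted_at=date(2024, 1, 1), media_created_at=date(2024, 1, 1))
# Deduplication: a repost reuses the original upload, so the media date stays the same.
repost = MediaInPost(posted_at=date(2024, 7, 1), media_created_at=date(2024, 1, 1))
# Scheduled post: uploaded ahead of time, published later.
scheduled = MediaInPost(posted_at=date(2024, 3, 15), media_created_at=date(2024, 3, 10))

for label, item in [("first post", first_post), ("repost", repost), ("scheduled", scheduled)]:
    print(f"{label}: dates differ = {item.posted_at != item.media_created_at}")
```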
- Would it also be possible for the program to scrape posts that are just text, as well as the intro text many models include in their profile below the header? And then, when that text changes, instead of deleting and overwriting it with the new one, simply give the old one (or the new one, for that matter) another name?
We don't handle text on its own; ofdl is about the files. I don't really see that changing.
- What is Restricted Subscriptions?
It's when you are subscribed to a creator but have blocked them from being able to message you.
- What does the users.db contain?
This just caches a mapping between the creator's username and their numerical id. https://sqlitebrowser.org/ can be used to browse the content of these files.
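Since these are ordinary SQLite files, any SQLite tool can read them, not just the browser linked above. As a sketch, Python's built-in `sqlite3` module can list the tables of any SQLite database. The `users` table and its columns below are invented for the demo; OF-DL's real schema may differ, which is exactly what listing the tables would reveal.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in an SQLite database (works for any SQLite file)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on an in-memory database with a made-up username -> id cache table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", (123456, "examplecreator"))
print(list_tables(conn))
print(conn.execute("SELECT username FROM users WHERE user_id = ?", (123456,)).fetchone())

# For a real file, opening it read-only avoids modifying it:
# conn = sqlite3.connect("file:users.db?mode=ro", uri=True)
```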
- Do new avatars and headers replace old avatars and headers, or do they simply get downloaded and placed in their respective folders along with the old ones? I'd definitely prefer it if the program kept the old ones.
That's a good question. @sim0n00ps ?
- When editing the config through the app (for lack of a better word) instead of directly in a text editor, it takes minutes (more than three) before my computer gets back to where I can start scraping. I can, however, close the window, and the config is saved correctly with the changes I have made. Why does it take so long to restart after editing the config inside the app itself, and is it a mistake to shut it down prematurely, even though the config file shows the changes were made? Is there another file that doesn't get the changes it is supposed to have?
I've never heard of this happening before. But you should be able to edit it with Notepad or the like when ofdl is not running.
- Same as the question above, but for when I am done scraping: it takes some time before it is ready for another scrape. Can I safely close it as soon as it is done scraping, or does it need to reach the point where it's ready for another scrape to avoid messing up metadata files and other stuff? For instance, will it mess things up, if not for the app itself then for some of the standard Windows programs or antivirus, so that I won't be able to eject my external HDD when I'm done scraping?
What do you mean by it takes a while to be ready for another scrape?
- What exactly is the naming scheme for "Custom Filename Formats"? I see both spaces and underscores being used, as well as names with and without {}. Also, is it possible to enter my own naming scheme in one of the entries, for example if I wanted files named 1, 2, 3 and so forth?
https://sim0n00ps.github.io/OF-DL/docs/config/custom-filename-formats
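As a rough illustration (an assumption about how such templates generally behave, not OF-DL's own parser): a name inside `{}` is a placeholder that gets replaced with a value from the post, while anything outside the braces, such as spaces, underscores or dashes, is kept as literal text. The Python sketch below imitates that with `str.format_map` and made-up values.

```python
# Made-up values; in practice they would come from each post and media item.
values = {
    "postedAt": "2024-07-01",
    "mediaCreatedAt": "2024-01-01",
    "text": "hello",
}


def render(fmt: str, values: dict) -> str:
    """Replace each {placeholder} with its value; text outside braces is copied as-is."""
    return fmt.format_map(values)


print(render("{postedAt}_{text}", values))
print(render("{postedAt} {mediaCreatedAt}", values))
```

Note that `format_map` raises `KeyError` for a placeholder it has no value for, so a typo in a placeholder name surfaces as an error instead of silently becoming literal text.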
- Would you recommend having the program on an external HDD in the same place the files are downloaded to, or is it best to have the program itself on an internal HDD and the files on an external HDD?
I don't think that's going to make a big difference. The .exe is pretty small, and it won't generate many I/O requests compared to the actual media files.
- Why is {rawText} only available for PostFileNameFormat and not the other three as well? I tried it and it was quite weird: on some posts {rawText} got the full text as it was written, and on other posts only {text} got the full text. Why could that be? What is the difference between them, and what are the use cases for each? If you don't mind, please be as detailed as possible, as I really want to capture all the text in a format as true to the original post as possible. If need be, I'll happily share the examples I've made, if you can provide a secure way to share them, meaning one that doesn't expose my user account and password.
I can't really speak to this, maybe @sim0n00ps can elaborate here.
- Related to the above, would it be possible to make the program insert the text directly into the comments field of the files themselves? I know of that function from some other programs, and it works quite well. Admittedly, you can't easily see what the comment is, but at least you don't have any problems with emojis and the like.
I think that would be difficult to add, since we would have to download it and then add it to the metadata. Text really isn't the focus of the tool.
- I see different issues being raised about the quality of the downloads, so may I ask for some clarification on what quality the program downloads files in? Is it source quality, or does it compress or upscale in any way? Personally, I prefer source quality.
It is source, yes. There's a feature request to provide more options (#243), but right now it's only source quality.
- Would it be possible to include some alternate rules in CreatorConfigs for what gets scraped or how? For instance, some creators only send one picture or video per message, while others send several. It would be nice either to be able to turn folder sorting on or off per creator, or, alternatively, for the program to detect natively whether there is more than one picture/video and only then sort them into a grouped folder, using the naming scheme that exists right now, which I quite like once the folders are done.
That's a very reasonable feature request to expand the scope of that config. Right now it's only for the file formats, but I personally would like to be able to say which creators I want images enabled for, while for most I just care about the videos.
- Also, when files are downloaded into per-post folders, is it possible to get the folder names in a different date and time format, something like yyyy-mm-dd and then h:m:s instead of h-m-s as it is now, so they are easier to tell apart? I get that this might not be at the top of the list.
I can see extending the options around date handling as a new feature request.
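One constraint worth keeping in mind for that request: `:` is one of the characters Windows forbids in file and folder names, which is likely why the time is written as h-m-s. The Python sketch below (the name layouts are illustrative, not OF-DL's format options) checks a few `strftime` patterns that keep date and time visually distinct while staying Windows-safe.

```python
from datetime import datetime

# Printable characters Windows forbids in file and folder names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def is_windows_safe(name: str) -> bool:
    """True if `name` contains none of the printable characters Windows forbids."""
    return not any(ch in WINDOWS_FORBIDDEN for ch in name)


posted = datetime(2024, 7, 1, 14, 5, 9)
candidates = [
    posted.strftime("%Y-%m-%d %H:%M:%S"),   # 2024-07-01 14:05:09 (colons: not allowed)
    posted.strftime("%Y-%m-%d %H-%M-%S"),   # 2024-07-01 14-05-09
    posted.strftime("%Y-%m-%d %Hh%Mm%Ss"),  # 2024-07-01 14h05m09s
    posted.strftime("%Y-%m-%d_%H.%M.%S"),   # 2024-07-01_14.05.09
]
for name in candidates:
    print(f"{name}: Windows-safe = {is_windows_safe(name)}")
```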
I love the fact that I can now scrape without scraping ads. That was so needed.
That you can now also download streams, along with stories and highlights, and even choose each of them individually, is amazing, especially since stories are small and quick to scrape, and you can now choose only stories, which I couldn't do before. As for the latter two, would anyone mind telling me what the difference is? Is it the time they stay up, with highlights staying up longer and stories lasting 24h, but otherwise essentially the same thing?
Again, thank you so much for making this program, @sim0n00ps, and everyone else who helps keep it going.
I'm not sure what the difference is, to be honest, I've never really paid attention to those myself.
Hey @melithine, thank you so much for your quick and elaborate response. I'm not that good at quoting; I actually don't know how to do it, so if you'll teach me, that'd be great. For now, I'll use the numbers, and after I've learned it I could edit my post so it's easier to look through in the future.
If you look at the top of my comment, you'll see a "..." menu. Click that and then Quote reply.
- With RenameExistingFilesWhenCustomFormatIsSelected added and set to true (enabled), it deletes some of the files' extensions. While they were all viewable in my standard Windows photo app, I couldn't redate them. Redating was my temporary solution to my previous problem (Feature Request: Changing dates #515): the creation date and last accessed date (metadata on the files themselves) are not the same as the last modified date, which is the correct date the creator uploaded the post. Thanks also @melithine for adding the enhancement label so that it will be looked at, especially since I had so many questions and you were so gracious in helping me with all of them, so I got it up and running. For good measure, I also ran the files through an image program that can detect faults in files, and despite them not even having the .jpg extension, which usually trips up that program, none gave errors and all were viewable. Another problem with including everything is that it skipped some of the files. None of them were duplicates, both because I enabled DownloadDuplicatedMedia and because I also tried it with a completely new scrape.
I haven't seen that happen before regarding file extensions. If you can capture a debug log for that, we can look into it further.
If you can tell me how to make a debug log, I can certainly do that. Also, if you could give me somewhere to send some files, possibly a MEGA container or the like, I could send you my examples there.
- With RenameExistingFilesWhenCustomFormatIsSelected set to false but the Custom Filename Formats still filled in, it still scraped everything with the custom filename formats applied. Rereading my question, I now notice the word "rename", so I am guessing this is not a bug but something I should ask for clarification on. If I enable RenameExistingFilesWhenCustomFormatIsSelected, will all of my previously downloaded media be renamed, as it says? If so, I could see a workaround being to run the program twice: once to download everything, and a second time to rename what was just downloaded. Of course, it would be preferable if it downloaded everything without mistakes in the first place. In due course.
There is a current bug with the rename option (#345), but I also think you're misunderstanding what that option is for. It's intended to handle the case where you move from the default naming format to a custom one, or where you change from one custom format to another. If you do a fresh scrape with a custom format, the rename option won't come into play.
Yeah, I totally misunderstood the custom format renaming thing. That said, I've also tried it the way you describe, and it didn't change what had already been downloaded. Maybe a bug? It would definitely be nice to have them renamed if (or rather when) I download some with the text bug and then get them fixed down the road, instead of starting with the default naming scheme.
- There also seem to be some encoding problems with ". It wrote br after one, like "br. It also wrote br after some emoticons.
- For some reason, it also cuts off text from some posts. I can't really figure out why, as it happens with different amounts of text, regardless of whether there are emoticons.
- Like the bug above, it also seems like the program doesn't like line breaks. It is fine with spaces, but if there is a line break, it doesn't necessarily include the rest of the text. At times it does handle line breaks, but sadly it isn't consistent. It's possibly more complicated to sort out than that, but then again, I don't know coding. I hope you can fix it.
I'm not sure what you're referring to with these, but it sounds like you're using the text in the filenames, which can be problematic given what Windows allows. That's probably worth a separate discussion from the other issues.
Ah, OK. Should I open a new issue just for points 3-5? I'm a bit unsure what you mean about me using the text in the filenames, as that is just how the files are named in Windows when I download them, not something I do manually. If there is a way I can send the files to you, I would love for you to check it out.
- I tried applying a custom filename format to paid posts, but it wouldn't insert anything at all, whether I had extra folders for each post or just had them all in one big folder.
Not sure exactly what you mean here.
I mean that when I had set the custom filename format to rename the files, it didn't rename any of them if they were paid posts. It worked for normal posts, messages and paid messages, but not for paid posts, no matter how I tried. Again, I'll happily send examples somehow.
- Would it be possible to let {postedAt} be the "last accessed date (including hour, minute and second)" in the files' own metadata, and {mediaCreatedAt} be the "created date (including hour, minute and second)" in the files' own metadata?
That sounds possible but would be a separate feature request.
OK. Should I open a new issue for this as well, just to keep it all cleaner?
- What is the difference between {postedAt}/{createdAt} and {mediaCreatedAt}? It might just be that I've been unlucky, so to speak, but I haven't found any difference in the scrapes I've tried so far. Could you show or send me an example, or tell me of one, preferably one where I can scrape with both dates and see the difference? I realize that there should be a difference; I just haven't seen one.
This primarily comes into play when a creator posts something multiple times. OF does deduplication, so they could upload and post something on Jan 1 2024 and then repost it on July 1 2024. In the first case the two dates will match, but in the second case they won't. They also differ if the creator uploads something in advance and schedules the post.
Ah, OK, so it is actually able to catch when they are trying to scam with old content? That "kind of" makes this program a need-to-have instead of "just" a nice-to-have. I guess I'd better check when one of the creators who often posts repeats offers another free trial, so I can test it. Feel free to recommend any, if you know one that will definitely trigger it.
- Would it also be possible for the program to scrape posts that are just text, as well as the intro text many models include in their profile below the header? And then, when that text changes, instead of deleting and overwriting it with the new one, simply give the old one (or the new one, for that matter) another name?
We don't handle text on its own; ofdl is about the files. I don't really see that changing.
Ah, OK, damn. Some of the text-only posts are really nice to have, like when they explain something about part of their content. I guess I'd better plead with them to attach a photo to their text then.
- What is Restricted Subscriptions?
It's when you are subscribed to a creator but have blocked them from being able to message you.
- What does the users.db contain?
This just caches a mapping between the creator's username and their numerical id. https://sqlitebrowser.org/ can be used to browse the content of these files.
Ah, cool. Thanks. As for point 11, I was worried that it was another way of tracking metadata for the content, so that even if I deleted a metadata folder to force the program to redownload, it wouldn't do it. But it's really nice that there is a way to track if/when they change their name. A bit of a follow-up: if they change their name from, say, aaabbb to aaabbc, will the content from aaabbc automatically be stored in the original aaabbb folder, or will the old content be stored in a new folder under the new name? Thanks also for the link to a program I can use to check those databases. Does it only work with databases created by this program, or is it a general .db tool? I'm guessing the latter, but as I am not that coding-savvy, I'd rather ask and be safe than sorry.
- Do new avatars and headers replace old avatars and headers, or do they simply get downloaded and placed in their respective folders along with the old ones? I'd definitely prefer it if the program kept the old ones.
That's a good question. @sim0n00ps ?
- When editing the config through the app (for lack of a better word) instead of directly in a text editor, it takes minutes (more than three) before my computer gets back to where I can start scraping. I can, however, close the window, and the config is saved correctly with the changes I have made. Why does it take so long to restart after editing the config inside the app itself, and is it a mistake to shut it down prematurely, even though the config file shows the changes were made? Is there another file that doesn't get the changes it is supposed to have?
I've never heard of this happening before. But you should be able to edit it with Notepad or the like when ofdl is not running.
- Same as the question above, but for when I am done scraping: it takes some time before it is ready for another scrape. Can I safely close it as soon as it is done scraping, or does it need to reach the point where it's ready for another scrape to avoid messing up metadata files and other stuff? For instance, will it mess things up, if not for the app itself then for some of the standard Windows programs or antivirus, so that I won't be able to eject my external HDD when I'm done scraping?
What do you mean by it takes a while to be ready for another scrape?
It might just be my computer being a bit slow, and in general it isn't really an issue for me, and I don't see it becoming one. It only bothered me a bit while testing now, since it made testing take longer. I did wonder, though, whether closing it after it says it's done, but before it says it's ready for another scrape, would alter anything in a bad way. And yes, it just takes a few minutes before it is ready for another scrape after finishing one, and by one I mean one pass of everything I have chosen, not one user at a time. Going from user to user works fine. It is only at the very end, after it has said it is done, shown the time it took, and, so to speak, reloads. As long as nothing gets messed up when downloading, or when I end the program after changing something in the config via the program instead of via the .txt file, I'm fine. Worst case, I let it run the 3 to 5 minutes it takes to be ready for another command.
- What exactly is the naming scheme for "Custom Filename Formats"? I see both spaces and underscores being used, as well as names with and without {}. Also, is it possible to enter my own naming scheme in one of the entries, for example if I wanted files named 1, 2, 3 and so forth?
https://sim0n00ps.github.io/OF-DL/docs/config/custom-filename-formats
Ah yeah, I looked there, but I was still a bit unsure, since it didn't mention putting the names in those curly brackets. Also, is it possible to name them like 1, 2, 3, 4... or something like that?
- Would you recommend having the program on an external HDD in the same place the files are downloaded to, or is it best to have the program itself on an internal HDD and the files on an external HDD?
I don't think that's going to make a big difference. The .exe is pretty small, and it won't generate many I/O requests compared to the actual media files.
Cool. Thanks for the info on that. I definitely want to keep it as organized and functional as possible.
- Why is {rawText} only available for PostFileNameFormat and not the other three as well? I tried it and it was quite weird: on some posts {rawText} got the full text as it was written, and on other posts only {text} got the full text. Why could that be? What is the difference between them, and what are the use cases for each? If you don't mind, please be as detailed as possible, as I really want to capture all the text in a format as true to the original post as possible. If need be, I'll happily share the examples I've made, if you can provide a secure way to share them, meaning one that doesn't expose my user account and password.
I can't really speak to this, maybe @sim0n00ps can elaborate here.
Yeah, what I found really odd was especially the posts, as both {rawText} and {text} are used there. What puzzled me more is the big difference it seemed to make for some, if not all, emoticons. With just {text}, a lot of them got that br thing after them. Others gave me > and others <. With a few I got [], which I just tried to paste in here, but GitHub doesn't have the same problems as Windows, so apparently [] is 🥹. Well, now I know.
- Related to the above, would it be possible to make the program insert the text directly into the comments field of the files themselves? I know of that function from some other programs, and it works quite well. Admittedly, you can't easily see what the comment is, but at least you don't have any problems with emojis and the like.
I think that would be difficult to add, since we would have to download it and then add it to the metadata. Text really isn't the focus of the tool.
Ah, OK. A shame, but that said, I'd still rather have the files with the text directly in them than no files at all, so no biggie. I'm just missing getting all the text, but I'm looking forward to when that is fixed. At any rate, it's just nice to have the files and not rely on an account never being closed or an internet connection always working to view stuff I've paid for.
- I see different issues being raised about the quality of the downloads, so may I ask for some clarification on what quality the program downloads files in? Is it source quality, or does it compress or upscale in any way? Personally, I prefer source quality.
It is source, yes. There's a feature request to provide more options (#243), but right now it's only source quality.
OK, that is great. Thanks for the clarification. May I ask what the benefits of getting the other formats are? I know there are some, or it wouldn't have been requested, so I am merely curious.
- Would it be possible to include some alternate rules in CreatorConfigs for what gets scraped or how? For instance, some creators only send one picture or video per message, while others send several. It would be nice either to be able to turn folder sorting on or off per creator, or, alternatively, for the program to detect natively whether there is more than one picture/video and only then sort them into a grouped folder, using the naming scheme that exists right now, which I quite like once the folders are done.
That's a very reasonable feature request to expand the scope of that config. Right now it's only for the file formats, but I personally would like to be able to say which creators I want images enabled for, while for most I just care about the videos.
Well, that actually expands on what I thought could be expanded, but yeah, that would definitely be great too. I was thinking of something like this: say there are three posts (paid and/or free) or messages (paid and/or free), and one of them has just one photo or video. Then it wouldn't make a folder for that one, but the two others, which have multiple, would each get their own. Possibly without making new photo/video subfolders inside, even when there are both photos and videos. At least for messages, especially paid messages, and possibly paid posts too, it would be nice to have one set together in the same folder instead of split across two subfolders of the same folder.
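The grouping rule described above could look roughly like this in Python. This is a hypothetical sketch of the requested behavior only, not something OF-DL currently does, and the folder names and IDs are made up: a post or message with a single media item goes straight into the shared folder, while one with several items gets its own folder, with photos and videos together rather than in separate subfolders.

```python
from pathlib import PurePosixPath


def target_paths(base: str, post_id: str, files: list[str]) -> list[str]:
    """Hypothetical layout: a single file stays flat in `base`; several go in `base/<post_id>/`."""
    root = PurePosixPath(base)  # POSIX-style paths so the demo prints the same on every OS
    if len(files) > 1:
        root = root / post_id  # photos and videos share one folder, no separate photo/video subfolders
    return [str(root / name) for name in files]


print(target_paths("Messages/Paid", "1001", ["a.jpg"]))
print(target_paths("Messages/Paid", "1002", ["b.jpg", "c.mp4", "d.jpg"]))
```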
- Also, when files are downloaded into per-post folders, is it possible to get the folder names in a different date and time format, something like yyyy-mm-dd and then h:m:s instead of h-m-s as it is now, so they are easier to tell apart? I get that this might not be at the top of the list.
I can see extending the options around date handling as a new feature request.
Ah, OK. Should I make a new post with that request, as well as for the two previous ones I mentioned?
I love the fact that I can now scrape without scraping ads. That was so needed. That you can now also download streams, along with stories and highlights, and even choose each of them individually, is amazing, especially since stories are small and quick to scrape, and you can now choose only stories, which I couldn't do before. As for the latter two, would anyone mind telling me what the difference is? Is it the time they stay up, with highlights staying up longer and stories lasting 24h, but otherwise essentially the same thing? Again, thank you so much for making this program, @sim0n00ps, and everyone else who helps keep it going.
I'm not sure what the difference is, to be honest, I've never really paid attention to those myself.
Ah, hehe, fair enough. A note about streams: would it be possible to get them in their own folder? If not, the video folder it is, and that works well too. Thank you again so much.
Thanks for the help again, @melithine. Hopefully I did this correctly, as I now also see that my attempt to organize it with numbers got messed up: GitHub changed the numbers I had written, and I didn't realize that when I posted my original answer.
I'll answer some that @melithine asked me to comment on:
- Do new avatars and headers replace old avatars and headers, or do they simply get downloaded and placed in their respective folders along with the old ones? I'd definitely prefer it if the program kept the old ones.
Avatars and headers have the filename format "username dd-mm-yyyy.jpg". I made it use MD5 hashing to determine whether a file is new or already in the folder. It used to just write them as header.jpg and avatar.jpg, but that obviously meant they would be overwritten if they were ever changed. So yes, the old ones are kept.
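The approach described above can be sketched in Python as follows. Only the MD5 comparison and the "username dd-mm-yyyy.jpg" naming come from the description; the function names, folder layout and loop are assumptions, not OF-DL's actual code.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def save_if_new(folder: Path, username: str, data: bytes, day: date) -> Optional[Path]:
    """Hypothetical helper: save `data` as 'username dd-mm-yyyy.jpg' unless an identical file exists."""
    folder.mkdir(parents=True, exist_ok=True)
    new_hash = md5_of(data)
    for existing in folder.iterdir():
        if existing.is_file() and md5_of(existing.read_bytes()) == new_hash:
            return None  # same image already saved; leave the old file untouched
    target = folder / f"{username} {day:%d-%m-%Y}.jpg"
    target.write_bytes(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    avatars = Path(tmp) / "Avatars"
    save_if_new(avatars, "examplecreator", b"avatar v1", date(2024, 1, 1))  # saved
    save_if_new(avatars, "examplecreator", b"avatar v1", date(2024, 2, 1))  # duplicate, skipped
    save_if_new(avatars, "examplecreator", b"avatar v2", date(2024, 3, 1))  # changed, saved
    print(sorted(p.name for p in avatars.iterdir()))
    # ['examplecreator 01-01-2024.jpg', 'examplecreator 01-03-2024.jpg']
```

One edge case this sketch ignores: two different images saved on the same day would map to the same filename, so the second would overwrite the first.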
- Why is {rawText} only available for PostFileNameFormat and not the other three as well? I tried it and it was quite weird: on some posts {rawText} got the full text as it was written, and on other posts only {text} got the full text. Why could that be? What is the difference between them, and what are the use cases for each? If you don't mind, please be as detailed as possible, as I really want to capture all the text in a format as true to the original post as possible. If need be, I'll happily share the examples I've made, if you can provide a secure way to share them, meaning one that doesn't expose my user account and password.
The rawText property is only returned from the API on posts. These cover Paid Posts, Posts, Archived Posts and Streams, since they are all technically posts and use the same JSON structure. As far as I can see, it has sort of been phased out, as it doesn't seem to be returned anymore. I believe the only difference was that rawText contained HTML, whereas text was plain text with no HTML. However, I believe text is now the only one returned, and it now contains HTML, for example `<p>i'm online ❤️</p>`. So for everything, you should probably just use "text".
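Since `text` may now contain HTML such as `<p>i'm online ❤️</p>`, here is a sketch of reducing it to plain text with Python's standard `html.parser` module. Turning `<br>` and paragraph ends into spaces is an assumption about what suits a filename, not something OF-DL is known to do; character references such as `&quot;` are decoded because the parser is created with `convert_charrefs=True`.

```python
from html.parser import HTMLParser


class PlainText(HTMLParser):
    """Collect the text content of an HTML fragment, turning <br> and </p> into spaces."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp;, &quot; etc. in the data
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":  # also reached for <br/>, via the default handle_startendtag
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = PlainText()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    # Collapse the runs of whitespace left behind by tags and line breaks.
    return " ".join("".join(parser.parts).split())


print(html_to_text("<p>i'm online ❤️</p>"))
print(html_to_text("<p>so cute 🥹<br>more &quot;text&quot;</p>"))
```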
That you can now also download streams, along with stories and highlights, and even choose each of them individually, is amazing, especially since stories are small and quick to scrape, and you can now choose only stories, which I couldn't do before. As for the latter two, would anyone mind telling me what the difference is? Is it the time they stay up, with highlights staying up longer and stories lasting 24h, but otherwise essentially the same thing?
If you know how Instagram's stories and highlights work, OF's are exactly the same. Stories are available for 1 day and then disappear. Highlights are basically collections of stories that have expired, and highlights don't expire; they only stop being visible if they are deleted. Hopefully that makes sense.
First off, thanks again for making this amazing program, and to everyone else who contributes. I really, really love it, even more so now that I find I can also get the text, which I really felt was missing from the old scraper I used.
I have some questions I hope you won't mind answering or helping me with, and sadly, I have found a few bugs. All of them were found using fresh databases and metadata databases when scraping anew: