taux1c / onlyfans-scraper

A tool that allows you to save to file all content you are subscribed to on OnlyFans, including content you have unlocked or that has been sent to you in messages.
MIT License

missing files? #28

Closed mschairspam closed 1 year ago

mschairspam commented 1 year ago

**Describe the bug**
I successfully scraped a single creator's posts with seemingly no problems, except the numbers don't add up.

output from the scrape:

? What would you like to do? Download content from a user
? Choose one of the following options: Enter a username
? Enter a model's username: X
? Which area(s) would you like to scrape? (Press ENTER to continue) ['All']
Name: X | Username: XX | ID: XXX | Joined: XXXX
- 6490 posts
 -- 6053 photos
 -- 1084 videos
 -- 0 audios
- 5 archived posts
 + Getting pinned media...                                
 + Getting timeline media...                                
 + Getting archived media...                                
 + Getting highlights...                                
 + Getting messages...                                
Progress: (6016 photos, 155 videos, 0 skipped || 3.8 GB): 100%

the downloaded folder only has 938 items, at 1.29 GB

**To Reproduce**
Steps to reproduce the behavior:

  1. start the script as usual
  2. select a model by username (if you're curious, it was sukisucchouse69, NSFW)
  3. scrape finishes with no error
  4. view resultant folder

**Expected behavior**
A resultant folder with 6016 photos and 155 videos, at around 3.8 GB.

**Desktop (please complete the following information):**

**Additional context**
My theory is that a lot of duplicates were filtered out automatically... if that's the case, I may have missed the specifics in the FAQ? ... still, 6k files down to 900, yikes. EDIT: it could also be due to paywalled posts, which add to the script's count but can't actually be downloaded.

taux1c commented 1 year ago

This issue has been pointed out before and I haven't been able to pinpoint the problem because it seems irregular. However, what you just said gives me a couple of ideas! I'll keep this updated.

taux1c commented 1 year ago

Do you mind sharing the model's name via Discord or Telegram? I would like to have a look at their account.

taux1c commented 1 year ago

https://github.com/taux1c/onlyfans-scraper/discussions/14

Related discussion.

mschairspam commented 1 year ago

> Do you mind sharing the model's name via Discord or Telegram? I would like to have a look at their account.

sukisucchouse69 (mentioned in the original post; again, NSFW), free to sub to 😏 and test on

btw thank u for the good work!

taux1c commented 1 year ago

Sorry, I didn't see it in the original post. Awesome that it's free; that will make it super fast to test. 😎

taux1c commented 1 year ago

I just ran it and got 100% of the content. As @0x3333 suggested in the related discussion, it sounds like it may be a connection issue. Can you try renaming ~/.config/onlyfans-scraper/default_profile/models.db to models.db.bak and running the scraper for just that user again? (This will tell it to download everything.)
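
Roughly, the rename step looks like this (a minimal sketch assuming the default config path mentioned above; a plain shell `mv` works just as well):

```python
# Back up the per-profile download database so the scraper re-downloads
# everything on the next run. Path follows the location mentioned above;
# adjust it if your profile lives elsewhere.
from pathlib import Path

db = Path.home() / ".config" / "onlyfans-scraper" / "default_profile" / "models.db"
if db.exists():
    db.rename(db.with_name("models.db.bak"))
    print(f"Backed up {db}")
```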

mschairspam commented 1 year ago

my download still had issues :/ downloaded slightly less content this time:

  1. renamed models.db to models.db.bak
  2. ran the script as usual, selecting sukisucchouse69 from list of models
  3. got this output:
    
    Name: Suki 🍒 | Username: sukisucchouse69 | ID: 165146912 | Joined: August 04, 2021
    - 6510 posts
    -- 6059 photos
    -- 1135 videos
    -- 0 audios
    - 5 archived posts
    + Getting pinned media...                    
    + Getting timeline media...                    
    + Getting archived media...                    
    + Getting highlights...                    
    + Getting messages...  
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/3024x4032_73c98ba53827c4e411114eb751d4c9b1.jpg'                                  
    Progress: (451 photos, 7 videos, 0 skipped || 285.86 MB):   7%|▏ | 462/6205 [00:56<02:40, 35.88it/s][Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/750x540_8a354eca3a2f53e4caef1bd97e720d48.jpg'                                    
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/1242x2208_9024b31010b1001e0c229b7ab23e43b2.jpg'0, 5.10MB/s]
    Progress: (457 photos, 7 videos, 0 skipped || 289.54 MB):   8%|▏ | 470/6205 [00:56<02:18, 41.28it/s][Errno 8] nodename nor servname provided, or not known███████▊   | 534k/725k [00:00<00:00, 5.39MB/s]
    [Errno 8] nodename nor servname provided, or not known                                              
    Progress: (460 photos, 7 videos, 0 skipped || 291.84 MB):   8%|▏ | 475/6205 [00:57<03:51, 24.78it/s][Errno 8] nodename nor servname provided, or not known                                              
    Progress: (463 photos,[Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/2526x4032_58cfd670402e6c03e5faeac0134ad521.jpg'            
    Progress: (516 photos,[Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/2016x2774_c6e463f1f23cec4f59d8a99f6067c8b4.jpg'            
    unable to open database filedeos, 0 skipped || 346.05 MB):   9%| | 561/6205 [01:05<22:33,  4.17it/s]
    unable to open database filedeos, 0 skipped || 380.4 MB):  10%|▏ | 628/6205 [01:12<09:48,  9.48it/s]
    unable to open database file                                                                        
    Progress: (760 photos, 13 videos, 0 skipped || 466.5 MB):  13%|▎ | 787/6205 [01:19<01:50, 49.02it/s][Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/1125x2000_8ae59fdb2f73e507ef745b5bafe2dda5.jpg'                                  
    Progress: (765 photos, 13 videos, 0 skipped || 469.43 MB):  13%|▏| 793/6205 [01:19<01:57, 46.20it/s][Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/3840x2880_87fee2c16baed50cbb515409bb8e0b1b.jpg'                                  
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/750x878_60f77ced827f614fe10d8edd9fac0190.jpg'              
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/1134x2110_e5802992c1ea09a7e3d128ba3390cdef.jpg'00, 665kB/s]
    Progress: (772 photos,[Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/750x921_414073994ecdd0a25adcb9be30c8a0df.jpg'              
    Progress: (832 photos, 14 videos, 0 skipped || 505.13 MB):  14%|▏| 865/6205 [01:25<06:09, 14.44it/s[Errno 8] nodename nor servname provided, or not known                                               
    Progress: (1076 photos, 19 videos, 0 skipped || 632.52 MB):  18%|▏| 1115/6205 [01:37<03:27, 24.51it[Errno 8] nodename nor servname provided, or not known                                               
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/750x937_5efef7caabdd585547c67f758e7a5a74.jpg'                                    
    Progress: (1080 photos, 19 videos, 0 skipped || 633.68 MB):  18%|▏| 1121/6205 [01:37<02:49, 29.91it[Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/3840x5760_9d2b096fd3725019d3a0c175b445ba2a.jpg'                                   
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/1134x2110_9439264a9b18ea9a5fdc0dda413f48ef.jpg'                                  
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/3840x2967_e89a47d5d0f12bdf62e79c67a61b6614.jpg'                                  
    Progress: (1093 photos, 19 videos, 0 skipped || 638.88 MB):  18%|▏| 1137/6205 [01:38<01:53, 44.46it/[Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/1536x2162_f8b51241cc8e8676515f1c5e910bcb53.jpg'                                  
    Progress: (1507 photos, 28 videos, 0 skipped || 889.48 MB):  25%|▎| 1562/6205 [02:11<10:28,  7.39itunable to open database file                                                                         
    Progress: (1721 photos[Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/3024x4032_edf3415a693f6bf6e6a72439320aa88e.jpg'            
    Progress: (1730 photos[Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/3024x4032_3b7fbd753faff1f715675de8a93c840f.jpg'0, 2.59MB/s]
    [Errno 24] Too many open files: '/Users/xxx/onlyfans_scraper/sukisucchouse69/930x1337_4878463cff478a63a7932d5f1f6f9a79.jpg'                                   
    Progress: (1733 photos[Errno 8] nodename nor servname provided, or not known[02:21<02:10, 33.72it/s]
    Progress: (6004 photos, 159 videos, 0 skipped || 3.83 GB): 100%|█| 6205/6205 [06:01<00:00, 17.18it/s


**notable items**:
- the script took a while in the "Getting pinned/timeline/archived media, highlights, messages" phase
- too many open files
- unable to open database file

perhaps a memory issue if the script holds metadata for 6k+ files at once with no partitioning...? my macbook has 16GB ram and i had this scrape running in the background while i was on YouTube.

taux1c commented 1 year ago

This is interesting; this should be throttled. I believe the maximum number of concurrent file downloads is set to eight, with a maximum of five keep-alive connections.
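
For reference, that throttle idea is roughly the following (a sketch only; the httpx client, semaphore, and exact numbers here are assumptions, not the scraper's actual code):

```python
# Rough sketch of the throttling idea: at most 8 downloads in flight,
# at most 5 keep-alive connections. Illustrative only.
import asyncio
import httpx

MAX_CONCURRENT_DOWNLOADS = 8
LIMITS = httpx.Limits(max_connections=8, max_keepalive_connections=5)

async def download(client: httpx.AsyncClient, sem: asyncio.Semaphore,
                   url: str, dest: str) -> None:
    async with sem:                 # caps concurrent downloads (and open files)
        resp = await client.get(url)
        resp.raise_for_status()
        with open(dest, "wb") as f:
            f.write(resp.content)

async def main(jobs: list[tuple[str, str]]) -> None:
    sem = asyncio.Semaphore(MAX_CONCURRENT_DOWNLOADS)
    async with httpx.AsyncClient(limits=LIMITS) as client:
        await asyncio.gather(*(download(client, sem, url, dest) for url, dest in jobs))
```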

taux1c commented 1 year ago

I am going to look into this some more. Do you have the current branch installed?

taux1c commented 1 year ago

https://wilsonmar.github.io/maximum-limits/

According to the article referenced above, Python's open-file limits vary from system to system, which was previously unaccounted for. I think what I'm going to do to fix this for everyone is write a patch that checks the file limit for the specific system and passes that limit to the throttle, with an offset reserved for required files such as models.db.
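
Something along these lines (a rough sketch of that idea; the names are illustrative, and the `resource` module is Unix/macOS only, so Windows would need a fallback):

```python
# Sketch of the proposed fix: derive the download throttle from the OS
# open-file limit instead of a fixed constant. Names are illustrative.
import resource

RESERVED_FDS = 32  # offset for required files: models.db, session files, logs, stdio

def clamp_concurrency(requested: int) -> int:
    """Clamp the configured download concurrency to the soft open-file limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return max(1, min(requested, soft - RESERVED_FDS))
```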

mschairspam commented 1 year ago

> I am going to look into this some more. Do you have the current branch installed?

i believe so! before the run that produced the errors in my last comment, i reinstalled the whole tool. fwiw it has been working perfectly for all the other models i've tried (but they're all <1000 files)

taux1c commented 1 year ago

Yeah, this is a variable that is set on your system. You will have to update it and then run the script. I have been looking for a way to set it to unlimited from within the script across all platforms, but without a bunch of tests and extra packages I haven't found one.

This should help you.

https://superuser.com/questions/261023/how-to-change-default-ulimit-values-in-mac-os-x-10-6
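
For what it's worth, on Unix-like systems the soft limit can also be bumped from inside a Python process, up to the hard limit (a partial, Unix/macOS-only workaround, not something the scraper does today as far as this thread shows); raising the hard limit itself still needs the steps from the link above:

```python
# Raise this process's soft open-file limit (Unix/macOS only). The hard
# limit is the ceiling; on macOS changing the ceiling still requires the
# ulimit / launchctl steps described in the linked answer.
import resource

def raise_nofile_soft_limit(target: int = 10240) -> int:
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

print("soft open-file limit is now", raise_nofile_soft_limit())
```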