openzim / youtube

Create a ZIM file from a Youtube channel/username/playlist
GNU General Public License v3.0

E-penser scrape fails without error in debug log #137

Closed: kelson42 closed this issue 2 years ago

kelson42 commented 3 years ago

https://farm.openzim.org/pipeline/78ee12e5de4bba2ad0908ff5/debug

kelson42 commented 3 years ago

I don't know what the status is here, but I made a new attempt and now it dies with the following error:

Traceback (most recent call last):
  File "/usr/local/bin/youtube2zim-playlists", line 33, in <module>
    sys.exit(load_entry_point('youtube2zim==2.1.14.dev0', 'console_scripts', 'youtube2zim-playlists')())
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.14.dev0-py3.8.egg/youtube2zim/playlists/main.py", line 15, in main
    entry()
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.14.dev0-py3.8.egg/youtube2zim/playlists/entrypoint.py", line 84, in main
    from .scraper import YoutubeHandler
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.14.dev0-py3.8.egg/youtube2zim/playlists/scraper.py", line 26, in <module>
    from ..youtube import extract_playlists_details_from, credentials_ok
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.14.dev0-py3.8.egg/youtube2zim/youtube.py", line 7, in <module>
    from zimscraperlib.download import save_file
ImportError: cannot import name 'save_file' from 'zimscraperlib.download' (/usr/local/lib/python3.8/site-packages/zimscraperlib/download.py)
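The ImportError means this youtube2zim dev build imports `save_file` from `zimscraperlib.download`, but the zimscraperlib version installed in the image does not define that name. A minimal pre-flight check can catch such a mismatch before a long run starts. The sketch below is only an illustration: `module_exposes` is a hypothetical helper, not part of youtube2zim or zimscraperlib, and it uses only the standard `importlib` module.

```python
import importlib


def module_exposes(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and defines `attr`.

    Hypothetical helper: checks that an installed dependency still provides
    a name the scraper imports (e.g. `save_file` in `zimscraperlib.download`).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is missing entirely (ModuleNotFoundError is a subclass
        # of ImportError), or one of its own imports failed.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # False if zimscraperlib is absent or no longer defines save_file.
    print(module_exposes("zimscraperlib.download", "save_file"))
```

Checking with `hasattr` after importing the module (rather than a bare `from ... import ...`) lets the check report a missing name without crashing the caller.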

rgaudin commented 3 years ago

Yes, it's an incidental bug introduced while trying to fix this issue; you might have noticed the recipe is on :dev. No need to re-run it for now: it's being run outside the Zimfarm with debug logs and --keep for inspection.

rgaudin commented 3 years ago

OK, now that we have logs, it seems that the scraper crashes when adding one file to the ZIM: the 3.2GB version of this 12h video.

It crashes while adding that file, at which point e-penser_fr_all_2021-01.zim.tmp is 2.1GB.

zimwriterfs has no problem ziming that folder, creating a 15GB file.

This is no longer a youtube issue. I will pursue it in scraperlib/pylibzim. Looking at the code, it looks like poor memory management in scraperlib, but this part was changed with the port to libzim_next. I will test libzim_next on it.
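To illustrate the kind of memory-management difference being suspected here, the sketch below contrasts reading a file whole with streaming it in fixed-size chunks. It is a generic example under stated assumptions, not scraperlib's or libzim's actual code: `copy_in_chunks` is a hypothetical helper and the 4 MiB chunk size is an arbitrary illustrative value.

```python
from pathlib import Path

# Arbitrary illustrative chunk size (4 MiB); not a value taken from scraperlib.
CHUNK_SIZE = 4 * 1024 * 1024


def copy_in_chunks(src: Path, dst: Path, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst, holding at most `chunk_size` bytes in memory at once.

    Hypothetical helper. By contrast, dst.write_bytes(src.read_bytes())
    loads the entire file (e.g. a 3.2GB video) into memory first.
    Returns the number of bytes copied.
    """
    total = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            fout.write(chunk)
            total += len(chunk)
    return total
```

Peak memory for the chunked version stays around `chunk_size` regardless of file size, which is why streaming is the usual choice for multi-gigabyte media files.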

As for the 247 code, my guess is that this is due to multithreading but I can't explain it exactly.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 2 years ago

It passed!!! https://farm.openzim.org/pipeline/46e8c752a8094ea5cb7d5716

@rgaudin Should we close the ticket?

rgaudin commented 2 years ago

Yes, apparently the issue was in libzim, and a libzim update fixed it. There are about 100 videos; the logs mention 10 missing videos because those are private.