saveweb / wikiteam3

archiving MediaWikis (and uploading wikidump to the Internet Archive)
https://pypi.org/project/wikiteam3/
GNU General Public License v3.0

crash in uploader when a wiki has no images #31

Open DigitalDwagon opened 1 month ago

DigitalDwagon commented 1 month ago

When a wiki has no images, dumpgenerator doesn't create the images folder, but wikiteam3uploader seems to always expect one, so it crashes when trying to upload these wikis. Workaround: go into the dump directory and create an empty images folder.
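For anyone hitting the same thing, the workaround can be scripted roughly like this. It is only a sketch: `ensure_images_dir` is a made-up helper name, not part of wikiteam3, and it assumes the uploader only needs an `images` folder to exist directly inside the dump directory.

```python
from pathlib import Path


def ensure_images_dir(wikidump_dir: Path) -> Path:
    """Create an empty images/ folder in the dump directory if it is missing.

    Hypothetical helper, not part of wikiteam3.
    """
    images_dir = wikidump_dir / "images"
    # exist_ok=True leaves an existing images/ folder and its contents untouched.
    images_dir.mkdir(exist_ok=True)
    return images_dir
```

Call it with the same path that is passed to wikiteam3uploader before starting the upload.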

Args(keys_file=PosixPath('/root/.doku_uploader_ia_keys'), collection='wikiteam_inbox_1', dry_run=False, update=False, wikidump_dir=PosixPath('/root/upload-queue/bulbapedia.bulbagarden.net_w-20240814-wikidump'), bin_zstd='zstd-latest', zstd_level=22, bin_7z='7z', parallel=True, rezstd=False, rezstd_endpoint='http://pool-rezstd.saveweb.org/rezstd/')                                                                                                                          
=== Loading config from /root/upload-queue/bulbapedia.bulbagarden.net_w-20240814-wikidump ===                                                                
Config(delay=1.5, retries=5, path='/root/upload-queue/bulbapedia.bulbagarden.net_w-20240814-wikidump', logs=False, date='20240814', index='https://bulbapedia.bulbagarden.net/w/index.php', api='https://bulbapedia.bulbagarden.net/w/api.php', xml=True, curonly=False, xmlapiexport=False, xmlrevisions=True, xmlrevisions_page=False, images=True, namespaces=['all'], exnamespaces=[], api_chunksize=50, export='', http_method='POST', failfast=False, templates=False)           
=== Preparing files to upload ===                                                                                                                            
=== commpressing necessary files: ===                                                                                                                        
File /root/upload-queue/bulbapedia.bulbagarden.net_w-20240814-wikidump/bulbapedia.bulbagarden.net_w-20240814-history.xml.zst already exists. Skip compressing.                                                                                                                                                            
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***                                                                                                        
/root/upload-queue/bulbapedia.bulbagarden.net_w-20240814-wikidump/bulbapedia.bulbagarden.net_w-20240814-history.xml.zst: 68791614696 bytes                   
File /root/upload-queue/bulbapedia.bulbagarden.net_w-20240814-wikidump/bulbapedia.bulbagarden.net_w-20240814-images.txt.zst already exists. Skip compressing.
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***                                                                                                        
/root/upload-queue/bulbapedia.bulbagarden.net_w-20240814-wikidump/bulbapedia.bulbagarden.net_w-20240814-images.txt.zst: 8 bytes                              
Traceback (most recent call last):                                                                                                                             
  File "/usr/local/bin/wikiteam3uploader", line 8, in <module>                                                                                               
    sys.exit(main())                                                                                                                                           
  File "/usr/local/lib/python3.10/dist-packages/wikiteam3/uploader/uploader.py", line 564, in main                                                           
    upload(arg)                                                                                                                                              
  File "/usr/local/lib/python3.10/dist-packages/wikiteam3/uploader/uploader.py", line 402, in upload                                                         
    filedict = prepare_files_to_upload(                                                                                                                      
  File "/usr/local/lib/python3.10/dist-packages/wikiteam3/uploader/uploader.py", line 235, in prepare_files_to_upload                                        
    images_7z_archive_path = prepare_images_7z_archive(wikidump_dir, config, parallel, images_source=images_source, sevenzip_compressor=sevenzip_compressor) 
  File "/usr/local/lib/python3.10/dist-packages/wikiteam3/uploader/uploader.py", line 158, in prepare_images_7z_archive                                      
    assert images_dir.exists() and images_dir.is_dir()                                                                                                       
AssertionError

yzqzss commented 1 month ago

This should never happen: dumpgenerator creates the images and images_mismatch directories whenever --images is used.

https://github.com/saveweb/wikiteam3/blob/942477f54ef063ad9d95fd0c663857ee63691b43/wikiteam3/dumpgenerator/dump/image/image.py#L59-L60

The uploader, in turn, asserts that those directories exist whenever config.json["images"] is True.
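To check a dump against that expectation before uploading, something like the following might help. It is a rough sketch, not the uploader's actual code: `missing_image_dirs` is a hypothetical name, and it assumes `config.json` sits at the top of the dump directory next to the `images` and `images_mismatch` folders.

```python
import json
from pathlib import Path


def missing_image_dirs(wikidump_dir: Path) -> list[str]:
    """List the image directories a dump should have but does not.

    Hypothetical pre-upload check, not part of wikiteam3.
    """
    config_path = wikidump_dir / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Dumps made without --images are not expected to have these folders.
    if not config.get("images"):
        return []
    expected = ("images", "images_mismatch")
    return [name for name in expected if not (wikidump_dir / name).is_dir()]
```

An empty list means the dump matches what is described above; any names returned are the directories the uploader would presumably trip over.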