openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Clean temporary data if mwoffliner crashes #448

Closed: kelson42 closed this issue 5 years ago

kelson42 commented 5 years ago

Temporary data should be cleaned up both in Redis and on the filesystem.

kelson42 commented 5 years ago

@ISNIT0 Since we no longer write any temporary data in 1.8, can you confirm this is fixed automatically?

ISNIT0 commented 5 years ago

We do still write temporary data: https://github.com/openzim/mwoffliner/issues/566#issuecomment-465020622

ISNIT0 commented 5 years ago

All temporary data is now written to /tmp; there is no longer a local temporary data directory.

kelson42 commented 5 years ago

I ran a test by interrupting mwoffliner partway through a run, and Redis still contained temporary data afterwards. It looks like Redis is not cleaned up properly when the process is interrupted:

# redis-cli flushall
OK
root@camber:/dev/shm/mwoffliner# mwoffliner --verbose --mwUrl=https://bm.wikipedia.org/ --adminEmail=kelson@kiwix.org
[log] [2019-03-03T09:02:29.354Z] Getting text direction...
[info] [2019-03-03T09:02:29.354Z] Downloading [https://bm.wikipedia.org/wiki/]
[log] [2019-03-03T09:02:29.356Z] Getting site info...
[info] [2019-03-03T09:02:29.356Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json&siprop=general|namespaces|statistics|variables|category|wikidesc]
[log] [2019-03-03T09:02:29.356Z] Getting sub-title...
[info] [2019-03-03T09:02:29.356Z] Downloading [https://bm.wikipedia.org/wiki/]
[log] [2019-03-03T09:02:29.595Z] Text direction is [ltr]
[info] [2019-03-03T09:02:29.608Z] Getting JSON from [https://bm.wikipedia.org/api/rest_v1/page/mobile-sections/Ny%C9%9B_f%C9%94l%C9%94]
[log] [2019-03-03T09:02:29.666Z] Using a remote MCS instance
[info] [2019-03-03T09:02:29.667Z] Creating dump temporary directory [/tmp/mwo-dump-1551603749666]
[info] [2019-03-03T09:02:29.670Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases&format=json]
[log] [2019-03-03T09:02:29.845Z] Getting article ids for [namespace=] 
[info] [2019-03-03T09:02:29.845Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&colimit=max&prop=revisions|coordinates&gapnamespace=0&format=json&rawcontinue=]
[log] [2019-03-03T09:02:30.439Z] Getting article ids for [namespace=]  (from Malakoff_(Hauts-de-Seine))
[info] [2019-03-03T09:02:30.439Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&colimit=max&prop=revisions|coordinates&gapnamespace=0&format=json&rawcontinue=&gapcontinue=Malakoff_(Hauts-de-Seine)]
[info] [2019-03-03T09:02:31.072Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&redirects&format=json&prop=revisions|coordinates&titles=Ny%C9%9B_f%C9%94l%C9%94]
[info] [2019-03-03T09:02:31.269Z] Redirect queue has [730] items
[info] [2019-03-03T09:02:31.270Z] Got [10] redirect urls for [500] articles
[info] [2019-03-03T09:02:31.270Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Bambara|F%C9%94likan|Bamak%C9%94|Dogoso|Jakuma|Antropoloji|Jeliba|Kati|Kulibali|H%C9%94r%C9%94nya|Finkolo|Ku|Jose_Trawere|Kunkolos%C9%9Bm%C9%9B|Ali_Baba_ni_n'sonke_binaani_k%C9%9Bl%C9%9B|Bamanan|Bagan|Ko_k%C9%94r%C9%94b%C9%94|Furu_sira_taa-ma|Bolokoli|C%C9%9Bkelen_dugu|Dumuni|Faransekan|Bongo|Dogo_non_dugu|Dogonoka_da|Dibo|Kita|Kulik%C9%94r%C9%94|Banjagara|Gao|Djenne|Kidal|Kaba|Hadamaden_josiraw_dantig%C9%9Bkan|Diy%C9%9Bn_ka_j%C3%A0m%C3%A0naw|Basi|Koutiala|Ibu|K%C9%94ri|Bondo_la|Buteli|Ji|Ba|Fali|Jamanatigi|Ali_Farka_Ture|Gaf%C9%9B|K%C9%94n%C9%94wari|Albert_Einstein]
[info] [2019-03-03T09:02:31.270Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=D%C9%94nke|K%C9%94t%C9%9Bba|Kan|Diin%C9%9B|Kankalan|J%C3%A8li|K%C9%9Bn%C9%9Bya|D%C9%94nniyagafeba|Dolokalan|Farikolo%C9%B2%C9%9Bnaj%C9%9B|Baaraw|F%C9%94likankalan|D%C3%B9g%C3%B9k%C3%B2l%C3%B2k%C3%B9nn%C3%A0kalan|Kunafonilaseli|K%C3%B9ran|D%C9%94k%C9%94t%C9%94r%C9%94ya|D%C3%B2n%C3%ACni|Bibulu|B%C3%A0g%C3%A0nin|Julaya|Afel_Bokum|Isa_Jalo|Bamanankan|Kenya|Ben%C9%9Bn|Gana|Jine|Kuritiba|Angil%C9%9Bkan|Kunnafoni_minanw|Aerospacial|Habib_Kuwate|Babenba_Trawere|C%C9%9Bba_Trawere|Amadu_Tumani_Ture|Alfa_Umar_Konare|Amadu_ni_Mariyam|Gw%C9%9Bl%C9%9B|ARP|FAST|Ji%C9%B2%C9%9B_jamanaw|Afrika|Eropa|Kanku_Musa|Banaw|K%C9%94m%C9%94kili_d%C9%9Bs%C9%9B|Kanada|Kuran%C9%9B|Kaseti|Fanga_%C9%B2%C9%9Bmaaw]
[info] [2019-03-03T09:02:31.270Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Eropa_J%C9%9Bkulu|Burukina_Faso|Gambia|Buguni|Latin|Gitari|Fransekanf%C9%94t%C9%94n|Elhaj_Umar_Tal|Kartaje|Cad|Angola|Burundi|Gabon|Kongo-Brazaville|Libya|Ko%C5%84skowola|Liberia|Gine_Bisau|Kaarta_bambara_fanga_lawal%C9%9B_la|Gine|C%C9%9Bmajan_Gine|Kamerun|Kongo_ka_B%C9%9Bj%C9%9Bfanga_Fasojamana|Bakrunba_Sok%C9%94n%C9%94_Karajago|Eropa_jamanaw|Faransi|Alima%C9%B2i|Cin%C3%A9ma_de_notre_temps%3A_Souleymane_Ciss%C3%A9|7000_km_plus_loin|A_Banna|An_Be_N%C9%94_do|Baara_(film)|Den_Muso|Desebagato|Djourou%2C_une_corde_%C3%A0_ton_cou|Falato|Finye_(filmu)|Finzan|La_Gen%C3%A8se|Guimba%2C_un_tyran_une_%C3%A9poque|Haramuya|Kabala|Kasso_Den|Kiri_Kara_Watita|Le_M%C3%A9decin_de_Gafire|Barack_Obama|Dunia_S%C9%94r%C9%94ko_Forobak%C9%9Bn%C9%9B|Bakelenda_Yiriwa_Kuntlenaw|J%C9%9Bnselekenaani_Wariko_Kondo|Dunia_Wariso]
[info] [2019-03-03T09:02:31.270Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Aisland|Finland|Estonia|Lionkan|Huang_Sianfan|China|Confucius|Latino_Amerika|Gatemala|Belize|Honduras|El_Salvador|Cema_Amerika|Kosta_Rika|Kap_Verdi|Botswana|Lesoto|Eritere|Etiopia|Komore_gun|Madagaskar|C%C9%9Bma_Afrika_Fasojamana|Brazil|(%DF%92%DF%9E%DF%8F)|Kungodon|Kandon|Jamaika|Amerika_ka_Kelenyalen_Jamanaw|Kuba|Ayiti|Bahamasi|Kayi|Asia|Karayibu_Ji|Bolivia|Argentina|Folo_Bamanana_Furu|Atlantik_Kogoji|Cema_Koron|Lubenan|Israil|Corbin_Bleu|Balanzan|Baki|D%C9%94%CC%80nk%C9%94ri|Esperanto|Ho_Chi_Minh-Ville|Dong_Hoi|Fernando_Alonso|Kand%C9%94n]
[info] [2019-03-03T09:02:31.270Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=J%C3%B8dedomen|Kumajago-senw|Kumajago-sen|Bisigili|Kununni|Kumamunun-munun|Lakumani|F%C9%94koflayali|Kumasen|Kuma-jagodonna|D%C9%94nniya|Jagomajira|Bamanankan_mab%CE%B5n|Giripu_p%C9%94risini|Gugolu|Balikukalan|Di%C9%B2%C9%9B_ntolatanba_san_2010_Afirikidisidi|F%C9%94lif%C9%9Bn|Concepci%C3%B3n|Lingua_Franca_Nova|Dakar|Da_Lat|Lugo|Alizeri|Coritiba_Foot_Ball_Club|Leandra_Leal|Anggun|Khasso|ESB_Villeneuve-d'Ascq|Caselle_Landi|Berlin|Espankan|Agnes_Monica|Alimank%C9%9Bl%C9%9B_f%C9%94l%C9%94|Bochotnica|Julius_Caesar|Ebola_virisi_bana|B%C9%9Bliziki|Chile|Bernie_Sanders|Barbara_Chiappini|Benedicta_Boccoli|Beijing|Jerusalem|London|Hong_Kong|Brussels|Amsterdam|Istanbul|Ankara]
[info] [2019-03-03T09:02:31.271Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Jakarta|Bangkok|Baghdad|Athens|Bogot%C3%A1|Buenos_Aires|Los_Angeles|Madrid|Barcelona|Kyoto|Guangzhou|Chongqing|Chengdu|Kongon|Akra|K%C9%94nakri|Freetown|La_Serena|D%C3%A4ttlikon|Alger|Gautama_Buddha|Djamana_d%C3%A9kourou_sanfe_bada_bon_ka_fada_dogona_(UNHCR)|Lom%C3%A9|Bisau|Bourg-la-Reine|Antony|Asni%C3%A8res-sur-Seine|Bagneux_(Hauts-de-Seine)|Bois-Colombes|Boulogne-Billancourt|Ch%C3%A2tenay-Malabry|Ch%C3%A2tillon_(Hauts-de-Seine)|Chaville|Clamart|Clichy|Colombes|Courbevoie|Fontenay-aux-Roses|Garches|Gennevilliers|Issy-les-Moulineaux|La_Garenne-Colombes|Le_Plessis-Robinson|Levallois-Perret|Chaource|Antibes|Biot|Cannes|Grasse|Edirne]
[info] [2019-03-03T09:02:31.271Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Arcachon|Bordeaux|Coutras|Atlanta|Boston|Chicago|Dallas|El_Paso_(Texas)|Fresno|Greensboro|Houston|Indianapolis|Jacksonville|Kansas_City_(Missouri)|Las_Vegas|Austin_(Texas)|Columbus_(Ohio)|Fort_Worth|Charlotte|D%C3%A9troit|Baltimore|Denver|Louisville_(Kentucky)|Albuquerque|Long_Beach|Colorado_Springs|Cleveland|Arlington_(Texas)|Bakersfield|Honolulu|Anaheim|Corpus_Christi|Lexington_(Kentucky)|Anchorage|Buffalo_(New_York)|Cincinnati|La_Baule-Escoublac|Gu%C3%A9rande|Corsept|La_Bernerie-en-Retz|Chauv%C3%A9|Les_Moutiers-en-Retz|Ch%C3%A9m%C3%A9r%C3%A9|Brains|Cheix-en-Retz|Bouaye|La_Montagne|La_Plaine-sur-Mer|Cou%C3%ABron|La_Chapelle-sur-Erdre]
[info] [2019-03-03T09:02:31.271Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Abbaretz|Basse-Goulaine|Carquefou|La_Marne|Machecoul-Saint-M%C3%AAme|Bouguenais|Avessac|Conquereuil|Derval|Gu%C3%A9men%C3%A9-Penfao|Jans|La_Grigonnais|Jou%C3%A9-sur-Erdre|La_Meilleraye-de-Bretagne|Grand-Auvern%C3%A9|Iss%C3%A9|Les_Touches|Casson|Erbray|Louisfert|Ch%C3%A2teaubriant|Juign%C3%A9-des-Moutiers|Divatte-sur-Loire|Le_Cellier|Couff%C3%A9|Lign%C3%A9|Ancenis|Anetz|La_Roche-Blanche_(Loire-Atlantique)|Aigrefeuille-sur-Maine|Ch%C3%A2teau-Th%C3%A9baud|Le_Bignon|La_Chapelle-Glain|Blois|Bourges|Chartres|Ch%C3%A2teauroux|Gien|Liran|Ambelau|Biak|Kofiau|Kobroor|Jiri|Ny%C9%9B_f%C9%94l%C9%94|Mali_ka_s%C9%94w|Mandenkan|Mali|Wikipedi|Santiago_de_Compostela]
[info] [2019-03-03T09:02:31.271Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Pour_l'Afrique_et_pour_toi%2C_Mali|Su_ko_ya|Misirik%C9%94r%C9%94|Sikaso|N%C9%94n%C9%94|Sokurani|Nare_Famakan_ni_Sogolo_k%C9%94kuru_ka_furu|Woron_son|W%C3%B9l%C3%B9|Mali_ka_kiltiri|Yiri|Tangalan_d%C9%9Bs%C9%9B_Bana|Si|Wiki|Sibi|Segu|Timbuktu|Sanga|Yanfolila|Mopti|Samiya|Saniena|Sunbala|Nsirasun|N'gala|N'gamac%C9%9Bbugu|Tembin%C3%A9|Marakaw|Misi|Sinima_bamanankan|Sanji|Saga|Salif_Ke%C3%AFta|Togo|Sinima|Politiki|Silameya|Tariku|Sy%C9%9Bfan|Tulon|Sok%C9%94n%C9%94bagan|S%C9%9Bn%C9%9Bk%C9%9B|Tobilin%C9%94|Malik_Sidibe|S%C3%A0r%C3%ACya|Nafasorosira|Yesu_Krista|Solomana_Kante|Maninkakan|Mandinko-kan]
[info] [2019-03-03T09:02:31.271Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=N'ko_sebesun|Poyi|N%C9%9Br%C9%9B|Moham%C9%9Bd_Misbaho_Trawele|Niz%C9%9Bri|Miniankakan|Mali_duguw|Yesu_Lakika_Egilisi|Uetersen|Mali_jamuw|Ubuntu|Samori_Ture|Modibo_Keita|Umu_Sangare|Monzon_Jara|Tumani_Jabate|Phoenix|Musa_Trawere|S%C9%94g%C9%94s%C9%94g%C9%94nic%C9%9B|Senegal|Nijeria|Moritani|Maroko|Sahara-c%C9%9Bla_jago|Tunizi|Misra|Sudan|Sunjata_Keita|Malawi|Namibia|Mozanbik|Uganda|Rwanda|Soni_Ali_Ber|Songai_Mansamara|Sera_Leon|Seshel|Mali_ka_k%C9%94ri_y%C9%9Bl%C9%9Bma_y%C9%94r%C9%94_(CMDT)|Worodugu_Afrika|Tanzania|Zanbia|Ramata_Jakite|Moko_Dakhan|Nyamanton|Sia%2C_le_r%C3%AAve_du_python|Ta_Dona|Taaf%C3%A9_Fanga|Waati|Yeelen|Y%C9%9Bl%C9%9Bma_donna_kow_la_nank%C9%94r%C9%94la]
[log] [2019-03-03T09:02:31.479Z] 165 redirect(s) found
[info] [2019-03-03T09:02:31.480Z] Got [5] redirect urls for [230] articles
[info] [2019-03-03T09:02:31.480Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Saheli_Amerika|USA_Jamanakuntigiw|Ntigi%C9%B2%C9%9B|Slumdog_Millionnaire|Risila|Norwij|Swedi|Qin_Shi_Huang|M%C9%9Bkisiki|Nikaragwa|Panama|Zimbabwe|Pigazzano|N'Ko|Masaladon|Zana|N'Ko_(%DF%92%DF%9E%DF%8F)|Oseania|Ostralia|Venezuela|Peru|Mediterane_Baji|Ordon|Suriya|Saudia_Arabu_ka_Faamamara|%C6%9Dinini|Portail%3AMali|Mali_(bagan)|Ntilen|Vietnam|Masalad%C9%94n|Sannad%C9%94n|Masalad%C3%A7n|Masala-jagodonna|Masala-jagodon|Masalas%CE%B5b%CE%B5n|Masalaboso|Sangali|Sigiy%C9%94r%C9%94-falen|Sendonni|Senflaninyali|Nimayali|Y%CE%B5l%CE%B5b%C9%94nna|S%CE%B5b%CE%B5nnisen|Melekuya|Siginikalan|Sigini|Siginiden|Tiwit%C9%9Bri|%C6%90nt%C9%9Brin%C9%9Bti]
[info] [2019-03-03T09:02:31.480Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Telef%C9%94ni_yaalata|Mototakisi|Si%C9%B2%C9%9Bta|Osasko|Traor%C3%A9|Samoa|New_Zealand|Paltoga|Mangoro|Martinez_FAW-702|Moshchena|Saratov|Nha_Trang|Phan_Thiet|Vung_Tau|Thiago_Fragoso|Ta%C3%ADs_Ara%C3%BAjo|Mbolokele|Villeneuve-d'Ascq|SAMBACANOU|Ronald_Reagan|Trojany|Somali|Sung_Jae-gi|%C6%9D%C9%94g%C9%94m%C9%9B|Sigi|Sama|Waraba|T%C3%BCrkiye|Stary_Dw%C3%B3r|Warszawa|%C5%81obez|Radio_Studio_54_Network|Rio_de_Janeiro|S%C3%A3o_Paulo|Tokyo|Moscow|Paris|Rome|Saint_Petersburg|New_York_City|Valpara%C3%ADso|Shanghai|Vienna|Washington%2C_D.C.|%C4%B0zmir|Sydney|Valencia|Seville|Yokohama]
[info] [2019-03-03T09:02:31.480Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Sapporo|Osaka|Nagoya|Xi'an|Wuhan|Shenzhen|Tianjin|Nanjing|Tom_and_Jerry|Worodugu_Sudan|Yamusukuro|Santiago|Muhammed|Swaziland|Mbabane|Porto_Novo|Nairobi|M%C9%94r%C9%94ni|M%C9%94nrovia|Wagadugu|Niam%C9%9B|Ngolo|%C5%A0iprage|Roubaix|Stokolm|Ni_yoro_mamunw_jamana_bila_kana_itali|Sigui_sebe|Malakoff_(Hauts-de-Seine)|Marnes-la-Coquette|Meudon|Montrouge|Nanterre|Neuilly-sur-Seine|Puteaux|Rueil-Malmaison|Saint-Cloud|Sceaux|S%C3%A8vres|Suresnes|Vanves|Vaucresson|Ville-d'Avray|Villeneuve-la-Garenne|Nantes|Pornichet|Saint-Nazaire|Sainte-Savine|Troyes|Nice|Saint-Paul-de-Vence]
[info] [2019-03-03T09:02:31.480Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Memphis_(Tennessee)|Nashville|Oklahoma_City|Philadelphie|Raleigh|San_Francisco|Tucson|Virginia_Beach|Winston-Salem|Phoenix_(Arizona)|San_Antonio|San_Diego|San_Jos%C3%A9|Seattle|Milwaukee|Portland_(Oregon)|Sacramento|Mesa|Omaha|Miami|Tulsa|Minneapolis|Oakland|Wichita|Pittsburgh|Tampa|Toledo_(Ohio)|Piriac-sur-Mer|Paimb%C5%93uf|Saint-P%C3%A8re-en-Retz|Vue|Pornic|Saint-Brevin-les-Pins|Pr%C3%A9failles|Saint-Michel-Chef-Chef|Saint-Hilaire-de-Chal%C3%A9ons|Port-Saint-P%C3%A8re|Rouans|Saint-L%C3%A9ger-les-Vignes|Saint-Jean-de-Boiseau|Sainte-Pazanne|Villeneuve-en-Retz|Sautron|Saint-Philbert-de-Grand-Lieu|Rez%C3%A9|Saint-S%C3%A9bastien-sur-Loire|Saint-Nicolas-de-Redon|Marsac-sur-Don|Nozay_(Loire-Atlantique)|Vay]
[info] [2019-03-03T09:02:31.480Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&prop=redirects&format=json&rdprop=title&rdlimit=max&rawcontinue=&titles=Puceul|Treffieux|Nort-sur-Erdre|Suc%C3%A9-sur-Erdre|Riaill%C3%A9|Saffr%C3%A9|Moisdon-la-Rivi%C3%A8re|Saint-Vincent-des-Landes|Trans-sur-Erdre|Petit-Mars|Saint-Mars-du-D%C3%A9sert|Petit-Auvern%C3%A9|Saint-Julien-de-Vouvantes|Mauves-sur-Loire|Sainte-Luce-sur-Loire|Thouar%C3%A9-sur-Loire|Saint-Julien-de-Concelles|Oudon|Mouzeil|Saint-G%C3%A9r%C3%A9on|M%C3%A9sanger|Roug%C3%A9|Saint-Sulpice-des-Landes|Vierzon|Ukrain|Numfor|Supiori|Yapen|Mios_Num|Ny%C9%9B_f%C9%94l%C9%94]
[log] [2019-03-03T09:02:31.676Z] 24 redirect(s) found
[log] [2019-03-03T09:02:31.676Z] All redirect ids retrieve successfuly.
[warn] [2019-03-03T09:02:31.677Z] Couldn't find strings file for [bm], falling back to [en]
[log] [2019-03-03T09:02:31.677Z] Doing dump: [[object Object]]
[log] [2019-03-03T09:02:31.678Z] Writing zim to [/dev/shm/mwoffliner/out/wikipedia_bm_all_2019-03.zim]
Tmp directory already exists, deleting [/dev/shm/mwoffliner/out/wikipedia_bm_all_2019-03.tmp]
No steemming for language 'bm'
[info] [2019-03-03T09:02:31.681Z] Copying Static Resource Files
[info] [2019-03-03T09:02:31.684Z] Finding stylesheets to download
[info] [2019-03-03T09:02:31.684Z] Downloading [https://bm.wikipedia.org/wiki/]
[log] [2019-03-03T09:02:31.849Z] Found [3] stylesheets to download
[log] [2019-03-03T09:02:31.849Z] Downloading stylesheets and populating media queue
[info] [2019-03-03T09:02:31.851Z] Downloading CSS from http://bm.wikipedia.org/w/load.php?debug=false&lang=bm&modules=ext.3d.styles|ext.uls.interlanguage|ext.visualEditor.desktopArticleTarget.noscript|ext.wikimediaBadges|mediawiki.legacy.commonPrint%2Cshared|mediawiki.skinning.interface|skins.vector.styles|wikibase.client.init&only=styles&skin=vector
[info] [2019-03-03T09:02:31.851Z] Downloading [http://bm.wikipedia.org/w/load.php?debug=false&lang=bm&modules=ext.3d.styles%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Cwikibase.client.init&only=styles&skin=vector]
[info] [2019-03-03T09:02:31.851Z] Downloading CSS from http://bm.wikipedia.org/w/load.php?debug=false&lang=bm&modules=site.styles&only=styles&skin=vector
[info] [2019-03-03T09:02:31.851Z] Downloading [http://bm.wikipedia.org/w/load.php?debug=false&lang=bm&modules=site.styles&only=styles&skin=vector]
[info] [2019-03-03T09:02:31.851Z] Downloading CSS from https://bm.wikipedia.org/wiki/Mediawiki:offline.css?action=raw
[info] [2019-03-03T09:02:31.851Z] Downloading [https://bm.wikipedia.org/wiki/Mediawiki:offline.css?action=raw]
^C
root@camber:/dev/shm/mwoffliner# redis-cli INFO | grep ^db
db0:keys=2,expires=0,avg_ttl=0

ISNIT0 commented 5 years ago

I've added a step that clears Redis before the dumps start: https://github.com/openzim/mwoffliner/blob/master/src/mwoffliner.lib.ts#L201
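A minimal TypeScript sketch of such a startup cleanup, under two assumptions: Redis keys are namespaced as `<dumpId>-<name>` with a numeric timestamp as the dump ID (as the leftover `1561736177898-detail` and `1561736177898-redirects` keys reported later in this thread suggest), and the client is reached through the hypothetical `KeyStore` interface below. `staleDumpKeys` and `clearStaleDumpKeys` are illustrative names, not mwoffliner's actual code.

```typescript
// Hypothetical minimal async Redis-like client. Real clients expose similar
// operations, but this exact interface is an assumption of the sketch.
interface KeyStore {
  keys(pattern: string): Promise<string[]>;
  del(keys: string[]): Promise<number>;
}

// Assumed key layout: "<dumpId>-<name>", where dumpId is a millisecond timestamp.
const DUMP_KEY = /^(\d+)-[\w-]+$/;

// Pure helper: keep only keys that look like dump keys from a *different* dump.
function staleDumpKeys(allKeys: string[], currentDumpId: string): string[] {
  return allKeys.filter((key) => {
    const match = DUMP_KEY.exec(key);
    return match !== null && match[1] !== currentDumpId;
  });
}

// Delete leftovers from earlier (crashed or interrupted) runs before a new dump starts.
// Returns the number of keys deleted.
async function clearStaleDumpKeys(store: KeyStore, currentDumpId: string): Promise<number> {
  const stale = staleDumpKeys(await store.keys("*"), currentDumpId);
  return stale.length > 0 ? store.del(stale) : 0;
}
```

Deleting every other dump's keys is only safe if a single mwoffliner instance uses that Redis database; with concurrent runs, "different dump ID" is not enough to call a key stale. On large databases, `SCAN` is preferable to `KEYS`, which blocks the server while it walks the whole keyspace.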

kelson42 commented 5 years ago

@ISNIT0 This is good but insufficient IMO. You should also catch the signal sent when the script is interrupted.
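Catching the interruption could look like the sketch below. It assumes Node's `process` is passed in as the signal source (its `once` method accepts a signal name such as `SIGINT`); `createShutdown` and `installSignalHandlers` are hypothetical helpers, and the `cleanup` callback stands for whatever removes the Redis keys and the tmp dump directory.

```typescript
// Structural type that Node's `process` satisfies: `once(signal, listener)`.
// Declared locally (an assumption of this sketch) so it needs no Node typings.
interface SignalSource {
  once(event: string, listener: () => void): unknown;
}

// Conventional "128 + signal number" exit codes, matching Node's default handlers.
const SIGNAL_EXIT_CODES: Record<string, number> = { SIGINT: 130, SIGTERM: 143 };

// Hypothetical helper: returns a shutdown function that runs `cleanup` at most
// once, however many times it is triggered, and then calls `exit`.
function createShutdown(
  cleanup: () => Promise<void>,
  exit: (code: number) => void,
): (signal: string) => Promise<void> {
  let pending: Promise<void> | undefined;
  return (signal) => {
    if (pending === undefined) {
      pending = cleanup().then(
        () => exit(SIGNAL_EXIT_CODES[signal] ?? 1),
        () => exit(1), // cleanup itself failed: still exit, but report failure
      );
    }
    return pending;
  };
}

// Hypothetical helper: route SIGINT (Ctrl-C) and SIGTERM to the shutdown routine.
function installSignalHandlers(
  source: SignalSource,
  shutdown: (signal: string) => Promise<void>,
): void {
  for (const signal of ["SIGINT", "SIGTERM"]) {
    source.once(signal, () => {
      void shutdown(signal);
    });
  }
}
```

In the scraper this would be wired up roughly as `installSignalHandlers(process, createShutdown(cleanup, (code) => process.exit(code)))`. Because the listeners are registered with `once`, a second Ctrl-C no longer reaches this handler, so Node should fall back to its default behaviour and terminate immediately, which is a useful escape hatch if cleanup hangs.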

ISNIT0 commented 5 years ago

@kelson42 This issue was not originally planned for 1.8; I think we should move it to another milestone.

kelson42 commented 5 years ago

@ISNIT0 You are right; sorry, I was convinced we already had a ticket for that.

kelson42 commented 5 years ago

I'm reopening the ticket because this still does not seem to work properly. Here is an example (with git master HEAD):

[info] [2019-06-28T21:02:21.249Z] Getting JSON from [https://en.wiktionary.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Cpageimages&rdlimit=max&rdnamespace=0&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=constrangidos&picontinue=700198]
[info] [2019-06-28T21:02:21.963Z] Getting JSON from [https://en.wiktionary.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Cpageimages&rdlimit=max&rdnamespace=0&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=constrangidos&picontinue=2473054]
[info] [2019-06-28T21:02:22.675Z] Getting JSON from [https://en.wiktionary.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Cpageimages&rdlimit=max&rdnamespace=0&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=constrangidos&picontinue=530003]
[info] [2019-06-28T21:02:23.381Z] Getting JSON from [https://en.wiktionary.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Cpageimages&rdlimit=max&rdnamespace=0&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=constrangidos&picontinue=1737658]
[log] [2019-06-28T21:02:24.490Z] Got [500] articles from namespace [0]
[info] [2019-06-28T21:02:24.490Z] Getting JSON from [https://en.wiktionary.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Cpageimages&rdlimit=max&rdnamespace=0&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=constructive_criticisms]
Failed to run mwoffliner after [19569s]: {
    "stack": "TypeError: (normalized[page.title] || page.title || \"\").replace is not a function\n    at Object.values.reduce (/media/kelson/SOTOKI/mwoffliner/src/util/mw-api.ts:107:76)\n    at Array.reduce (<anonymous>)\n    at Object.normalizeMwResponse (/media/kelson/SOTOKI/mwoffliner/src/util/mw-api.ts:106:10)\n    at Downloader.<anonymous> (/media/kelson/SOTOKI/mwoffliner/src/Downloader.ts:208:42)\n    at Generator.next (<anonymous>)\n    at fulfilled (/media/kelson/SOTOKI/mwoffliner/src/Downloader.ts:4:58)\n    at process._tickCallback (internal/process/next_tick.js:68:7)",
    "message": "(normalized[page.title] || page.title || \"\").replace is not a function"
}

**********

(normalized[page.title] || page.title || "").replace is not a function

**********

[log] [2019-06-28T21:02:25.232Z] Exiting with code [2]
[log] [2019-06-28T21:02:25.232Z] Deleting tmp dump dir [/tmp/mwo-dump-1561736177901]
[log] [2019-06-28T21:02:25.233Z] Clearing Cache Directory
kelson@camber:/media/kelson/SOTOKI/mwoffliner$ redis-cli --scan 
1561736177898-detail
1561736177898-redirects
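In the run above the process exited with code 2 and logged both `Deleting tmp dump dir` and `Clearing Cache Directory`, yet `redis-cli --scan` still lists two dump keys. Whatever the reason the Redis step was skipped, one way to make it hard to miss is a single registry in which every resource registers its own teardown, run on every exit path and continuing past individual failures. The TypeScript sketch below assumes that design; `CleanupRegistry` is a hypothetical name, not part of mwoffliner.

```typescript
type Step = { name: string; run: () => Promise<void> };

// Hypothetical registry: each resource (tmp dir, Redis keys, cache dir) registers
// its teardown when it is created, so no exit path can forget one of them.
class CleanupRegistry {
  private steps: Step[] = [];

  register(name: string, run: () => Promise<void>): void {
    this.steps.push({ name, run });
  }

  // Run all steps in reverse registration order (last created, first removed),
  // keep going when a step throws, and return the names of the failed steps.
  // The registry is emptied first, so a second call is a no-op.
  async runAll(): Promise<string[]> {
    const steps = this.steps.splice(0).reverse();
    const failed: string[] = [];
    for (const step of steps) {
      try {
        await step.run();
      } catch {
        failed.push(step.name);
      }
    }
    return failed;
  }
}
```

Each resource would call `register` right after it is created (for Redis, as soon as the dump ID is chosen), and both the fatal-error path and any signal handler would call `runAll()` before exiting. Returning the failed step names lets the caller log them without aborting the remaining steps.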