kanasimi / wikiapi

JavaScript MediaWiki API for node.js
https://kanasimi.github.io/wikiapi/
BSD 3-Clause "New" or "Revised" License

`.download()` on category, recursive : questions. #57

Closed (hugolpz closed 2 years ago)

hugolpz commented 2 years ago

Previous issues helped develop an efficient recursive download over categories of files, via:

    await targetwiki.download(
        "Category:Lingua_Libre_pronunciation-cmn", {
            directory: './',
            max_threads: 4,
            page_filter(page_data) {
                console.log('@Yug: ', JSON.stringify(page_data));
                return true;
            }
        });

Questions

Q1: what is page_filter(page_data) { ... }?

Q2 script speed limit and internet connection: Is the speed I observe:

  1. limited by some setting within your code?
  2. purely dependent on my internet connection?

If 2, then:

Q3 max_threads: if I set max_threads: 8, I should be 2 times faster than with max_threads: 4, right? I tested briefly with 12, and it seems true.

Q4 depth: is it possible to limit category depth? How?

Q5 depth loop: is there a risk of an infinite depth loop? (Category:A contains Category:B, which contains Category:A)

Q6 header: is the XHR header properly set? See meta:User-Agent_policy

Q7 resilience: Is the script resilient ?

Q8 mp3, ogg: how can I download an alternative file format, such as .mp3 instead of .wav? Example: file in category File:LL-Q7737_(rus)-1Apollinariya1-кофе.wav -> file to download LL-Q7737_(rus)-1Apollinariya1-кофе.mp3

kanasimi commented 2 years ago

Q1: what is page_filter(page_data) { ... }?

A1: options.page_filter() is a function that filters the result pages. Return true if you want to keep the page.
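To illustrate the "return true to keep" contract, here is a minimal sketch of a predicate that could be passed as page_filter. The function name keepWavFiles and the sample objects are hypothetical, and it assumes each page_data object carries a `title` string, as MediaWiki API page objects usually do; check the JSON logged by the snippet in the question before relying on that.

```javascript
// Hypothetical predicate: keep only pages whose title ends in ".wav".
// Assumption: page_data has a string `title` property.
function keepWavFiles(page_data) {
  return typeof page_data.title === 'string' && /\.wav$/i.test(page_data.title);
}

// Stand-in data for illustration only, not real API output.
const samplePages = [
  { title: 'File:LL-Q7737_(rus)-1Apollinariya1-кофе.wav' },
  { title: 'File:Example.ogg' },
  { title: 'Category:Lingua_Libre_pronunciation-cmn' }
];

// Array.prototype.filter applies the same keep/drop rule as described in A1.
const kept = samplePages.filter(keepWavFiles);
console.log(kept.map((page) => page.title));
```

In the download call from the question, this would replace the inline function, for example `page_filter: keepWavFiles`.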

Q2 script speed limit and internet connection:

A2: API calls should be made sequentially, so options.max_threads should not be set too high; your network speed also matters. You can still set a higher options.max_threads yourself, so this does not seem to be a library issue.

Q4 depth: is it possible to limit category depth? How?

A4: Yes. I updated the documentation: https://kanasimi.github.io/wikiapi/Wikiapi.html#download

Q5 depth loop: is there a risk of an infinite depth loop? (Category:A contains Category:B, which contains Category:A)

A5: No. The shallowest category will be selected.
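As a general illustration of why such a cycle need not loop forever, here is a sketch of a breadth-first category walk that records each category once, at the shallowest depth where it is first seen, and that also honors a depth limit. This is not the library's actual implementation; the graph, the function name walkCategories and the maxDepth parameter are all made up for the example.

```javascript
// Toy category graph containing a cycle: A -> B -> A. Names are hypothetical.
const subcategories = {
  'Category:A': ['Category:B', 'Category:C'],
  'Category:B': ['Category:A', 'Category:D'],
  'Category:C': [],
  'Category:D': []
};

// Breadth-first walk. Returns a Map of category -> shallowest depth found.
function walkCategories(root, maxDepth) {
  const depthOf = new Map([[root, 0]]);
  const queue = [root];
  while (queue.length > 0) {
    const current = queue.shift();
    const depth = depthOf.get(current);
    if (depth >= maxDepth) continue; // do not descend past the limit
    for (const child of subcategories[current] || []) {
      if (depthOf.has(child)) continue; // already visited at a shallower or equal depth
      depthOf.set(child, depth + 1);
      queue.push(child);
    }
  }
  return depthOf;
}

const all = walkCategories('Category:A', Infinity);
const limited = walkCategories('Category:A', 1);
console.log([...all.entries()]);
console.log([...limited.keys()]);
```

Because a category already present in the map is never queued again, the back-link from Category:B to Category:A is ignored and the walk terminates.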

Q6 header: is the XHR header properly set? See meta:User-Agent_policy

A6: Use console.log(CeL.get_URL.default_user_agent); to view it.

Q7 resilience: Is the script resilient ?

Q8 mp3, ogg: how can I download an alternative file format, such as .mp3 instead of .wav?

A8: Yes, this is possible. See https://kanasimi.github.io/wikiapi/Wikiapi.html#download

hugolpz commented 2 years ago

Sounds pretty good. 😻👩‍🎤😎✌🏻

kanasimi commented 2 years ago

Well, I think this issue may be closed. Next time we could use "Projects" at the top to handle this kind of issue.

hugolpz commented 2 years ago

Yes, close, and thanks to you. 👍🏼 We did it before the weekend in the end. 🚀 My apologies also for moving text, splitting issues, and editing posts all around. I work wiki-style, to leave behind cleaner places and more comprehensible issues. Thank you for giving me the space to do so, I appreciate it. Thanks too for your help on npm packages. 🙏🏼

As for coordination :