kanasimi / wikiapi

JavaScript MediaWiki API for node.js
https://kanasimi.github.io/wikiapi/
BSD 3-Clause "New" or "Revised" License

`.download()` : compare local and remote files by timestamps before downloading #55

Closed hugolpz closed 2 years ago

hugolpz commented 2 years ago

Timestamp

The `timestamp` property could be compared with the timestamp of the existing local file: if the API timestamp is smaller (older) than the local file's timestamp, skip the download. The imageinfo value `"timestamp": "2021-04-25T15:49:00Z"` does match the file description page, which shows:

| | Date/Time | Thumbnail | Dimensions | User | Comment |
|---|---|---|---|---|---|
| current | 15:49, 25 April 2021 | | 1.1 s (99 KB) | Kitel WP (talk \| contribs) | |

After verification: for files with several uploads, the API returns by default the timestamp of the last upload (default: 1 revision, the latest).
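The comparison described above can be sketched in Node.js roughly as follows. This is only an illustration, not wikiapi's actual implementation: `needsDownload` is a hypothetical helper name, and treating a missing local file or an unparsable timestamp as "download" is an assumption. The source only states that an older API timestamp means skip; skipping on equal timestamps is also an assumption here.

```javascript
const fs = require('fs');

/**
 * Hypothetical helper (not part of wikiapi): decide whether a remote file
 * should be downloaded, given the local path and the imageinfo timestamp
 * (an ISO 8601 string such as "2021-04-25T15:49:00Z").
 */
function needsDownload(localPath, remoteTimestamp) {
  let stat;
  try {
    stat = fs.statSync(localPath);
  } catch (err) {
    // Assumption: no local copy yet means we must download.
    if (err.code === 'ENOENT') return true;
    throw err;
  }

  const remoteMs = Date.parse(remoteTimestamp);
  // Assumption: if the timestamp cannot be parsed, download to be safe.
  if (Number.isNaN(remoteMs)) return true;

  // Download only when the remote upload is newer than the local file.
  return remoteMs > stat.mtimeMs;
}
```

With a local file last modified after 25 April 2021, `needsDownload(path, '2021-04-25T15:49:00Z')` would return `false`, so the download is skipped.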

Q4-related (✅ #51)

When a file already exists locally, it could be skipped faster. Let a be the time per download, x the number of files to download, and b the initial categorymember query time, estimated at b = 60 s. The second run (update) could then take about 2.7*14+60 = 97.8 s instead of 540 s.
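The estimate above is a simple linear model, a*x + b. A minimal sketch of the arithmetic is below; `estimateRunSeconds` is a hypothetical name, and reading 2.7 as seconds per download and 14 as the number of files is an assumption, since the comment does not say which factor is which.

```javascript
// Hypothetical helper: total run time = per-download time * file count + query time.
function estimateRunSeconds(secondsPerDownload, filesToDownload, querySeconds) {
  return secondsPerDownload * filesToDownload + querySeconds;
}

// Assumed reading of the figures in the comment: a = 2.7 s, x = 14 files, b = 60 s.
const updateRun = estimateRunSeconds(2.7, 14, 60);
console.log(updateRun.toFixed(1)); // prints "97.8"
```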

Q5-related (✅ #51)

@Poslovitch pointed out that filenames alone are not enough: some version check is needed so that files recently updated on Commons are still re-downloaded. (Discord server invitation) Ciencia-Al-Poder pointed out: "First of all, you should avoid redownloading files that you downloaded on a previous run. The api will return you the file modification/creation time. Use it to check if the file has been updated." (Discord link)

hugolpz commented 2 years ago

@kanasimi posted:

Hi. Maybe you will be interested in `const file_data_list = await wiki.download('Category:name', { directory: './' });`. The ideas are implemented now, and I am using a generator. The code makes as few calls as possible. You may try it yourself; the code is at https://github.com/kanasimi/CeJS/blob/master/application/net/wiki/page.js#L3893

hugolpz commented 2 years ago

`.download()` benchmark (before and after images)

kanasimi commented 2 years ago

Well, I think this issue is solved.

hugolpz commented 2 years ago

200 times faster 😻🙉🤤