spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

undefined titles create an error in makeUrl.js #502

Closed rg3h closed 1 year ago

rg3h commented 1 year ago

_fetch.js calls makeUrl(options) which errors when the page is undefined. This came from getting page links for 'December_1' whose entry for 1941 appears to have the offending page title.

When I console.log(options) in _fetch.js I saw:

  _fetch.js options {
    lang: 'en',
    wiki: 'wikipedia',
    domain: undefined,
    follow_redirects: true,
    path: 'api.php',
    title: [
      'Mayor of New York City',
      'Office of Civilian Defense',
      undefined,   <--- ahah!
      'Civil Air Patrol'
    ]
  }

We could have makeUrl() skip undefined pages, but it might be better to see why the options has an undefined title page. _fetch.js calls parseUrl(title) if the title is a string, but not if it is an array of strings. Is this correct?

I added this code to _fetch.js around line 61 to remove undefined titles from the array, but wonder if parseUrl() on each item would be better or if there are better checks to perform on the titles?

  if (typeof title === 'string' && isUrl.test(title)) {
    options = { ...options, ...parseUrl(title) }
/***** begin added *****/
  } else if (Array.isArray(title)) {
    // check for undefined titles in array. Should this run parseUrl on each?
    for (let i = 0, count = title.length; i < count; ++i) {
      let titleItem = title[i];
      if (!titleItem || typeof titleItem !== 'string') {
        title.splice(i, 1); // removes the item in place (mutates the caller's array)
        --i;       // decrement since we removed an item
        --count;
      }
    }
  }
/**** end added ****/
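
For comparison, the same cleanup can probably be written without mutating the caller's array. This is only a sketch: `cleanTitles` is a hypothetical helper name, not something that exists in wtf_wikipedia, and it keeps the same rule as the loop above (only non-empty strings survive). If numeric page ids are also valid in this position, which is worth checking, the predicate would need to allow them.

```javascript
// Hypothetical helper (not part of wtf_wikipedia): returns a NEW array
// holding only the non-empty string titles. The input array is untouched.
function cleanTitles(titles) {
  return titles.filter((t) => typeof t === 'string' && t.length > 0)
}

// The title list from the console.log output above
const titles = [
  'Mayor of New York City',
  'Office of Civilian Defense',
  undefined,
  'Civil Air Patrol',
]

const cleaned = cleanTitles(titles)
console.log(cleaned)
```

Because `filter` builds a new array, there is no index bookkeeping (`--i`, `--count`) to get wrong, and whoever passed `titles` in still sees their original list.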

My apologies for not doing a pull request. I have never done one for github and need to learn more.

spencermountain commented 1 year ago

hey Rich, you're right. This is a good catch, and I'm open to any solution you think is best. I'm happy to fix it for the next release, or we have some docs (https://github.com/spencermountain/wtf_wikipedia/wiki/Contributing) and patience to help first-timers. it sounds like you're close! cheers

rg3h commented 1 year ago

Hi Spencer!

Next release: seems good. Go for it -- push it into your next release.

If you are in there, what are your thoughts on making everything an array internally? I love the versatility of being able to pass in a single doc or an array of docs, thank you for that! But secretly, internally, you could make everything an array:

  let docList = Array.isArray(itemOrList) ? itemOrList : [itemOrList]; // turn a single item into an array of 1

Then it would all go through one flow.
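
A rough sketch of what that single flow could look like, combined with the undefined-title check from earlier in this thread. Both `toList` and `normalizeTitles` are hypothetical names made up for illustration; nothing here is taken from the library's actual source, and the string-only rule is an assumption carried over from the earlier snippet.

```javascript
// Hypothetical helper: wrap a single item in an array, pass arrays through.
function toList(itemOrList) {
  return Array.isArray(itemOrList) ? itemOrList : [itemOrList]
}

// Hypothetical single code path: normalize to an array first, then drop
// anything that is not a non-empty string (undefined, null, '', etc.).
function normalizeTitles(itemOrList) {
  return toList(itemOrList).filter((t) => typeof t === 'string' && t.length > 0)
}

// A single title and a list of titles now take the same route
console.log(normalizeTitles('December_1'))
console.log(normalizeTitles(['Civil Air Patrol', undefined]))
```

The caller can then decide at the very end whether to hand back a single result or a list, based on whether the original input was an array.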

Docs on how to pull request: these docs are great!! Thanks so much for this, both in clarity and community! I fear I won't get to it soon enough. But I added it to my "todo" to find something else and try it out!

Catch up sometime? Maybe sometime in January we can do a quick 15-minute video chat coffee? I'd like to learn more about your past and your adventures with wikipedia. I am an ex-Googler mad scientist (15 years there! Did some time at Tableau, NASA, PARC, etc), total goof, hacking on some new models for processing and sharing information. I'm located in CA as far as time zones go.

nice to meet the famous Spencer Kelly!


spencermountain commented 1 year ago

ahh, cool! Ya I'd love to. I hope nobody ever discovers my plan to honeypot high-profile contacts through javascript bugs.

I'd love to hear about what you're working on. Feel free to schedule something anytime this week, or next. spencerkelly86@gmail.com cheers

spencermountain commented 1 year ago

fixed in 10.0.4