spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

Safe URL Handling #546

Closed: MarketingPip closed this issue 1 year ago

MarketingPip commented 1 year ago

Weird request, but I think this would be worth building directly into the library, or handling by changing the way the library works.

When looping through a lot of text, you sometimes get URLs, and those URLs cause wtf.fetch to fail.

I assume this is because wtf_wikipedia provides an option to fetch a document from a different, user-provided URL instead of the default Wikipedia API URL.

Rather than implementing some complicated function to detect whether an input is a valid wiki URL, add a parameter or setting to change the wiki fetch URL.

Example:

wtf_fetch(title, { settings: { wiki_url: "here" } })

I suggest adjusting the current settings object for this.
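Until a setting like this exists, a batch job can at least stop one bad input from aborting the whole run. A minimal sketch, assuming an injected `fetchFn` with the same call shape as `wtf.fetch(title, options)`; `fetchAllSafely` is a hypothetical helper, not part of wtf_wikipedia.

```javascript
// Hypothetical helper (not part of wtf_wikipedia): fetch many inputs one at a
// time, recording failures instead of letting a single bad input stop the batch.
// `fetchFn` is assumed to behave like wtf.fetch(title, options).
async function fetchAllSafely(inputs, fetchFn, options = {}) {
  const results = [];
  const failures = [];
  for (const input of inputs) {
    try {
      const doc = await fetchFn(input, options);
      results.push({ input, doc });
    } catch (err) {
      // Keep going; remember which input failed and why.
      failures.push({ input, error: err });
    }
  }
  return { results, failures };
}
```

Fetching sequentially keeps the load on the wiki's API low; the failures list can be logged or retried separately afterwards.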

spencermountain commented 1 year ago

hey, can you give an example? It sounds like you're dynamically fetching from a data source, but some lookups are urls, and some are not?

MarketingPip commented 1 year ago

@spencermountain - no! I mean using a data source other than the default (such as another wiki), like the example below.

// 3rd-party wiki
let doc = await wtf.fetch('https://muppet.fandom.com/wiki/Miss_Piggy')

Let's change the handling to:

// 3rd-party wiki
let doc = await wtf.fetch('Miss_Piggy', {wiki_url:"https://muppet.fandom.com/wiki/"})

That way, when handling LARGE amounts of data (hint hint: the project I've mentioned I'll be adding you to shortly), we don't run into errors when passed a URL that isn't a 3rd-party wiki.
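A rough sketch of what the proposed `wiki_url` option amounts to: joining a base wiki URL and a page title. Since wtf.fetch already accepts a full page URL (as in the Miss Piggy example above), the result could be passed straight to it. `buildPageUrl` and its default base URL are hypothetical, not part of wtf_wikipedia.

```javascript
// Hypothetical helper illustrating the proposed `wiki_url` option:
// join a base wiki URL and a page title into a full page URL.
function buildPageUrl(title, wikiUrl = 'https://en.wikipedia.org/wiki/') {
  // Tolerate a base URL with or without a trailing slash.
  const base = wikiUrl.endsWith('/') ? wikiUrl : wikiUrl + '/';
  // MediaWiki page URLs use underscores in place of spaces;
  // encodeURIComponent escapes any other unsafe characters in the title.
  const slug = encodeURIComponent(title.trim().replace(/ /g, '_'));
  return base + slug;
}
```

For example, `buildPageUrl('Miss Piggy', 'https://muppet.fandom.com/wiki')` yields `https://muppet.fandom.com/wiki/Miss_Piggy`.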

spencermountain commented 1 year ago

is this what you're looking for?

wtf.fetch('Kermit', { domain: 'muppet.fandom.com' }).then((doc) => {
  console.log(doc.text())
})

You can also set it permanently with .domain(), I believe. Cheers!
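When a batch mixes plain titles with full wiki page URLs, one option is to split each URL into the title and domain that the `domain` option above expects. A minimal sketch using the standard `URL` constructor; `splitWikiUrl` is a hypothetical helper, not part of wtf_wikipedia, and it assumes page URLs follow the `/wiki/<title>` path pattern.

```javascript
// Hypothetical helper (not part of wtf_wikipedia): split a full wiki page URL
// such as https://muppet.fandom.com/wiki/Miss_Piggy into the pieces that
// wtf.fetch(title, { domain }) expects. Returns null for anything that is
// not an http(s) URL with a /wiki/<title> path.
function splitWikiUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a URL at all, e.g. a plain title like 'Kermit'
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return null;
  const match = url.pathname.match(/^\/wiki\/(.+)$/);
  if (!match) return null;
  return { domain: url.hostname, title: decodeURIComponent(match[1]) };
}
```

Usage: when `splitWikiUrl(input)` returns an object, call `wtf.fetch(parts.title, { domain: parts.domain })`; otherwise treat the input as a plain title on the default wiki.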

MarketingPip commented 1 year ago

@spencermountain - exactly what I was looking for! My apologies! I must have skipped this in the documentation! 🤦