gruns / furl

🌐 URL parsing and manipulation made easy.

Feature request: Convert to unescaped string #157

Open Aran-Fey opened 1 year ago

Aran-Fey commented 1 year ago

When URLs (or parts thereof) are converted to a string, they're always escaped:

>>> url = furl('foo.bar/fire truck?hello world=#hi there')
>>> str(url)
'foo.bar/fire%20truck?hello+world=#hi%20there'
>>> str(url.path)
'foo.bar/fire%20truck'
>>> str(url.query)
'hello+world='
>>> str(url.fragment)
'hi%20there'

It would be useful if there was a way to obtain unescaped strings:

>>> url.unescaped_str()
'foo.bar/fire truck?hello world=#hi there'
>>> url.path.unescaped_str()
'foo.bar/fire truck'
>>> url.query.unescaped_str()
'hello world='
>>> url.fragment.unescaped_str()
'hi there'

gruns commented 1 year ago

let's zoom out a bit so i understand the exact problem you're trying to solve! that way we can best solve it with furl :)

to start, what are you using these unescaped strings for?

Aran-Fey commented 1 year ago

Hmm, that's a bit tough to explain. Essentially, my program is a web scraper. You give it a URL as input, and it scrapes that website. You can use the #fragment to narrow down what you want it to scrape. For example, if the URL is example.com#Hello World, it looks for a <h1>Hello World</h1> and scrapes only that section. So I need the text "Hello World", not "Hello%20World".

To put it more generally: furl is designed to output URLs. You put (unescaped) text in, and you get a valid (escaped) URL as output. But you can't do the opposite, i.e. take a URL as input and parse/destructure it into (unescaped) information.
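
For illustration, here is a rough sketch of the kind of destructuring meant above. It uses only the standard library's urllib.parse on an already-escaped URL string (such as the output of str(url)), not any furl API. The helper name unescaped_parts is hypothetical and exists only for this example.

```python
from urllib.parse import urlsplit, unquote, unquote_plus


def unescaped_parts(escaped_url):
    """Split an escaped URL string and decode each component.

    Hypothetical helper for illustration only; not part of furl.
    """
    parts = urlsplit(escaped_url)
    return {
        # Path and fragment only use percent-encoding (%20 for a space).
        "path": unquote(parts.path),
        # Query strings may additionally encode a space as '+'.
        "query": unquote_plus(parts.query),
        "fragment": unquote(parts.fragment),
    }


parts = unescaped_parts("foo.bar/fire%20truck?hello+world=#hi%20there")
print(parts["path"])
print(parts["query"])
print(parts["fragment"])
```

Decoding a whole query string in one go is lossy if a key or value contains an escaped & or =, so this is only a stopgap and not a substitute for proper per-component accessors. Note also that the input here has no scheme, so urlsplit treats foo.bar as part of the path; with a scheme such as https:// the host would land in netloc instead.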