zpeters closed this 4 years ago.
Hi! I think the web browser extracts what is in the <title> tag.
I think that would work perfectly.
Great! Can you assign me the issue?
Assigned, thank you!
@lucasturci you could use a regexp to find the title, or parse the page with the golang.org/x/net/html package.
So something like this could work:
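var title string
htmlBody, err := io.ReadAll(resp.Body) // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
if err != nil {
	return err // assumes the enclosing function returns an error
}
// (?is): match <title> case-insensitively and let . span newlines
re := regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)
matches := re.FindSubmatch(htmlBody)
if len(matches) > 1 { // matches[0] is the whole match, matches[1] the title text
	title = strings.TrimSpace(string(matches[1]))
}
site.title = title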
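For reference, here is a self-contained version of the regex approach above, wrapped in a function. It is a sketch, assuming the page body has already been read into a []byte; the name extractTitle is hypothetical. It also uses the standard library's html.UnescapeString so entities such as &amp; in the title come out as plain text.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// titleRe matches the first <title> element. (?i) makes it case-insensitive
// and (?s) lets . match newlines, since titles sometimes span several lines.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// extractTitle (hypothetical helper) returns the page title, or "" if none is found.
func extractTitle(body []byte) string {
	m := titleRe.FindSubmatch(body)
	if len(m) < 2 {
		return ""
	}
	// Decode entities such as &amp; and collapse runs of whitespace.
	t := html.UnescapeString(string(m[1]))
	return strings.Join(strings.Fields(t), " ")
}

func main() {
	page := []byte("<html><head>\n<TITLE>\n  Tom &amp; Jerry\n</TITLE></head></html>")
	fmt.Println(extractTitle(page)) // prints "Tom & Jerry"
}
```

A regex is fine for the common case; a real HTML parser such as golang.org/x/net/html is more robust against unusual markup.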
Hey, after implementing this I realized there may be two kinds of issues: encoding, and titles that contain '/'. I don't use Windows, but '\' may cause problems there too. How should I escape these characters? I guess there is no way to escape them in a file name, so we will have to choose a different character to replace them with. To see what I mean, clone my fork and run:
go run cmd/stashbox/main.go -url https://github.com/zpeters/stashbox
The encoding issue is that some sites are not UTF-8 encoded, so their titles may contain bytes the filesystem does not understand. I don't know whether this PR should tackle that problem or leave it for a separate one.
@lucasturci I guess we could do what most applications do: replace all invalid characters with underscores.
I think that is a good solution for now.
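A minimal sketch of the underscore approach, assuming the set of "invalid" characters is the one Windows reserves (/ \ : * ? " < > |) plus control characters; the name sanitizeFilename is hypothetical. It touches the encoding concern only in a limited way: strings.ToValidUTF8 replaces invalid UTF-8 byte sequences, so a title from a non-UTF-8 page becomes a valid (if lossy) name. Properly decoding such pages would need charset detection, which this does not attempt.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeFilename (hypothetical helper) replaces characters that are invalid
// in file names on common platforms with '_'. The reserved set below is the
// Windows one, which also covers '/' on Unix-like systems.
func sanitizeFilename(name string) string {
	// Replace each run of invalid UTF-8 bytes (e.g. from non-UTF-8 pages).
	name = strings.ToValidUTF8(name, "_")
	name = strings.Map(func(r rune) rune {
		if strings.ContainsRune(`/\:*?"<>|`, r) || unicode.IsControl(r) {
			return '_'
		}
		return r
	}, name)
	// Windows also rejects names ending in a space or a dot.
	name = strings.TrimRight(name, " .")
	if name == "" {
		return "_"
	}
	return name
}

func main() {
	fmt.Println(sanitizeFilename(`AC/DC: Back in Black?`)) // prints "AC_DC_ Back in Black_"
}
```

Reserved Windows device names such as CON or NUL are not handled here.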
Currently the file name is a hash of the path. It would be nice to pull the webpage title instead, similar to what a web browser does when you save a page.
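One way to combine the current behavior with this proposal is to use the page title when one was found and fall back to a hash of the path otherwise. This sketch assumes SHA-256 with a hex-encoded digest; the issue does not say which hash stashbox actually uses, and the name fileNameFor is hypothetical. The title is assumed to have been made filesystem-safe already.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fileNameFor (hypothetical helper) prefers the page title and falls back to
// a hex-encoded SHA-256 hash of the path when the title is empty or blank.
// The title is assumed to be already safe to use as a file name.
func fileNameFor(title, path string) string {
	if t := strings.TrimSpace(title); t != "" {
		return t
	}
	sum := sha256.Sum256([]byte(path))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(fileNameFor("Example Domain", "https://example.com/"))
	fmt.Println(fileNameFor("", "https://example.com/")) // 64 hex characters
}
```

Note that two pages with the same title would map to the same name, so a caller may still want to add a suffix on collision.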