Closed cristoper closed 4 months ago
This fix is a little awkward with all of its saving and restoring of BaseStacks. A simpler solution would be to simply get the xml:base attribute from the decoded element and resolve against that. However, that would require changes to goxpp: either change XmlBaseResolveUrl() to accept a string rather than acting as a method, or maybe adding a separate ResolveUrl() function that takes a base and a relative URL that gofeed could use.
Maybe this makes sense @cristoper ?
You are suggesting that in goxpp
we have:
func (p *XMLPullParser) ResolveUrl(baseURL, relativeUrl string) (string, error) {
// Logic to resolve relativeUrl against baseURL + the current BaseStack URLs?
}
Then in Atom Parser, something like this?:
func (ap *Parser) parseAtomText(p *xpp.XMLPullParser) (string, error) {
var text struct {
Type string `xml:"type,attr"`
Mode string `xml:"mode,attr"`
Base string `xml:"base,attr"` // Added to capture xml:base attribute
InnerXML string `xml:",innerxml"`
}
// DecodeElement is used to unmarshal the element
err := p.DecodeElement(&text)
if err != nil {
return "", err
}
// Use the captured xml:base if present; otherwise, use the current base URL
baseURL := text.Base
if baseURL == "" {
baseURL = p.GetCurrentBaseURL() // Fallback to current base URL from the parser
}
// Resolve URLs in InnerXML using the determined base URL
resolvedInnerXML, err := p.ResolveURL(baseURL, text.InnerXML)
if err != nil {
return "", err
}
return strings.TrimSpace(resolvedInnerXML), nil
}
Yes, that's approximately what I had in mind. I just pushed an update to the PR which does similar but copies XmlBaseResolveUrl
to gofeed
so that we don't have to change goxpp
.
However, if you'd prefer not duplicating code from goxpp
we could instead make the change there (even by introducing a new function so we don't make any backward-incompatible change to the goxpp
API).
The issue is that goxpp's DecodeElement() pop's the BaseStack after unmarshaling an element -- as it should to keep the BaseStack in sync with the current document scope. But that means the atom parser needs to keep a reference to the current base before calling DecodeElement() so that it can correctly resolve URLs.
Without this fix, elements with xml:base attributes will be erroneously resolved with the parent xml:base.
This fix is a little awkward with all of its saving and restoring of BaseStacks. A simpler solution would be to simply get the
xml:base
attribute from the decoded element and resolve against that. However, that would require changes togoxpp
: either changeXmlBaseResolveUrl()
to accept a string rather than acting as a method, or maybe adding a separateResolveUrl()
function that takes a base and a relative URL thatgofeed
could use.