Open soundasleep opened 2 years ago
Update: If you're trying to find IDs for elements that are naturally empty (such as <input>), it turns out the filter gets a separate callback for empty elements (emptyElement) than for normal elements (startElement), so both need to be overridden. The XMLDocumentFilter should instead be:
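import org.apache.xerces.impl.Constants;
import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLDocumentFilter;
import org.cyberneko.html.filters.DefaultFilter;

XMLDocumentFilter idEnhancer = new DefaultFilter() {
    /**
     * Marks the "id" attribute, if present, as a declared attribute of
     * type ID so that #getElementById() can find the element.
     */
    private void possiblyAddIdAttribute(XMLAttributes attributes) {
        int idx = attributes.getIndex("id");
        if (idx > -1) {
            attributes.setType(idx, "ID");
            Augmentations attrsAugs = attributes.getAugmentations(idx);
            attrsAugs.putItem(Constants.ATTRIBUTE_DECLARED, Boolean.TRUE);
        }
    }

    // Called for elements with separate start and end tags, e.g. <p>...</p>
    @Override
    public void startElement(QName element, XMLAttributes attributes, Augmentations augs) throws XNIException {
        possiblyAddIdAttribute(attributes);
        super.startElement(element, attributes, augs);
    }

    // Called for naturally empty elements, e.g. <input>
    @Override
    public void emptyElement(QName element, XMLAttributes attributes, Augmentations augs) throws XNIException {
        possiblyAddIdAttribute(attributes);
        super.emptyElement(element, attributes, augs);
    }
};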
I found that if you try to load a Document via DefaultDOMSource, #getElementById() always returns null.
As far as I can tell, this is because cssbox uses NekoHTML as its parser, with Xerces underneath, and it is not set up as a validating parser. Xerces only treats id="..." as an ID when the attribute has been declared as one, which normally requires validation. I think?
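To illustrate the behaviour I mean outside of cssbox, here is a minimal sketch using only the JDK's built-in DOM parser (so not NekoHTML or DefaultDOMSource). It assumes a document with no DTD, so nothing declares id as an ID-typed attribute. The class name IdLookupDemo and its two helper methods are made up for this example. It shows the lookup failing, and then working once each id attribute is flagged via the standard DOM Level 3 call Element#setIdAttribute.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not part of cssbox or NekoHTML.
public class IdLookupDemo {

    /** Parses an XML string with the JDK's default (non-validating) parser. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Flags every "id" attribute in the document as being of type ID. */
    static void markIdAttributes(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (el.hasAttribute("id")) {
                el.setIdAttribute("id", true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<html><body><input id=\"name\"/><p id=\"intro\">Hi</p></body></html>");

        // No DTD declares "id" as an ID, so the lookup is expected to fail here.
        System.out.println("before: " + doc.getElementById("intro"));

        markIdAttributes(doc);

        // After flagging the attributes, the same lookup should find the element.
        Element found = doc.getElementById("intro");
        System.out.println("after: " + (found == null ? null : found.getTagName()));
    }
}
```

This walks the tree after parsing, whereas a parser-level filter does the equivalent work while the document is being built, which is why the filter approach seems a better fit for cssbox.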
However, I did find a fix on SourceForge: adding a custom filter to NekoHTML, along the lines of the one in the update above.
I think this could be added to DefaultDOMSource, or HTMLConfiguration, but I'd imagine you'd want to add test cases as well, and I'm not sure what the implications of this might be.