https://spec.matrix.org/v1.12/client-server-api/#security-considerations-5 says:

> As such, homeservers MUST sanitise mxc:// URIs by allowing only alphanumeric (A-Za-z0-9), _ and - characters in the server-name and media-id values.

... but it's unclear about where this sanitisation should happen. Should it apply to event bodies? If so, which fields in event bodies? Does it matter what the event type is? What about event types we haven't invented yet? What should happen if we see an event that doesn't match?
In practice, it's pretty much impossible to apply such rules to event bodies (particularly for encrypted events), so I don't think that's what it means. But then, what does it mean?
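
To make the question concrete, here is a rough Python sketch of one possible reading: the check runs at the point where a server parses an mxc:// URI in order to serve or fetch media, and event bodies are left alone. This is only my guess at the intent, not something the spec states. The helper names (`is_allowed`, `split_mxc`, `media_id_is_acceptable`) are hypothetical and not taken from any homeserver implementation, and checking only the media ID is an assumption of the sketch.

```python
import re
from typing import Tuple

# Character set quoted from the spec text: A-Za-z0-9, "_" and "-".
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def is_allowed(value: str) -> bool:
    """Return True if value is non-empty and uses only the quoted characters."""
    return _ALLOWED.fullmatch(value) is not None


def split_mxc(uri: str) -> Tuple[str, str]:
    """Split mxc://<server-name>/<media-id> into (server_name, media_id).

    Hypothetical helper: splits on the first "/" after the scheme, so any
    further slashes end up in the media ID and fail the character check.
    """
    prefix = "mxc://"
    if not uri.startswith(prefix):
        raise ValueError("not an mxc:// URI")
    server_name, sep, media_id = uri[len(prefix):].partition("/")
    if not sep or not server_name or not media_id:
        raise ValueError("expected mxc://<server-name>/<media-id>")
    return server_name, media_id


def media_id_is_acceptable(uri: str) -> bool:
    """Assumed interpretation: validate when handling a media request.

    Only the media ID is checked here. Taken literally, the quoted rule
    would also reject "." and ":", which appear in ordinary server names.
    """
    try:
        _, media_id = split_mxc(uri)
    except ValueError:
        return False
    return is_allowed(media_id)
```

Under this reading, `media_id_is_acceptable("mxc://example.org/abc_DEF-123")` passes while `media_id_is_acceptable("mxc://example.org/../etc/passwd")` does not, and nothing ever needs to inspect an event body, encrypted or otherwise.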