Closed MichalW closed 7 years ago
@MichalW right, all script
tags is removed.
I think is this line: https://github.com/jazeee/jazeee-meteor-spiderable/blob/master/lib/phantom_script.js#L49
Could you change:
output = output.replace(/<script[^>]+>(.|\n|\r)*?<\/script\s*>/ig, '');
to eg.
output = output.replace(/<script(?![^>]+ld\+json)[^>]+>(.|\n|\r)*?<\/script\s*>/ig , '');
Maybe it will be helpful: https://github.com/meteor/meteor/pull/5015/commits/2314e90c51403e46a8dbf1405414d1a6cf99e5cf
My thoughts - regex should match any of those examples:
<script type="text/javascript">
function () {
return true;
}
</script>
<script type='text/javascript'>
function () {
return true;
}
</script>
<script type="application/javascript">
function () {
return true;
}
</script>
<script type="application/javascript" async defer>
function () {
return true;
}
</script>
<script async defer charset="utf8">
function () {
return true;
}
</script>
<script async src="http://example.com/main.js" charset="UTF-8"></script>
<script async>
function () {
return true;
}
</script>
<script>
function () {
return true;
}
</script>
And do not match next:
<script type="application/ld+json">
{bar: "foo"}
</script>
<script type="application/ld+json" async defer>
{bar: "foo"}
</script>
<script async type="application/ld+json" defer>
{bar: "foo"}
</script>
<script type="any-other/type">
{bar: "foo"}
</script>
I'm stuck with next regex:
/<script[^>]*(?=type\=("|')(application|text)\/javascript("|'))[^>]*>(.|\n|\r)*?<\/script\s*>/gi
Works correctly unless type
attribute exists. Hope someone can finish it.
Playground: https://regex101.com/r/PtlDsd/1
Please try this:
/<script([^>](?!type))*[^\s>]?(\stype\=("|')(application|text)\/javascript("|'))?([^>](?!type))*>(.|\n|\r)*?<\/script\s*>/gi
@MichalW great solution. @jazeee will you change it or would you have a PR?
Hi @MichalW ,
Thank you for contribution, merged and published as v1.3.0
Feel free to reopen it in case if issue still persists on your end.
My script ld+json does not appear in ?_escapedfragment\= mode
Is spiderable removed all script tags?