mozilla / spade

Automated scraping of markup+CSS from a list of relevant URLs, using a variety of user-agent strings. Provides reporting on usage of CSS properties and apparent user-agent sniffing.
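A minimal sketch of the detection idea the description implies, assuming nothing about spade's actual code: fetch the same URL under a few different user-agent strings and flag pages whose markup differs substantially, since that suggests user-agent sniffing. The UA strings, the similarity threshold, and the function names here are all illustrative.

```python
import difflib
import urllib.request

# Illustrative UA strings; spade's real list may differ.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 (KHTML, like Gecko) "
    "Chrome/16.0.912.75 Safari/535.7",
]

def fetch(url, ua):
    """Fetch a URL while presenting the given user-agent string."""
    req = urllib.request.Request(url, headers={"User-Agent": ua})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def appears_to_sniff(url, threshold=0.9):
    """Heuristic: markup that varies a lot across UAs suggests sniffing."""
    baseline = fetch(url, USER_AGENTS[0])
    for ua in USER_AGENTS[1:]:
        ratio = difflib.SequenceMatcher(None, baseline, fetch(url, ua)).ratio()
        if ratio < threshold:
            return True
    return False
```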

Fix all the errors caused by shady redirects; mark sites with no URLScans (mail.ru-like) as 'bad'. #28

Closed: mihneadb closed this issue 12 years ago

mihneadb commented 12 years ago

The UA part seems fine now.

I think we need to go back to allowing off-site CSS, because now that we have disabled that functionality it seems we aren't really downloading much CSS at all. I guess most sites keep their CSS off the main domain?
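For context, a minimal sketch of the on-domain restriction being discussed, under the assumption that "off-site" means the stylesheet's host differs from the page's host; the function name is illustrative, not spade's API:

```python
from urllib.parse import urljoin, urlparse

def is_same_domain(page_url, css_href):
    """True if a stylesheet href resolves to the page's own host."""
    css_url = urljoin(page_url, css_href)
    return urlparse(css_url).netloc == urlparse(page_url).netloc

# Under this rule, a site serving its CSS from a separate static domain
# contributes no stylesheets at all:
is_same_domain("http://example.com/", "/css/site.css")                 # True
is_same_domain("http://example.com/", "http://cdn.example.net/a.css")  # False
```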

mihneadb commented 12 years ago

It might be worth mentioning that mozilla.org, for example, keeps its CSS on-site, and it gets scanned and processed nicely.

k0s commented 12 years ago

> I guess most sites keep their CSS off the main domain?

It's certainly not ubiquitous, but it is a common tactic for various reasons (e.g. load balancing).

mihneadb commented 12 years ago

I have no idea how you quoted part of my reply!

The problem with this approach is that we don't know whether the CSS is "theirs" or pulled from some public source. The whole idea of the project is to make people aware of the unprefixed and moz-prefixed versions of the properties, but if they just use someone else's CSS, then they have no power to change it and we are basically processing it "for nothing".

carljm commented 12 years ago

> I have no idea how you quoted part of my reply!

If you have GitHub email notifications turned on, you can just reply to the notification email and your reply appears as a comment on the issue. So the quoting is done by the email client.

> The problem with this approach is that we don't know whether the CSS is "theirs" or pulled from some public source. The whole idea of the project is to make people aware of the unprefixed and moz-prefixed versions of the properties, but if they just use someone else's CSS, then they have no power to change it and we are basically processing it "for nothing".

With JS I'd think that's more common (jQuery etc.); I don't think it's nearly as common to pull in truly third-party CSS (though there are some cases, of course, like Bootstrap). And it is quite common to host your own custom CSS on a different domain; I'd say at least half the sites I've ever launched do that (the current site I'm working on hosts all CSS and JS on Amazon S3). I'm not sure if this is still true, but at one point it was considered a page-load-speed benefit, since browsers would only open a limited number of simultaneous connections to a given domain. All in all, I think you'll miss quite a bit of "real" CSS if you eliminate off-domain CSS.
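One rough way to approximate this "theirs vs. third-party" distinction, sketched here with an invented blocklist rather than anything spade actually ships: accept off-domain CSS by default, but skip URLs that look like shared libraries or well-known public CDNs.

```python
# Invented example patterns; a real list would need curation.
PUBLIC_CSS_HINTS = (
    "bootstrap",                  # shared library filenames
    "ajax.googleapis.com",        # well-known public CDN hosts
    "netdna.bootstrapcdn.com",
)

def looks_third_party(css_url):
    """Heuristic: treat library/CDN-looking stylesheets as not 'theirs'."""
    url = css_url.lower()
    return any(hint in url for hint in PUBLIC_CSS_HINTS)
```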

mihneadb commented 12 years ago

Well then, maybe a fair compromise would be to accept off-domain CSS but keep the depth restriction? (See the sketch after this comment.)

Thanks for your input!
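A minimal sketch of that compromise, with fetching and HTML parsing stubbed out as injected callables (all names here are illustrative, not spade's API): page-to-page crawling stops at a fixed depth, but stylesheets are collected from every visited page regardless of their domain.

```python
from urllib.parse import urljoin

def crawl(start_url, max_depth, fetch_page, extract_links, extract_css_hrefs):
    """Depth-limited crawl that still accepts off-domain stylesheets."""
    seen, css_urls = set(), set()
    frontier = [(start_url, 0)]
    while frontier:
        url, depth = frontier.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        html = fetch_page(url)
        # Off-domain stylesheet URLs are kept...
        css_urls.update(urljoin(url, href) for href in extract_css_hrefs(html))
        # ...while link-following is still bounded by max_depth.
        frontier.extend((urljoin(url, link), depth + 1)
                        for link in extract_links(html))
    return css_urls
```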

mihneadb commented 12 years ago

Yep, that seems to work. Adding one more commit to the pull req.