tobie / specref

An open-source, community-maintained database of Web standards & related references.
http://www.specref.org/
Apache License 2.0

The spec short names should be case-insensitive #22

Closed lanthaler closed 10 years ago

lanthaler commented 11 years ago

A reference like RDF11-MT in ReSpec currently doesn't work as the biblio DB uses rdf11-mt as index and the service at jitsu is case-sensitive. Consequently http://specref.jit.su/bibrefs?refs=RDF11-MT returns no data.
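
A minimal sketch of why the lookup comes back empty, assuming the database is a plain object keyed by id. The one-entry `biblio` object and the `lookup` helper are illustrative only, not the actual service code:

```javascript
// Hypothetical one-entry excerpt of the bibliography database, keyed by id.
const biblio = {
  "rdf11-mt": { title: "RDF 1.1 Semantics" }
};

// Case-sensitive lookup: the id must match a key character for character.
function lookup(db, id) {
  return Object.prototype.hasOwnProperty.call(db, id) ? db[id] : null;
}

console.log(lookup(biblio, "rdf11-mt")); // finds the entry
console.log(lookup(biblio, "RDF11-MT")); // null, no key with this exact casing
```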

tobie commented 11 years ago

@darobin, @marcoscaceres, thoughts?

marcoscaceres commented 11 years ago

Traditionally, we have made it so all references must match exactly. I think we should continue to do that (i.e., we should just fix the biblio for this one entry, or the spec that incorrectly references rdf11-mt should just be fixed).

My conclusion is that this bug is invalid.

marcoscaceres commented 11 years ago

s/conclusion/opinion.

lanthaler commented 11 years ago

So you anticipate a case where there are two different specs using the same short name but only differ in capitalization? I would consider that as a serious problem.

marcoscaceres commented 11 years ago

No, that would never happen. I'm a bit confused about the problem here, maybe? I see the identifier we use in biblio.js as only coincidentally related to the TR (or the URL conventions used by any standards organization).

marcoscaceres commented 11 years ago

Argh... let's be clear on terminology - the "short name" of a spec is something we talk about at the W3C... but I think you mean just the ids, which is what was confusing me.

So yes, it would be a (social) problem if we had "DOM" and "dom". But this has not been a problem in practice.

halindrome commented 11 years ago

I think that all of the IDs that are used in embedded spec references (e.g., [[DOM]]) should be in upper case. Are there mixed case ones now? Historically I feel like they have always been all upper case.

marcoscaceres commented 11 years ago

Yeah, I thought the same, but checking there are quite a few mixed case ones.

tobie commented 11 years ago

I introduced mixed-case references after adding support for automatically pulling in /tr references. We discussed this with @darobin back then and agreed this was the way to go, notably to be able to use these references outside of ReSpec (e.g. for the testing effort).

We unfortunately now have cases where the uppercase and lowercase versions differ, e.g.: DOM vs dom.

darobin commented 11 years ago

I can't say I have a strong opinion so long as existing content keeps working.

Can we sort out the few conflicting cases and make it case-insensitive?

tobie commented 11 years ago

> Can we sort out the few conflicting cases and make it case-insensitive?

WFM. Any volunteers?

marcoscaceres commented 11 years ago

Ah, sh*t - @lanthaler is right (:bow:). There are 560 duplicates.

I think most are aliases - with case insensitive matching, all of the following go away:

ATAG-wombat, C14N-issues, CCPP-ra, CCPP-struct-vocab2, CCPP-ta, CCPP-trust, CDFReqs, CDRReqs, CSS-potential, Content-in-RDF, DDR-Simple-API, DDR-requirements, DOM-Level-1, DOM-Level-2-Core, DOM-Level-2-Events, DOM-Level-2-HTML, DOM-Level-2-Style, DOM-Level-2-Traversal-Range, DOM-Level-2-Views, DOM-Level-3-AS, DOM-Level-3-Core, DOM-Level-3-Events, DOM-Level-3-LS, DOM-Level-3-Val, DOM-Level-3-Views, DOM-Level-3-XPath, DOM-Requirements, DSig-label, DSig-usage, EARL10-Requirements, EARL10-Schema, EC-related-activities, EMMAreqs, ElementTraversal, HTTP-NG-testbed, HTTP-in-RDF, IndexedDB, InkML, MathML, MathML2, MathML3, MediaAccessEvents, P3P-preferences, P3P10-Protocols, P3P10-principles, PICS-labels, PICS-services, PICSRules, Pointers-in-RDF, S6Group2, SVG2Reqs, SVGFilter12, SVGFilterPrimer12, SVGFilterReqs12, SVGMobile, SVGMobile12, SVGMobileReqs, SVGPrint12, SVGPrintPrimer12, SVGPrintReqs, SVGReq, SVGTiny12, SVGTiny12Reqs, SYMM-modules, TVWeb-URI-Requirements, UAAG20-requirements, WICDFull, WebCGM, WebGL, WebIDL, Window, XForms-for-HTML, XHTMLplusMathMLplusSVG, XHTMLplusSMIL, XMLHttpRequest, XMLHttpRequest2, XSLReq, access-control, acdi, acss, animation-timing, app-uri, arabic-math, backplane, becss, call-control-reqs, ccxml, charmod, charmod-norm, charmod-resid, charreq, clipboard-apis, compositing, contacts-api, cooluris, cors, cpc-req, cselection, cselection-primer, cselection-xaf, css-masking, css-mobile, css-print, css-tv, css3-2d-transforms, css3-3d-transforms, css3-animations, css3-fonts, css3-hyperlinks, css3-images, css3-marquee, css3-mediaqueries, css3-preslev, css3-reader, css3-transforms, css3-transitions, css3-webfonts, css4-images, cssom, cssom-view, ct-guidelines, ct-landscape, curie, dap-privacy-reqs, dcontology, dd-ecosystem, dd-landscape, dd-structures, ddr-core-vocabulary, dfaui, di-atdi, di-dco, di-gloss, di-princ, dial, dial-primer, dom, egov-improving, emma, exi, exi-best-practices, exi-evaluation, exi-impacts, exi-measurements, exi-primer, 
file-upload, filter-effects, fullscreen, geolocation-API, grddl, grddl-primer, grddl-scenarios, grddl-tests, hash-in-uri, hcls-kb, hcls-senselab, hlink, html, html-design-principles, html-lan, html-rdfa, html40, html40-mobile, html401, html5, html5-diff, html5-pubnotes, i18n-guide-framework, i18n-html-tech-bidi, i18n-html-tech-char, i18n-html-tech-lang, ilu-requestor, inkreqs, its, itsreq, jlreq, json-ld, json-ld-api, lbase, leiri, lexicon-reqs, ltli, mathml-bvar, mathml-for-css, mathml-types, mathml-units, media-annot-reqs, media-frags, messaging-api, microdata, microdata-rdf, mmi-arch, mmi-auth, mmi-dev-feedback, mmi-framework, mmi-reqs, mmi-suggestions, mmi-use-cases, mobile-bp, mobile-bp-scope, mobileOK, mobileOK-basic10-tests, modality-interface, multimodal-reqs, mwabp, mwbp-wcag, namespaceState, navigation-timing, ngram-spec, nl-spec, offline-webapps, owl-features, owl-guide, owl-parsing, owl-ref, owl-semantics, owl-test, owl-time, owl-xmlsyntax, owl2-conformance, owl2-direct-semantics, owl2-manchester-syntax, owl2-mapping-to-rdf, owl2-new-features, owl2-overview, owl2-primer, owl2-profiles, owl2-quick-reference, owl2-rdf-based-semantics, owl2-syntax, owl2-xml-serialization, p3p-rdfschema, p3pdeployment, page-visibility, positioning, powder-dr, powder-formal, powder-grouping, powder-primer, powder-test, powder-use-cases, powder-voc, powder-xsd, print, proc-model-req, progress-events, pronunciation-lexicon, qa-handbook, qaframe-ops-extech, qaframe-spec, qaframe-test, quota-api, rdf-concepts, rdf-dawg-uc, rdf-mt, rdf-primer, rdf-schema, rdf-sparql-XMLres, rdf-sparql-json-res, rdf-sparql-protocol, rdf-sparql-query, rdf-syntax-grammar, rdf-testcases, rdf-uml, rdfa-core, rdfa-lite, rdfa-primer, rdfa-syntax, rdfcal, rdftm-survey, reusable-dialog-reqs, rex, rex-reqs, rif-bld, rif-core, rif-dtb, rif-fld, rif-overview, rif-prd, rif-rdf-owl, rif-test, rif-ucr, role-attribute, ruby, sXBL, sawsdl, sawsdl-guide, schema-arch, screen-orientation, scxml, selectors-api, 
selectors-api2, semantic-interpretation, shadow-dom, skos-primer, skos-reference, skos-ucr, smil, smil-animation, smil20, sml, sml-if, soap11-ror-httpbinding, soap12-af, soap12-email, soap12-mtom, soap12-mtom-policy, soap12-n11n, soap12-os-ucr, soap12-part0, soap12-part1, soap12-part2, soap12-part3, soap12-rep, soap12-testcollection, soapjms, spec-variability, speech-grammar, speech-synthesis, speech-synthesis11, sprot11, ssml-sayas, ssml11reqs, streams-api, sw-oosd-primer, swbp-classes-as-values, swbp-n-aryRelations, swbp-skos-core-guide, swbp-skos-core-spec, swbp-specified-values, swbp-thesaurus-pubguide, swbp-vocab-pub, swbp-xsch-datatypes, test-metadata, timesheets, timezone, touch-events, tracking-compliance, tracking-dnt, ttaf1-dfxp, ttaf1-req, turingtest, turtle, unicode-xml, uri-clarification, url, vbi-reqs, voice, voice-architecture, voice-dialog-reqs, voice-grammar-reqs, voice-intro, voice-nlu-reqs, voice-tts-reqs, voicexml20, voicexml21, voicexml30, vxml30reqs, wai-age-literature, wai-aria, wai-aria-implementation, wai-aria-practices, wai-aria-primer, wai-aria-roadmap, wcag2-req, wcag2-tech-req, web-forms-2, webarch, webcgm20, webcgm21, webgl, webont-req, webstorage, widgets, widgets-apis, widgets-digsig, widgets-land, widgets-reqs, widgets-updates, widgets-uri, wordnet-rdf, ws-addr-core, ws-addr-metadata, ws-addr-soap, ws-arch, ws-arch-scenarios, ws-cdl-10, ws-cdl-10-primer, ws-chor-model, ws-chor-reqs, ws-desc-reqs, ws-desc-usecases, ws-enumeration, ws-eventing, ws-fragment, ws-gloss, ws-i18n, ws-i18n-req, ws-i18n-scenarios, ws-metadata-exchange, ws-policy, ws-policy-attach, ws-policy-guidelines, ws-policy-primer, ws-resource-transfer, ws-transfer, wsa-reqs, wsc-threats, wsc-ui, wsc-usecases, wsc-xit, wsdl11elementidentifiers, wsdl20, wsdl20-additional-meps, wsdl20-adjuncts, wsdl20-altschemalangs, wsdl20-primer, wsdl20-rdf, wsdl20-soap11-binding, wslc, xag, xbc-characterization, xbc-measurement, xbc-properties, xbc-use-cases, xbl, xbl-primer, 
xforms-11-req, xforms-basic, xforms11, xframes, xh, xhtml-access, xhtml-basic, xhtml-forms-req, xhtml-media-types, xhtml-modularization, xhtml-print, xhtml-prof-req, xhtml-rdfa, xhtml-rdfa-primer, xhtml-rdfa-scenarios, xhtml-roadmap, xhtml-role, xhtml1-schema, xhtml11, xhtml2, xinclude, xkms-pgp, xkms-wsdl, xkms2, xkms2-bindings, xkms2-req, xlink-principles, xlink-req, xlink10-ext, xlink11, xlink2rdf, xml-blueberry-req, xml-c14n, xml-c14n11, xml-canonical-req, xml-encryption-req, xml-entity-names, xml-events, xml-exc-c14n, xml-fragid, xml-fragment, xml-i18n-bp, xml-id, xml-id-req, xml-infoset, xml-infoset-rdfs, xml-infoset-req, xml-link-style, xml-media-types, xml-names, xml-names11, xml-names11-req, xml-schema-req, xml-stylesheet, xml11, xml11schema10, xmlbase, xmldsig-bestpractices, xmldsig-core, xmldsig-core1, xmldsig-core1-interop, xmldsig-core2, xmldsig-properties, xmldsig-requirements, xmldsig-simplify, xmldsig-xpath, xmldsig2ed-tests, xmlenc-core, xmlenc-core1, xmlenc-core1-interop, xmlenc-decrypt, xmlp-reqs, xmlp-scenarios, xmlschema-0, xmlschema-1, xmlschema-11-req, xmlschema-2, xmlschema-formal, xmlschema-guide2versioning, xmlschema-patterns, xmlschema-patterns-advanced, xmlschema-ref, xmlschema11-1, xmlschema11-2, xmlsec-algorithms, xmlsec-derivedkeys, xop10, xopinc-FAQ, xpath, xpath-datamodel, xpath-full-text-10, xpath-full-text-10-requirements, xpath-full-text-10-use-cases, xpath-functions, xpath20, xpath20req, xproc, xproc-requirements, xptr-element, xptr-framework, xptr-infoset-liaison, xptr-req, xptr-xmlns, xptr-xpointer, xquery, xquery-11, xquery-11-requirements, xquery-11-use-cases, xquery-30, xquery-30-requirements, xquery-30-use-cases, xquery-requirements, xquery-semantics, xquery-sx-10, xquery-sx-10-requirements, xquery-sx-10-use-cases, xquery-update-10, xquery-update-10-requirements, xquery-update-10-use-cases, xquery-use-cases, xquery-xpath-parsing, xqueryx, xqueryx-11, xqupdateusecases, xsl11, xsl11-req, xslfo20-req, xslt, 
xslt-xquery-serialization, xslt11, xslt11req, xslt20, xslt20req

lanthaler commented 11 years ago

Looks like all of those (or at least most) come from @tobie’s import. I’m pretty sure, e.g., that “json-ld” (lowercase) isn’t used anywhere, but as with most other specs, the /tr URL contains the lowercase “json-ld” short name. @marcoscaceres, can you confirm that by looking at the source property?

marcoscaceres commented 11 years ago

@lanthaler yes, all short names are in lower case. I've also updated the list above to show the actual duplicate ids as they appear in the biblio.js file.

This is the simple script I used:

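// Collect every id that collides with an earlier id once case is ignored.
var dups = [],
    set = new Set();
for(var i in biblio){
    var entry = i.toUpperCase();
    // First time this upper-cased id is seen: remember it and move on.
    if(!set.has(entry)){
        set.add(entry);
        continue;
    }
    // Seen before under a different casing: record the original id.
    dups.push(i);
}

marcoscaceres commented 11 years ago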

There are a number of ways of doing this... and I'm not sure what the best way is... one is to remove redundancy (most of the duplicate aliasOf entries), canonicalize all the keys to "UPPERCASE" (to allow fast lookup), and update the Web service code to convert query params to uppercase in the comparison. I don't know how the tests we currently have work, but I'm sure those will break too.
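
A rough sketch of that approach, under some assumptions: the two-entry `biblio` object, the `canonicalize` and `findRefs` helpers, and the comma-separated form of the `refs` parameter are all made up for illustration and are not taken from the actual service code.

```javascript
// Hypothetical database with mixed-case keys.
const biblio = {
  "rdf11-mt": { title: "RDF 1.1 Semantics" },
  "WebIDL": { title: "Web IDL" }
};

// Rebuild the database with every key converted to upper case.
// Throws if two keys collide once case is ignored, so that conflicts
// get resolved by hand instead of being silently overwritten.
function canonicalize(db) {
  const out = {};
  for (const key of Object.keys(db)) {
    const upper = key.toUpperCase();
    if (Object.prototype.hasOwnProperty.call(out, upper)) {
      throw new Error("Case-insensitive key collision: " + key);
    }
    out[upper] = db[key];
  }
  return out;
}

// Split the (assumed comma-separated) refs parameter and upper-case each
// id before looking it up in the canonicalized database.
function findRefs(canonicalDb, refsParam) {
  const result = {};
  for (const raw of refsParam.split(",")) {
    const id = raw.trim().toUpperCase();
    if (id && Object.prototype.hasOwnProperty.call(canonicalDb, id)) {
      result[id] = canonicalDb[id];
    }
  }
  return result;
}

const canonical = canonicalize(biblio);
console.log(Object.keys(findRefs(canonical, "RDF11-MT,webidl")));
```

Note that with this design the response is keyed by the upper-cased id rather than by the casing the client sent, which is one way existing tests or clients could break.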

lanthaler commented 11 years ago

> There are a number of ways of doing this... and I'm not sure what the best way is... one is to remove redundancy (most of the duplicate aliasOf entries), canonicalize all the keys to "UPPERCASE" (to allow fast lookup), and update the Web service code to convert query params to uppercase in the comparison.

Yeah, that's exactly what I initially suggested to Tobie on Twitter as well.

tobie commented 11 years ago

How the data is stored is orthogonal to whether queries are case sensitive or not. For various reasons, I'd rather keep W3C-related keys exactly as the short names.

In effect, we'd still need to remove duplicates and/or consolidate data where necessary (e.g. DOM and dom).

Happy to merge a patch that does this.
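
To illustrate the point about storage being orthogonal to query case sensitivity, here is a minimal sketch that leaves the stored keys untouched and builds a separate case-folded index on top of them. The `biblio` sample, `buildIndex`, and `lookup` are hypothetical names for illustration, not the patch itself:

```javascript
// Hypothetical database; keys keep whatever casing they were stored with.
const biblio = {
  "rdf11-mt": { title: "RDF 1.1 Semantics" },
  "WebIDL": { title: "Web IDL" }
};

// Map each lower-cased id to the key as actually stored.
// If two stored keys differ only by case, the later one wins here,
// which is why the duplicates still have to be cleaned up first.
function buildIndex(db) {
  const index = new Map();
  for (const key of Object.keys(db)) {
    index.set(key.toLowerCase(), key);
  }
  return index;
}

// Case-insensitive lookup that reports the stored key alongside the entry.
function lookup(db, index, id) {
  const key = index.get(id.toLowerCase());
  return key === undefined ? null : { id: key, entry: db[key] };
}

const index = buildIndex(biblio);
console.log(lookup(biblio, index, "RDF11-MT"));
console.log(lookup(biblio, index, "webidl"));
```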

marcoscaceres commented 11 years ago

@tobie ok, let's deal with the cleanup first (DOM and dom is probably my fault).
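
For the cleanup, one possible way to separate real conflicts from harmless aliases is sketched below. It assumes that `aliasOf` holds the id of the target entry; the sample data and the `resolve` and `findConflicts` helpers are invented for illustration:

```javascript
// Hypothetical data: "DOM" and "dom" are two independent entries (a real
// conflict), while "WEBIDL" is only an alias of "WebIDL".
const biblio = {
  "DOM": { title: "An entry" },
  "dom": { title: "A different entry" },
  "WebIDL": { title: "Web IDL" },
  "WEBIDL": { aliasOf: "WebIDL" }
};

// Follow aliasOf links until reaching a key that is not an alias.
// The `seen` set guards against alias cycles.
function resolve(db, key) {
  const seen = new Set();
  while (db[key] && db[key].aliasOf && !seen.has(key)) {
    seen.add(key);
    key = db[key].aliasOf;
  }
  return key;
}

// Group keys by their lower-cased form; a group is a conflict when its
// members resolve to more than one distinct target entry.
function findConflicts(db) {
  const groups = new Map();
  for (const key of Object.keys(db)) {
    const folded = key.toLowerCase();
    if (!groups.has(folded)) groups.set(folded, []);
    groups.get(folded).push(key);
  }
  const conflicts = [];
  for (const keys of groups.values()) {
    const targets = new Set(keys.map((k) => resolve(db, k)));
    if (targets.size > 1) conflicts.push(keys);
  }
  return conflicts;
}

console.log(findConflicts(biblio));
```

Groups that resolve to a single target can simply lose their redundant alias entries once lookups ignore case; only the remaining groups need manual consolidation.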

marcoscaceres commented 11 years ago

Ok, so there were only 3 (according to the code below):

HTML, DOM, WebGL

var dups = [],
    set = new Set();
for(var i in biblio){
    var entry = i.toUpperCase();
    if(!set.has(entry)){
        set.add(entry);
        continue;
    }else{
        // Case-insensitive duplicate: log it if the entry is an alias.
        if(!!biblio[i].aliasOf){console.log(entry)}
    }
    dups.push(i);
}

lanthaler commented 10 years ago

OK, took me a while but here's finally a PR which cleans up the db and makes the queries case-insensitive.

I have a couple of specs that suffer from this issue, so it would be great if you could merge this PR, or let me know if there's anything you'd like changed before merging it.