One thing we could do is look directly to the source to update our data. It would require a rewrite of how we update the auto-completions.
But focusing on CSS, we can see W3 does publish a JSON listing of every supported property.
This may be more difficult to build a super useful autocomplete out of, but it's likely our best bet, especially if we don't have much concern for snippet support.
Funnily enough, I've very recently built my own autocomplete package where I grabbed the JSON data directly from NodeJS to build the package. So if anyone takes a shot at rewriting the update script, that's available as a reference if needed. Otherwise, I'd be more than happy to give this a crack in the near future.
Hmm, not sure if this would be useful, but there are MDN content/data repos and such on GitHub. Not sure if it's reasonable/feasible/reliable to pull stuff from there?

* https://github.com/mdn/content
* https://github.com/mdn/data
* https://github.com/mdn
I would suggest going the MDN route. It's not quite the same thing, but when I tried to modernize TextMate's JS bundle, I wrote a Ruby script to screen-scrape MDN just to know what sorts of tokens to recognize as Web APIs. I haven't tried to run it since then, but I imagine that MDN data is structured enough that the same approach would work today.
* [All Properties JSON](https://www.w3.org/Style/CSS/all-properties.en.json)
* [All Descriptors JSON](https://www.w3.org/Style/CSS/all-descriptors.en.json)
W3C indicates the maturity of specifications by a status code. The CSS working group uses the following, from least to most stable:

| Abbreviation | Full name |
|---|---|
| FPWD | First Public Working Draft |
| WD | Working Draft |
| CR | Candidate Recommendation |
| CRD | Candidate Recommendation Draft |
| PR | Proposed Recommendation |
| REC | Recommendation |
| SPSD | Superseded Recommendation |

The names are defined in section 6 of the W3C process document. A REC is what is normally referred to as a 'standard.' W3C encourages everyday use starting from CR.

from https://www.w3.org/Style/CSS/current-work.en.tmpl
That should demystify the "status" property from those JSON endpoints. W3 also follows conventions to ensure automated processing is easier.
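As a rough sketch of how an updated script might consume that endpoint (the `property` and `status` field names here are assumptions about the payload's shape, and `fetch` assumes a recent Node.js):

```js
// Hedged sketch, not the package's actual update script: pull W3's
// all-properties JSON and keep only the properties W3C considers stable
// enough for everyday use (CR and later, per the quote above).
const STABLE = new Set(['CR', 'CRD', 'PR', 'REC']);

async function fetchStableProperties() {
  const res = await fetch('https://www.w3.org/Style/CSS/all-properties.en.json');
  const entries = await res.json(); // assumed shape: [{ property, url, status }, ...]
  const names = new Set(
    entries
      .filter((entry) => STABLE.has(entry.status))
      .map((entry) => entry.property)
  );
  return [...names].sort();
}

fetchStableProperties().then((props) => console.log(props.length, 'properties'));
```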
Oh! This may be interesting (MDN says they're deprecating mdn/data in favour of this): https://github.com/w3c/webref

> This repository contains machine-readable references of CSS properties, definitions, IDL, and other useful terms that can be automatically extracted from web browser specifications.

It also has a few packages (copy-pasted from the readme):
* [@webref/idl](https://www.npmjs.com/package/@webref/idl) contains a [curated](https://github.com/w3c/webref/blob/main/packages/idl#guarantees) version of the [ed/idl](https://github.com/w3c/webref/blob/main/ed/idl) folder.
* [@webref/css](https://www.npmjs.com/package/@webref/css) contains a [curated](https://github.com/w3c/webref/blob/main/packages/css#guarantees) version of the [ed/css](https://github.com/w3c/webref/blob/main/ed/css) folder.
* [@webref/elements](https://www.npmjs.com/package/@webref/elements) contains a [curated](https://github.com/w3c/webref/blob/main/packages/elements#guarantees) version of the [ed/elements](https://github.com/w3c/webref/blob/main/ed/elements) folder.
* [@webref/events](https://www.npmjs.com/package/@webref/events) contains a [curated](https://github.com/w3c/webref/blob/main/packages/events#guarantees) version of the [ed/events](https://github.com/w3c/webref/blob/main/ed/events) folder.
Ironically enough, all of this content seems to use W3 as the source, so why shouldn't we just use W3?
Edit: Oh lol, those are W3's repos
They did the manual crawling for us, and these are W3's repos, so we can trust them to the same degree.
edit: heh >~<
I'm reminded of what's hard here: ideally you want something that will maintain itself without your intervention. But at the end of it, you've got a flat list of possible completions, and you have no idea which ones are most important.

If I'm beginning to type a CSS property and I get as far as `font-`, some possible completions are much more helpful than others. I would certainly want `font-size` to be at the top of the list, and I would not want `font-variant-east-asian` to show up at all. But deciding which ones are used enough to warrant suggesting means exerting human curation on this data set.
This is less of a problem with CSS and HTML, but a huge problem when deciding which JavaScript tokens to autocomplete, because there are so many things available in the global namespace (in both browsers and Node) and no good automated way of deciding which ones are important.
Maybe there is usage data we can pull from to rank suggestions? Either way, I think combining the automated way to get a list of everything with some method of curation (automated or not) is best. That way, new stuff can still at least get pulled in and be on the list somewhere, and we aren't falling behind there.
So, to build off what @savetheclocktower is saying: I think the first goal should be just getting the list of completions, as the completions themselves don't dictate in what order the results will appear (while `autocomplete-plus` technically does support a `priority` tag, that isn't used here).

If memory serves, results are shown purely based on whether the first characters match. Meaning `font-size` and `fill` would potentially have the same priority when you type `f` through `font`. But this may be wrong; I'd have to do another check, and I may be thinking of the logic within `autocomplete-html`.
So I think it's important to keep a distinction between getting our autocomplete data and actually providing it, as those are distinct parts of the plugin.
But as for how we provide and rank the autocompletions, personally I'd be hugely in favour of utilizing something like Levenshtein Distance or (what I think would work even better for this purpose) Longest Common Subsequence.

Both of these algorithms are actually what provided the search capability on the `package-backend` when I first made it, but they had to be scrapped after switching to an SQL data store.

Either of these algorithms could handle providing and ranking the actual autocompletions for us based entirely on string similarity.
The reason I so highly recommend implementing LCS in our autocomplete packages is that it would likely be the simplest way to do exactly what @savetheclocktower mentioned: when you type `font-`, you probably want `font-size` and not `font-variant-east-asian` until proven otherwise. With LCS, `font-` scores higher against `font-size`, since the two strings are more similar overall than `font-` and `font-variant-east-asian`, even though both candidates contain the same number of matching characters.
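To make that concrete, here's a minimal sketch of what LCS-based ranking could look like if we normalize the raw LCS length by candidate length, so shorter, closer matches win. The `rankCompletions` helper is hypothetical, for illustration only; it isn't anything `autocomplete-plus` provides:

```js
// Longest Common Subsequence length via the classic dynamic-programming table.
function lcsLength(a, b) {
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// Hypothetical ranking: dividing by candidate length means a short, close
// match beats a long one even when the raw LCS count is identical.
function rankCompletions(prefix, candidates) {
  return candidates
    .map((name) => ({ name, score: lcsLength(prefix, name) / name.length }))
    .sort((x, y) => y.score - x.score);
}

// rankCompletions('font-', ['fill', 'font-size', 'font-variant-east-asian'])
// => font-size (5/9 ≈ 0.56), then fill (1/4 = 0.25),
//    then font-variant-east-asian (5/23 ≈ 0.22)
```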
To explain why I rambled so much about completion providing here: I really would vote against using 'curated' lists. Personally I feel that, especially if we are able to get a full data set, we should just use the proper full dataset, and that we shouldn't be the limiting factor that keeps someone who uses non-standard or less-popular aspects of a language from getting autocompletions.
Also, @Meadowsys, I wanted to say the link you provided to `w3c/webref` looks fantastic! It very well may be the best place for us to gather these autocompletions.
I'll happily take a shot at getting these updated this weekend.

It may be worth getting them updated first; then I'll play around with better autocompletions on my package and, if that works, apply it here. I want to make sure the performance is still good, if possible.
So I'm trying to see what we can do by utilizing `w3c/webref`, and while it does an amazing job of letting us generate and parse the data into a valid completions file, the one issue is that it does not provide any documentation text to go along with the completions.

Additionally, while they talk of aiming for the data to be easily machine-parsable, there are some very large variations in the format of the data between the `properties`, `atrules`, and `selectors`. So while I think this provides us the most complete set of data that is valid in CSS, it may not be perfect for our use case. But I do wonder if there is any way we could essentially use an API to MDN to grab the documentation side of things.
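For reference, here's roughly what pulling property names out of `@webref/css` looks like via its documented `listAll()` entry point. The defensive handling of each extract's shape is an assumption on my part, since the format varies between specs and versions:

```js
// Hedged sketch: collect CSS property names from @webref/css extracts.
const css = require('@webref/css');

async function collectPropertyNames() {
  const parsedFiles = await css.listAll(); // one extract per spec shortname
  const names = new Set();
  for (const data of Object.values(parsedFiles)) {
    const props = data.properties ?? [];
    // Assumption: `properties` may be an array of { name, ... } records
    // or an object keyed by property name, depending on the version.
    const entries = Array.isArray(props) ? props : Object.values(props);
    for (const prop of entries) {
      if (prop && prop.name) names.add(prop.name);
    }
  }
  return [...names].sort();
}

collectPropertyNames().then((names) => console.log(names.length, 'properties'));
```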
Alright, so now that I've completed the `autocomplete-css` package, it's time to tackle the `autocomplete-html` package.
And honestly, I've been looking around for some time but can't locate a good source of machine-readable specs for HTML; at least, nothing that provides elements and their attributes.
For example, the actual spec of HTML is located here, which is written in a proprietary source format that is parsed to produce the published spec on their website. So, short of parsing that, it's useless to us.
Then the `webref` resources posted above do have a list of all elements, but with no mention of their attributes. And the MDN resources posted above do have all of the above, but only as Markdown, so we would have to build a pretty complex parser to extract everything we need from there.
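To illustrate that gap, a quick sketch of what `@webref/elements` exposes through its `listAll()` entry point. The per-element fields here (a name plus an IDL interface, and notably no attribute data) are my reading of the extract format and worth double-checking:

```js
// Hedged sketch: list element names from @webref/elements, which gives us
// the elements themselves but (as noted above) nothing about attributes.
const elements = require('@webref/elements');

async function listElementNames() {
  const extracts = await elements.listAll(); // keyed by spec shortname
  const names = new Set();
  for (const data of Object.values(extracts)) {
    for (const el of data.elements ?? []) {
      names.add(el.name); // no per-element `attributes` field to read here
    }
  }
  return [...names].sort();
}

listElementNames().then((names) => console.log(names.length, 'elements'));
```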
We might get lucky and find an implementation of this data created by a third party, but we would probably want to rely on one created by the big players, like directly from Google for Chrome, or from Mozilla for Firefox. I've found many implementations from other developers that haven't been updated in years, and I don't want to repeat what we are currently doing by relying on some random file from Adobe.
Thanks in advance for your bug report!
What happened?
`autocomplete-css` and `autocomplete-html` both look at the archived `adobe/brackets` repos for their tags and attributes.

Both also look at https://developer.mozilla.org/en-US/search.json to determine additional metadata regarding a tag, such as its description. However, this endpoint no longer functions as it previously did, and does not return JSON search results.

As a result, our HTML and CSS tag options are wildly out of date (~6 years for CSS, and the `adobe/brackets` sources are 3 years old).

We should presumably update the entire process for gathering new information, and (provided the new gathering can be styled in the old standard) ignore the existing process, outside of pointers to where to gather information from.
Pulsar version
Any
Which OS does this happen on?
❓ Other (Please specify in the OS details field below)
OS details
Any
Which CPU architecture are you running this on?
64-bit (x86_64)
What steps are needed to reproduce this?
This was noticed because of Discord users mobilex1122 and miga. Specifically, `translate` is not showcased as an autocomplete suggestion, seen here:

Additional Information:
No response