One thing we could do is look directly to the source to update our data. It would require a rewrite of how we update the auto-completions.
But focusing on CSS, we can see W3 does publish a JSON listing of every supported property.
This may be more difficult to build a super useful autocomplete out of, but it's likely our best bet, especially if we don't have much concern for snippet support.
Funnily enough, I've very recently built my own autocomplete package where I grabbed the JSON data directly from NodeJS to build the package. So if anyone takes a shot at rewriting the update script, that's available as a reference if needed. Otherwise, I'd be more than happy to give this a crack in the near future.
Hmm, not sure if this would be useful, but there are MDN content/data repos and such on GitHub. Not sure if it's reasonable/feasible/reliable to pull stuff from there?

* https://github.com/mdn/content
* https://github.com/mdn/data
* https://github.com/mdn
I would suggest going the MDN route. It's not quite the same thing, but when I tried to modernize TextMate's JS bundle, I wrote a Ruby script to screen-scrape MDN just to know what sorts of tokens to recognize as Web APIs. I haven't tried to run it since then, but I imagine that MDN data is structured enough that the same approach would work today.
* [All Properties JSON](https://www.w3.org/Style/CSS/all-properties.en.json)
* [All Descriptors JSON](https://www.w3.org/Style/CSS/all-descriptors.en.json)
W3C indicates the maturity of specifications by a status code. The CSS working group uses the following, from least to most stable:

| Abbreviation | Full name |
|---|---|
| FPWD | First Public Working Draft |
| WD | Working Draft |
| CR | Candidate Recommendation |
| CRD | Candidate Recommendation Draft |
| PR | Proposed Recommendation |
| REC | Recommendation |
| SPSD | Superseded Recommendation |

The names are defined in section 6 of the W3C process document. A REC is what is normally referred to as a 'standard.' W3C encourages everyday use starting from CR.

from https://www.w3.org/Style/CSS/current-work.en.tmpl
That should demystify the "status" property from those JSON endpoints. W3 also follows conventions to ensure automated processing is easier.
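As a rough sketch of how an updated script might consume that endpoint (the `property` and `status` field names here are assumptions about the payload's shape, and `fetch` assumes a recent Node.js):

```js
// Hedged sketch, not the package's actual update script: pull W3's
// all-properties JSON and keep only the properties W3C considers stable
// enough for everyday use (CR and later, per the quote above).
const STABLE = new Set(['CR', 'CRD', 'PR', 'REC']);

async function fetchStableProperties() {
  const res = await fetch('https://www.w3.org/Style/CSS/all-properties.en.json');
  const entries = await res.json(); // assumed shape: [{ property, url, status }, ...]
  const names = new Set(
    entries
      .filter((entry) => STABLE.has(entry.status))
      .map((entry) => entry.property)
  );
  return [...names].sort();
}

fetchStableProperties().then((props) => console.log(props.length, 'properties'));
```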
Oh! This may be interesting (MDN says they're deprecating mdn/data in favour of this): https://github.com/w3c/webref

> This repository contains machine-readable references of CSS properties, definitions, IDL, and other useful terms that can be automatically extracted from web browser specifications.

It also has a few packages (copy-pasted from the readme):
* [@webref/idl](https://www.npmjs.com/package/@webref/idl) contains a [curated](https://github.com/w3c/webref/blob/main/packages/idl#guarantees) version of the [ed/idl](https://github.com/w3c/webref/blob/main/ed/idl) folder.
* [@webref/css](https://www.npmjs.com/package/@webref/css) contains a [curated](https://github.com/w3c/webref/blob/main/packages/css#guarantees) version of the [ed/css](https://github.com/w3c/webref/blob/main/ed/css) folder.
* [@webref/elements](https://www.npmjs.com/package/@webref/elements) contains a [curated](https://github.com/w3c/webref/blob/main/packages/elements#guarantees) version of the [ed/elements](https://github.com/w3c/webref/blob/main/ed/elements) folder.
* [@webref/events](https://www.npmjs.com/package/@webref/events) contains a [curated](https://github.com/w3c/webref/blob/main/packages/events#guarantees) version of the [ed/events](https://github.com/w3c/webref/blob/main/ed/events) folder.
Ironically enough, all of this content seems to use W3 as the source, so why shouldn't we just use W3?
Edit: Oh lol, those are W3's repos
They did the manual crawling for us, and these are W3's repos, so we can trust them to the same degree.
edit: heh >~<
I'm reminded of what's hard here: ideally you want something that will maintain itself without your intervention. But at the end of it, you've got a flat list of possible completions, and you have no idea which ones are most important.

If I'm beginning to type a CSS property and I get as far as `font-`, some possible completions are much more helpful than others. I would certainly want `font-size` to be at the top of the list, and I would not want `font-variant-east-asian` to show up at all. But deciding which ones are used enough to warrant suggesting means exerting human curation on this data set.
This is less of a problem with CSS and HTML, but a huge problem when deciding which JavaScript tokens to autocomplete, because there are so many things available in the global namespace (in both browsers and Node) and no good automated way of deciding which ones are important.
Maybe there is usage data we can pull from to rank suggestions? Either way, I think combining the automated way to get a list of everything with some method of curation (automated or not) is best. That way, new stuff can still at least get pulled in and be on the list somewhere, and we aren't falling behind there.
So, to build off what @savetheclocktower is saying: I think the first goal should be just getting the list of completions, as the completions themselves don't dictate in what order the results will appear (while `autocomplete-plus` technically does support a `priority` tag, that isn't used here).

If memory serves, results are shown purely based on whether the first characters match. Meaning `font-size` and `fill` would potentially have the same priority when you type `f` through `font`. But this may be wrong; I'd have to do another check, and I may be thinking of the logic within `autocomplete-html`.
So I think it's important to keep a distinction between getting our autocomplete data and actually providing it, as those are distinct parts of the plugin.
But as for how we provide and rank the autocompletions, personally I'd be hugely in favour of utilizing something like Levenshtein Distance or (what I think would work even better for this purpose) Longest Common Subsequence.

Both of these algorithms are actually what provided the search capability on the `package-backend` when I first made it, but they had to be scrapped after switching to an SQL data store.

Either of these algorithms could handle providing and ranking the actual autocompletions for us based entirely on string similarity.
The reason I so highly recommend implementing LCS in our autocomplete packages is that it would likely be the simplest way to do exactly what @savetheclocktower mentioned: when you type `font-`, you probably want `font-size` and not `font-variant-east-asian` until proven otherwise. With LCS, `font-` scores higher against `font-size`, since the two strings are more similar overall than `font-` and `font-variant-east-asian`, even though both candidates contain the same number of matching characters.
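To make that concrete, here's a minimal sketch of what LCS-based ranking could look like if we normalize the raw LCS length by candidate length, so shorter, closer matches win. The `rankCompletions` helper is hypothetical, for illustration only; it isn't anything `autocomplete-plus` provides:

```js
// Longest Common Subsequence length via the classic dynamic-programming table.
function lcsLength(a, b) {
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// Hypothetical ranking: dividing by candidate length means a short, close
// match beats a long one even when the raw LCS count is identical.
function rankCompletions(prefix, candidates) {
  return candidates
    .map((name) => ({ name, score: lcsLength(prefix, name) / name.length }))
    .sort((x, y) => y.score - x.score);
}

// rankCompletions('font-', ['fill', 'font-size', 'font-variant-east-asian'])
// => font-size (5/9 ≈ 0.56), then fill (1/4 = 0.25),
//    then font-variant-east-asian (5/23 ≈ 0.22)
```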
To explain why I rambled so much about completion providing here: I really would vote against using 'curated' lists. Personally I feel that, especially if we are able to get a full data set, we should just use the proper full dataset, and that we shouldn't be the limiting factor that keeps someone who uses non-standard or less-popular aspects of a language from getting autocompletions.
Also, @Meadowsys, I wanted to say the link you provided to `w3c/webref` looks fantastic! It very well may be the best place for us to gather these autocompletions.
I'll happily take a shot at getting these updated this weekend.

It may be worth getting them updated first; then I'll play around with better autocompletions on my package and, if that works, apply it here. I want to make sure the performance is still good, if possible.
So I'm trying to see what we can do by utilizing `w3c/webref`, and while it does an amazing job of letting us generate and parse the data into a valid completions file, the one issue is that it does not provide any documentation text to go along with the completions.

Additionally, while they talk of aiming for the data to be easily machine-parsable, there are some very large variations in the format of the data between the `properties`, `atrules`, and `selectors`. So while I think this provides us the most complete set of data that is valid in CSS, it may not be perfect for our use case. But I do wonder if there is any way we could essentially use an API to MDN to grab the documentation side of things.
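For reference, here's roughly what pulling property names out of `@webref/css` looks like via its documented `listAll()` entry point. The defensive handling of each extract's shape is an assumption on my part, since the format varies between specs and versions:

```js
// Hedged sketch: collect CSS property names from @webref/css extracts.
const css = require('@webref/css');

async function collectPropertyNames() {
  const parsedFiles = await css.listAll(); // one extract per spec shortname
  const names = new Set();
  for (const data of Object.values(parsedFiles)) {
    const props = data.properties ?? [];
    // Assumption: `properties` may be an array of { name, ... } records
    // or an object keyed by property name, depending on the version.
    const entries = Array.isArray(props) ? props : Object.values(props);
    for (const prop of entries) {
      if (prop && prop.name) names.add(prop.name);
    }
  }
  return [...names].sort();
}

collectPropertyNames().then((names) => console.log(names.length, 'properties'));
```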
Alright, so now that I've completed the `autocomplete-css` package, it's time to tackle the `autocomplete-html` package.
And honestly, I've been looking around for some time but can't locate a good source of machine-readable specs for HTML; at least, nothing that provides elements and their attributes.
For example, the actual spec of HTML is located here, which is written in a proprietary source format that is parsed to produce the published spec on their website. So, short of parsing that, it's useless to us.
Then the `webref` resources posted above do have a list of all elements, but with no mention of their attributes. And the MDN resources posted above do have all of the above, but only as Markdown, so we would have to build a pretty complex parser to extract everything we need from there.
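To illustrate that gap, a quick sketch of what `@webref/elements` exposes through its `listAll()` entry point. The per-element fields here (a name plus an IDL interface, and notably no attribute data) are my reading of the extract format and worth double-checking:

```js
// Hedged sketch: list element names from @webref/elements, which gives us
// the elements themselves but (as noted above) nothing about attributes.
const elements = require('@webref/elements');

async function listElementNames() {
  const extracts = await elements.listAll(); // keyed by spec shortname
  const names = new Set();
  for (const data of Object.values(extracts)) {
    for (const el of data.elements ?? []) {
      names.add(el.name); // no per-element `attributes` field to read here
    }
  }
  return [...names].sort();
}

listElementNames().then((names) => console.log(names.length, 'elements'));
```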
We might get lucky and find an implementation of this data created by a third party, but we would probably want to rely on one created by the big players, like directly from Google for Chrome, or from Mozilla for Firefox. I've found many implementations from other developers that haven't been updated in years, and I don't want to repeat what we are currently doing by relying on some random file from Adobe.
Thanks in advance for your bug report!
What happened?
`autocomplete-css` and `autocomplete-html` both look at the archived `adobe/brackets` repos for their tags and attributes.

Both also look at https://developer.mozilla.org/en-US/search.json to determine additional metadata regarding a tag, such as its description. However, this endpoint no longer functions as it previously did, and does not return JSON search results.

As a result, our HTML and CSS tag options are wildly out of date (~6 years for CSS, and the `adobe/brackets` sources are 3 years old).

We should presumably update the entire process for gathering new information, and (provided the new gathering can be styled in the old standard) ignore the existing process, outside of pointers to where to gather information from.
Pulsar version
Any
Which OS does this happen on?
❓ Other (Please specify in the OS details field below)
OS details
Any
Which CPU architecture are you running this on?
64-bit (x86_64)
What steps are needed to reproduce this?
This was noticed because of Discord users mobilex1122 and miga. Specifically, `translate` is not showcased as an autocomplete suggestion, seen here:

Additional Information:
No response