nix-community / nix-vscode-extensions

Nix expressions for VSCode and OpenVSX extensions [maintainers: @deemp, @AmeerTaweel]
MIT License

Make a monolith fetcher #10

Closed deemp closed 1 year ago

deemp commented 1 year ago

For now, we have two parts of the fetcher (high-level description):

  1. TypeScript scripts ask sites about extensions' info like name, publisher and latest upload timestamp. Then, the scripts generate a tmp.yaml with all this info.
  2. Haskell script
    1. reads this tmp.yaml and checks it against another YAML file, call it cache.yaml, that contains additional info about extensions, such as the calculated SHA. The script selects the extensions that are in both cache.yaml and tmp.yaml; these are ready to be included in the updated cache.yaml. The extensions that are in tmp.yaml but not in cache.yaml are fetched, and their SHA is calculated. The extensions that were fetched successfully are combined with the ready ones, sorted, and written to cache.yaml. As the vscode-marketplace usually returns very similar sets of extensions, it's cheap to update cache.yaml. The extensions that are in cache.yaml but not in tmp.yaml are not included in the updated cache.yaml; this is how we drop old versions of extensions. The only interesting detail is that the Haskell script spawns a limited number of threads that work through tmp.yaml and run nix store prefetch-file, so that we don't hit the request rate limit.
    2. After updating the cache.yaml, the script generates nix expressions from that file. Perhaps we should move this generation to Nix (TODO).
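To make the cache update in step 2.1 concrete, here is a rough sketch in TypeScript. It is not the actual Haskell code: the record fields (publisher, name, lastUpdated, sha256) and the function names planUpdate and buildCache are assumptions made for illustration, since the real YAML schema isn't shown here.

```typescript
// Hypothetical record shapes; the real tmp.yaml / cache.yaml schema may differ.
interface ExtensionInfo {
  publisher: string;
  name: string;
  lastUpdated: string; // latest upload timestamp reported by the site
}

interface CachedExtension extends ExtensionInfo {
  sha256: string; // hash calculated after fetching
}

// Key that identifies one specific upload of an extension.
const keyOf = (e: ExtensionInfo): string =>
  `${e.publisher}.${e.name}@${e.lastUpdated}`;

interface Plan {
  ready: CachedExtension[]; // in both tmp and cache: reuse the stored hash
  missing: ExtensionInfo[]; // in tmp only: must be fetched and hashed
}

function planUpdate(tmp: ExtensionInfo[], cache: CachedExtension[]): Plan {
  const cached = new Map<string, CachedExtension>();
  for (const c of cache) cached.set(keyOf(c), c);

  const ready: CachedExtension[] = [];
  const missing: ExtensionInfo[] = [];
  for (const t of tmp) {
    const hit = cached.get(keyOf(t));
    if (hit !== undefined) ready.push(hit);
    else missing.push(t);
  }
  // Entries that exist only in the cache are never visited, so they are dropped.
  return { ready, missing };
}

// Combine reused and newly fetched entries, sorted so the file stays stable.
function buildCache(
  ready: CachedExtension[],
  fetched: CachedExtension[],
): CachedExtension[] {
  return [...ready, ...fetched].sort((a, b) => {
    const ka = keyOf(a);
    const kb = keyOf(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}
```

Only the entries in missing need network access, which is why the update is cheap when the marketplace returns nearly the same set as last time.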

At a lower level, we work with files that are stored in data/. data/old/vscode-marketplace.yaml is like cache.yaml mentioned above.
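The limited number of workers mentioned in step 2.1 could look roughly like the sketch below. The Haskell script uses threads; this TypeScript version swaps in promises on a single thread, and the name mapWithLimit and its parameters are made up for illustration.

```typescript
// Run `task` over `items` with at most `limit` tasks in flight at once.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next index until none are left.
  // JavaScript runs this on one thread, so `next++` needs no lock.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i] as T);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);
  return results; // same order as `items`
}
```

As written, one rejected task rejects the whole call. To keep only the extensions that were fetched successfully, as described above, the task could catch its own errors and return null, with the caller filtering those out.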

As @spikespaz suggested (https://github.com/nix-community/nix-vscode-extensions/pull/9#issuecomment-1416486553), we can choose a language and write all our scripts in that language to improve maintainability.

spikespaz commented 1 year ago

I'd like to throw in a note: I'd like to help, but I won't touch TypeScript. Haskell I would like to learn.

AmeerTaweel commented 1 year ago

@deemp I believe you are talking about ./updater/deps.ts. This is the standard way of doing imports in Deno, even for the standard library (https://deno.land/std).

I understand that if https://deno.land/std is down the code won't work. However, isn't it the case for everything else? Most nix derivations fetch sources from other websites, so if the website goes down the derivation won't build. I'm not sure I get your point here.

Writing the whole thing in TypeScript shouldn't be that hard. I would like to do it once we figure out the issue you mentioned. I think TypeScript is a good choice for this project because we do a lot of networking and JSON manipulation, and (I believe) JavaScript/TypeScript is the best tool for this job.

spikespaz commented 1 year ago

I think TypeScript is a good choice for this project because we do a lot of networking and JSON manipulation, and (I believe) JavaScript/TypeScript is the best tool for this job.

Some better options:

Overall, my proposal is to use Rust with serde, serde_json, serde_cbor, isahc or surf and smol or async-std.

Obviously, I am somewhat inviting myself to work on this thing, mostly because I really like doing these sorts of projects. Scrapers and REST wrappers excite me. Feel free to ignore me and I'll stay out of your business.

I debated with myself whether to reply to this statement, for a few reasons. I am extremely biased because I really hate the direction that JavaScript has taken the software industry as a whole, as well as people falling into the trap of thinking it's the best language (but that goes for any other as well). I could write a whole article of reasons why JavaScript is just almost never any better than anything else, but will refrain from doing so. I will say one thing about it: JS is a crowbar, and you can choose from any other precision instrument. TypeScript is Microsoft's attempt to fix their mistakes with the design of JS. Please leave it on the client-frontend.

AmeerTaweel commented 1 year ago

I am extremely biased...

Totally agree.

Overall, my proposal is to use Rust with serde, serde_json, serde_cbor, isahc or surf and smol or async-std.

We have wildly different opinions on JS/TS, but at least we both agree that Rust is a great, fast, and safe language. A rewrite in Rust was inevitable anyway, so why not do it now?

Obviously, I am somewhat inviting myself to work on this thing, mostly because I really like doing these sorts of projects. Scrapers and REST wrappers excite me. Feel free to ignore me and I'll stay out of your business.

I never did a decent Rust project so I'm not that proficient in the language (yet). So I think it's better if you're the one to re-write in Rust.

What do you think @deemp?

deemp commented 1 year ago

I'm biased, too, but towards Haskell. That's why I suggest using Haskell in our project.

Haskell is a language that's usually extremely pleasant to work with. It features good libraries, nice abstractions, awesome multithreading, and fast compilation. Yet Haskell is pretty unpopular. There are a lot of useful Node and RIIR apps, but not many popular Haskell apps. To name a few, the real heroes are hadolint and shellcheck. So, I'd like to promote Haskell by using it in our project. Hopefully, more people will recognize that Haskell is a reliable and mature language that is a good choice for accomplishing non-trivial tasks.

To back up my suggestion, I rewrote the entire thing in Haskell. It runs twice as fast as the previous version (see the action), has logs enabled, and handles various errors gracefully. E.g., when running locally, I can hit Ctrl+C, and the current progress (fetched extensions' info) will get appended to the file with the cached extensions' info. Moreover, to make the code readable, I commented it pretty thoroughly and can leave more comments if necessary.
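For readers wondering how the Ctrl+C behaviour can work in general, here is a minimal sketch of the idea in TypeScript. It is not the Haskell implementation; StopFlag, fetchUntilStopped, and the persist callback are hypothetical names used only to illustrate the pattern of saving partial progress on the way out.

```typescript
// A shared flag that something else (e.g. a signal handler) can flip.
interface StopFlag {
  stopped: boolean;
}

// Fetch items one by one; whatever was fetched before a stop or an error
// is still handed to `persist`, so progress is never lost.
async function fetchUntilStopped<T, R>(
  items: T[],
  fetchOne: (item: T) => Promise<R>,
  stop: StopFlag,
  persist: (done: R[]) => Promise<void>,
): Promise<R[]> {
  const done: R[] = [];
  try {
    for (const item of items) {
      if (stop.stopped) break; // checked between items, not mid-fetch
      done.push(await fetchOne(item));
    }
  } finally {
    // Runs on normal completion, on stop, and when fetchOne throws.
    await persist(done);
  }
  return done;
}
```

In Node, the flag could be set from a handler registered with process.on("SIGINT", ...), and persist would append the collected entries to the cache file.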

The structure of the data has changed a bit. Now, there are only .json files, which are converted to Nix expressions inside flake.nix. This reduced the cache size to ~15M. I successfully tried building several extensions, e.g., via:

nix flake lock github:nix-community/nix-vscode-extensions/rewrite-in-haskell
nix-repl> :lf github:nix-community/nix-vscode-extensions/rewrite-in-haskell
nix-repl> :b extensions.x86_64-linux.vscode-marketplace.rust-lang.rust-analyzer

Now, @AmeerTaweel , @spikespaz , do you agree with my suggestion to use Haskell here?

spikespaz commented 1 year ago

I do agree that Haskell is one good option. However, despite my willingness to experiment with unorthodox languages, you do bar others from working on this because of the steep learning curve; I would argue that Haskell's is steeper than Rust's.

Aside, I had actually already written API wrappers (mostly complete) for Open VSX Registry. I thought that in doing so, I would be contributing back to the wider community, as this is a publishable and useful crate for others. I may continue with the work, but must admit, am relieved to be relieved of the duty that I had over-eagerly committed myself to.

I recognize that there is little other benefit, however; the binary here is very specific to this particular project and therefore of little consequence to the rest of the world. This disappointing fact is compounded by the fact that Micro$oft does not permit usage of the Marketplace API, and that any use of it is strictly against their terms (unless authorized, such as for this project). In addition, while writing the wrapper for Open VSX was relatively easy, it was only easy because they have a strongly-typed and well-documented API--such is not the case for the proprietary Marketplace API.

Considering that you've already done the work, more than I have at least, I think you should continue with Haskell. I did expect this to be a race anyway. I will continue with my Rust wrappers in the future when I have further personal need for them.

Just a comment about Haskell (and lambda-calculus-based languages in general), it is significantly more difficult for a novice or even intermediate programmer to learn FP than it is to wrap one's head around imperative paradigms. I don't think Haskell is a good fit for widespread software development; it is certainly a capable language, and functional solutions make one feel clever with themselves, but it is not necessarily more advantageous than the other options.

AmeerTaweel commented 1 year ago

However, despite my willingness to experiment with unorthodox languages, you do bar others from working on this because of the steep learning curve; I would argue that Haskell's is steeper than Rust's.

I agree with @spikespaz that using Haskell will make the project less approachable. I also don't know that much Haskell. I played with XMonad in the past, but nothing fancy. And I also never used it for a project.

So, I'd like to promote Haskell by using it in our project. Hopefully, more people will recognize that Haskell is a reliable and mature language that is a good choice for accomplishing non-trivial tasks.

@deemp, I like the idea, so I agree with continuing to use Haskell.

However...

I tried to read some of your code in /hs/app/Main.hs, and I think it needs some readability improvements. For example:

The TypeScript part is very short, so I also think rewriting it in Haskell would not increase the program size that much. But I believe that we should do some refactoring of the Haskell code: splitting it into multiple files, smaller functions, etc.

deemp commented 1 year ago

Thank you for your answers!

@AmeerTaweel, the TypeScript part is already there.

I tried to read some of your code in /hs/app/Main.hs, and I think it needs some readability improvements

Sure, I'll refactor the code and comments.

AmeerTaweel commented 1 year ago

@deemp great.

Is it safe to delete the TypeScript directory now?

There is also some Python code in the repo under scripts. What is the state of this? Is it safe to delete?

deemp commented 1 year ago

I removed scripts.

deemp commented 1 year ago

By the way, if you'd like to study Haskell, I recommend the Haskell for Imperative Programmers playlist :) Following that, you may want to read books like LYAH, Get Programming with Haskell, and Haskell in Depth.