rdkit / rdkit-js

A powerful cheminformatics and molecule rendering toolbelt for JavaScript, powered by RDKit .
https://rdkitjs.com
BSD 3-Clause "New" or "Revised" License
141 stars 36 forks source link

npm i @rdkit/rdkit, future directions #389

Open MichelML opened 11 months ago

MichelML commented 11 months ago

npm i @rdkit/rdkit, future directions

Preparing slides for RDKit UGM 2023 got me thinking more seriously about this project's future directions. While there was not a lot of hands raised when I asked the audience who used the package in the past, the number of web/js talks + hackathon interest (without mentionning npm stats, issues created, and feature requests made on GH) mentionning the use of rdkit-js made it clear that there is a demand in using rdkit on the web, and that rdkit-js is usually the first thing people try to make this a reality.

The following Epic lays out what I think are the anticipated next steps for the project across different themes, which aim to improve various facets of the project: usability, usefulness/functionalities, robustness, and good DX. This issue (or related GH issues that will be logged soon) may evolve, but here are the current themes in no particular order:

CI

Context: rdkit-js is a web assembly module (called the MinimalLib) at its core, which exposes functionality of the main RDKit project. Currently, when changes occur in the main rdkit project, there is no guarantee that these changes won't break a downstream build of the rdkit-js library. Solution:

Initialization

Context: Many users complained the library is hard to use in certain contexts. Namely, if you work in a modern development environment such as create react app or similar, you cannot simply import rdkit from "@rdkit/rdkit";, then use. Why this isn't possible is multi-faceted: 1) the minimallib is not built for >= es6, 2) some environments are "use strict"-enforced, and the minimallib isn't currently compiled with the "use strict" option, 3) the library being wasm based, you need to think about where to publicly serve this module for things to work, and since a lot of users do not have a web background or experiencing managing assets in a web context, this can be a barrier to entry or a reason to simply abandon using the package. Lastly, users have reported wanting mechanisms allowing to import the library offline, since currently the library is fetched on each page load (or if you're not careful, everytime you call initRDKitModule()). Solutions:

Minimallib-level functionalities

Context: There is currently functionalities either 1) compiled but not exposed in the docs, 2) not-compiled but already available as part of the minimallib build, and 3) not-compiled and not available, yet user-requested. Solutions:

Higher-level components

Context: considering we are now able to "install, then use" in various contexts, this lays out the foundation for easy to use Higher-level components both in pure JS and modern frameworks. This should also abstract some important performance considerations to keep in mind when leveraging a wasm module and making a large amount of manipulations on molecules. Solutions:

API documentation strategy

Context: Examples only documentation (current) is insufficient. Manually maintained API documentation (current) is error prone. We must move towards an API documentation that is as automated as possible. Solutions:

Local development of rdkit-js.

Context: The above changes will add some complexity for the development of the library. Strategies to overcome it will be laid out once progress is made on segments of this epic. For example, to build higher-level JS functionalities on top of the minimallib, you need to build the right version of the minimallib locally first, and the local tooling to do that easily is not currently in place.

Deprecation of open source contributed examples

Context: Issues regarding examples implementation have stirred the rdkit-js efforts in a direction that does not improve the library itself, and making them mainstream in the library introduces maintenance overhead. Future work and GitHub issues should encourage contributions that will either 1) Make the library easier to use, 2) Add functionality to the library, or 3) Improve existing functionality of the library. Solutions:

Relevant links

MichelML commented 11 months ago

GH issues will be logged according to the above in the next few days.

@ptosco @greglandrum (or anyone ), any thoughts on the above would be greatly appreciated. Greg, Paolo, it was nice to see you both in person at the UGM, and the whole community actually. I'm glad we were able to exchange a bit Paolo, and hopefully next time Greg we'll get to chat more.

papillot commented 11 months ago

Thanks @MichelML for having laid out these directions!

One improvement that I can think about regarding the current JS API, would be to abstract the need to serialize parameters as JSON strings (such as the details option in the depiction functions) and to deserialize returned values to JavaScript Objects. I am using RDKit js in a Typescript project and passing parameters in a type safe manner is not straightforward. I realize that this may require breaking changes to the JSMol objects, but I guess that it would be useful to have this baked-in the library so that the users do not have to reinvent their own utilities (like an API to wrap the original API with complete typings).

Regarding the caching of the wasm module in IndexDB, this is not an issue I have been confronted to. In my experience, the wasm file is cached by the browser directly thanks to the request headers. This may not be achievable on every host environment though.

adam-of-barot commented 11 months ago

Thank you for the writeup @MichelML!

While I'm sort of sad to see the examples go, it's probably the right direction to go. RDKitJS is probably a bit tricky to meaningfully contribute to, since mainly JS devs want to use it, but it requires some C/C++ knowledge to actually move the project forward.

Also, I have started experimenting with different compilation options. Once the separate issues go up, I'll go into more detail, but I just wanted to share my initial findings.

MichelML commented 10 months ago

Thanks for this @adam-of-barot , I also experimented and got the same results as you.

When you compiled with --embind-emit-tsd interface.d.ts, did it compile the wasm & js files as well or only the .d.ts file? For me, it was only the .d.ts file , so if the same for you, we'll have to sort of build the lib twice as part of the release process. No big deal, just not the best.

For EXPORT_ES6, what was failing in the tests? Was it just a matter of adjusting how you import the lib prior to the tests?

I also found a promising avenue with the ENVIRONMENT link time options, where you can compile the wasm module for any runtime in those: web,webview,worker,node. I'm not entirely sure what webview is (i think it's a deprecated API for chrome), but after building including only the web,webview,worker runtime, I was able to successfully import the initializer without compile errors in the most recent version of create-react-app. Meaning, this:

import initRDKitModule from "@rdkit/rdkit";

worked out of the box without warnings or errors. This is encouraging to make the initialization/import easier and more modern.

adam-of-barot commented 10 months ago

I think all three files (.js, .wasm, .d.ts) got generated, but I'll have to double check. (EDIT: double checked, all files are generated) I compiled it with "-lembind --embind-emit-tsd interface.d.ts" flags instead of just "--bind" btw.

Testing the EXPORT_ES6 compiled .js file results in the following error:

/src/rdkit/Code/MinimalLib/demo/RDKit_minimal.js:3
var _scriptDir = import.meta.url
                        ^^^^
SyntaxError: Cannot use 'import.meta' outside a module
...

Changing require to import and changing .js to .mjs for both tests.js and RDKit_minimal.js seems to solve the issue, but other require statements inside the test file need to be changed to import as well.

But it's good to know that the ENVIRONMENT flag is promising. I'll play around with it too. I can think of Android System WebView and Microsoft WebView2 when seeing the webview environment.

MichelML commented 10 months ago

@adam-of-barot -lembind didn't work for me :man_shrugging: . Let me know when you try, and make sure to remove your previous build artifacts before you try.

To copy the interface.d.ts out of docker, you should just have to change this line https://github.com/rdkit/rdkit-js/blob/master/Dockerfile#L64 to:

RUN make -j2 RDKit_minimal && \
  cp Code/MinimalLib/RDKit_minimal.* ../Code/MinimalLib/demo/ && \
  cp Code/MinimalLib/interface.d.ts ../Code/MinimalLib/demo/
adam-of-barot commented 10 months ago

My bad @MichelML, sort of.

I was manually building the MinimalLib to better understand how the docker container looks during build time. I noticed that some parameters (like -- bind) are passed from CMakeLists.txt, so that's where I initially included --embind-emit-tsd. I read from the emscripten docs that --bind is deprecated in favor of -lembind, so I tried building with that, but it's probably not necessary to change that.

I wasn't sure if there were restrictions/best practices on where to pass which flag. But thankfully adding the --embind-emit-tsd flag in the Dockerfile does the same thing, so that's good.

I was able to get the autogenerated d.ts out of the container with that code, thanks!