nim-lang / needed-libraries

This repository contains a list of needed libraries.

[Meta] Are we scientists yet? #77

Open mratsim opened 6 years ago

mratsim commented 6 years ago

This is a meta-issue to keep track of discussion around Nim scientific libraries.

Primitive libraries

Decimal128: https://github.com/JohnAD/decimal128
Fixed-point: https://gitlab.com/lbartoletti/fpn

Multidimensional arrays, Linear-algebra

Multidimensional arrays are the basic building block of scientific computing; they go beyond 2D or 3D vectors and matrices. Notable non-Nim implementations include Fortran, Julia, Matlab and Numpy.

Status: in-progress Libraries:

Support

Arraymancer supports dense multidimensional arrays of any type, on CPU (integers, floats, complex), Cuda and OpenCL (float only) and uses BLAS, CuBLAS and Clblast under the hood.

Flambeau provides libtorch bindings and reproduces PyTorch functionality.

Manu is a pure Nim matrix library with no external dependencies.

Neo supports dense and sparse float vectors and matrices, on CPU and Cuda (Nvidia GPUs) and also uses BLAS and LAPACK under the hood.

Status: stalled Libraries:

NimTorch supports most PyTorch features regarding multidimensional arrays, on CPU, Cuda, OpenCL and AMD ROCm provided you compiled PyTorch's Aten backend with the relevant features.

Plotting

Data analysis requires plotting. Notable non-Nim implementations include Python matplotlib and seaborn, Plot.ly (Python, R, Javascript), R ggplot2, Matlab and Facebook Visdom (a simple interface to Plot.ly).

Note that there are a couple of approaches to plotting: either a charting library, or a high-level grammar library (similar to SQL) that hides the low-level details of a chart.

Status: in-progress Libraries:

Proof-of-concepts:

Unmaintained:

ggplotnim is a pure Nim implementation of the grammar of graphics. gnuplot.nim is a wrapper of gnuplot. Nim-Plotly uses the plot.ly charting library as a backend. Both MetaPlot and Monocle use the Vega visualization grammar.

Image processing library

Computer vision is a thriving area of research. Vision scientists need algorithms that work on images represented as multidimensional arrays (unlike, say, Photoshop), preferably multithreaded and GPU accelerated.

Notable non-Nim libraries include OpenCV, Matlab, Python scikit-image, scipy.ndimage and mahotas.

Status: in-progress

Libraries:

Unmaintained:

Nim-opencv provides rough low-level bindings of OpenCV functions.

Dataframe and columnar/tabular data processing

Dataframes are essential to process structured data (say Name, Age, number of products bought, time of last visit). They allow very efficient data manipulation, including easily creating new columns, joining dataframes and converting between types.

Notable non-Nim packages include Python Pandas and R datatable. When data does not fit in RAM, dataframe packages are interfaced with SQL or HDF5 datastores or even Spark for very large scale processing.

Status: in-progress Libraries:

Random library

Lots of scientific algorithms rely on stochastic processes or random distributions. At the very least, a pseudo-random generator that samples from a normal/Gaussian distribution is needed.

Notable non-Nim libraries include Scipy.
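As an illustration of that minimum requirement, here is a small sketch of the Box-Muller transform, which turns two uniform samples into one normally distributed sample. It is written in Python rather than Nim and is not taken from any of the libraries tracked here; the names `box_muller` and `sample_normal` and the default seed are made up for this example.

```python
import math
import random


def box_muller(rng, mu=0.0, sigma=1.0):
    """Draw one sample from N(mu, sigma^2) using the Box-Muller transform."""
    # 1 - random() lies in (0, 1], so the logarithm below is always defined.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    # Standard normal variate built from two independent uniform variates.
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z


def sample_normal(n, seed=42, mu=0.0, sigma=1.0):
    """Return n reproducible samples; the seed is an arbitrary illustrative value."""
    rng = random.Random(seed)
    return [box_muller(rng, mu, sigma) for _ in range(n)]
```

Calling `sample_normal(20000)` should give a sample mean close to 0 and a sample variance close to 1. A real library would likely also want the second variate that Box-Muller produces (the sine branch) and a choice of underlying generator.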

Status: in-progress Libraries:

Statistics library

Notable language: R

Status: standard lib statistics module

Machine learning

Machine learning is how to teach a computer to learn/generalize patterns from data.

Notable non-Nim libraries include: Python's Scikit-Learn and R's Caret. State-of-the-art C++ library to wrap: XGBoost

Status: in-progress

Deep learning & neural network.

Deep learning is machine learning with neural networks and is arguably eating the world (or at least Reddit, Hacker News and sponsors). In comparison to most traditional machine learning tools, neural networks can also learn very well from non-structured data (images, sounds, text ...).

Notable non-Nim libraries include: Facebook Torch, Google Tensorflow, Apache and Amazon Mxnet

Status: in-progress Libraries:

Proof-of-concept:

Non-linear optimization

Status: in-progress Libraries:

Linear programming

Status: in-progress Libraries:

Computational Physics

Status: in-progress Libraries:

Geometry

Computational geometry also requires tuned algorithms for: geometry primitives, polygons and polyhedra, triangulations, distances, shape analysis ...

Notable non-Nim library: CGAL

Status: no library

Scientific serialization format

There are many formats specific to science or even to individual science domains.

Libraries:

Geospatial library

Scientists often need to deal with geospatial coordinates (latitude, longitude), maps and distances. This includes efficient data structures like KD-Tree or RTree to compute distances between points, and distance formulas like Haversine to compute distances on a sphere.
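For reference, the Haversine formula mentioned above can be sketched as follows. This is a Python illustration, not code from any Nim library; the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    a = min(1.0, a)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Two antipodal points on the equator, `haversine_km(0, 0, 0, 180)`, come out at half the circumference of the assumed sphere. The formula treats the Earth as a perfect sphere, so an ellipsoidal method would be needed where that approximation is too coarse.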

Notable non-Nim libraries include Python's scipy.spatial, Geopy, Shapely

Status: in-progress R-tree forum thread.

Proof-of-concepts:

Scientific language bindings

Python:

Unmaintained

mratsim commented 6 years ago

Placeholder.

To avoid polluting this meta-thread with specific discussion on certain topics (say what I want in the random library), this will link to the discussion topics:

Multidimensional arrays, Linear-algebra

#14, #17, #25, #50, #59

Plotting

#17, #51, #70

Geospatial

#13, #69

Image processing

#69

Dataframes, columnar/tabular data processing

#20, #47, #33

Random

#40

Statistics

#16

Machine learning

#48

Deep learning

No issue open

Computational Geometry

#53

andreaferretti commented 6 years ago

For sampling from other distributions, there is Alea. I have to clean it up (some examples fail with the latest concept changes in devel), but I hope to make these work again soon.

dom96 commented 6 years ago

This almost makes me want to buy arewescientistsyet.org à la http://www.arewewebyet.org/. Perhaps you'd be interested in creating something like this? :)

sdwfrost commented 6 years ago

I would also add in differential equation solvers as well as Markov chain Monte Carlo samplers...

Vindaar commented 6 years ago

Over the last 2 months I've been working on high level bindings to the HDF5 library:

https://github.com/Vindaar/nimhdf5

It's still very much work in progress (also due to my limited knowledge of Nim and the more low level parts of HDF5). As a raw wrapper it should be fully functional, with the downside of the (imo not very intuitive) C API. But the high level bindings are improving slowly. There's an example (examples/h5_create_dataset_hl.nim) showing the available features.

narimiran commented 6 years ago

Plotting

Status: no libraries

EelcoHoogendoorn commented 6 years ago

By far the most important category is missing from this list I feel; and that is first-class two way python bindings.

The ability of python to easily (relatively, for the time) interface with the then-dominant languages was pivotal in its adoption in scientific computing.

I'd use a ton of nim from python right away if there was a clean, boilerplate-free method of sending ndarrays back and forth between the two. Last time I checked there was not, and as much as I like nim I don't see it replacing my entire python ecosystem any day soon.

In particular, I would much rather use nim than cython or numba or any such half-baked language. Boost-python has the bindings figured out pretty well but then again I can rarely justify having to deal with C++.

But a system of bindings with the convenience of boost-python but without the C++ would massively expand the usability of nim for me and (I think it's not just me) other scientific programmers.

Also, starting out a project in nim would be a much better proposition if I had the reassurance I could always pop up a matplotlib debug figure without any hassle.

andreaferretti commented 6 years ago

@EelcoHoogendoorn there are a few projects.

None of these projects is fully mature at this point, but this is definitely something doable

EelcoHoogendoorn commented 6 years ago

Of course it is doable; both Python and nim are Turing complete. But without having the time to put in the work to make these into feature complete mature solutions myself, it is what is stopping me from using nim at present.

The good news is that this should be a lot less work than reinventing matplotlib.


brentp commented 6 years ago

I think most active nim users are aware of this by now, but there's a functioning plotting library here: https://github.com/brentp/nim-plotly

Since it serializes to JSON and uses plotly.js to plot (it still works with the C backend), it can only handle a limited number of points, but when using WebGL it can plot ~200K points in my browser and still be tolerably responsive.

EelcoHoogendoorn commented 6 years ago

Hi brentp;

That's looking pretty cool indeed! Note that I am not trying to take a jab at plotting in nim specifically, but trying to make a point about the relative size of the ecosystems of python and nim generally; plotting is just an example.

I think it'd be foolish to expect nim to be able to compete with python anytime soon on that front; making sure we have first-class two-way interop between the two sounds like it might happen a decade sooner at least.

Vindaar commented 6 years ago

And finally we can do non-linear least square fitting in Nim :)

https://github.com/Vindaar/nim-mpfit

Vindaar commented 6 years ago

Finally spent some time to make the interface for my NLopt wrapper nicer and create a PR for nimble for it. So if non-linear least square fitting isn't for you, maybe general nonlinear optimization is. ;)

https://github.com/Vindaar/nimnlopt

abudden commented 5 years ago

For some precision engineering/scientific applications, the ability to use arbitrary precision floating point arithmetic would be useful. Does an MPFR wrapper a la Julia's built-in support for BigFloat belong on this list?

Araq commented 5 years ago

@abudden Certainly.

retsyo commented 5 years ago

It seems that there is still no computer algebra system module like https://www.sympy.org/. I also made a post: https://forum.nim-lang.org/t/4165

brentp commented 5 years ago

a decent stats package would be a huge boon for my work. Even if it started with t-test and anova.

sinkingsugar commented 5 years ago

https://github.com/fragcolor-xyz/nimtorch

Full pytorch for nim, for you.

ihendley commented 5 years ago

Do we want a category for natural language processing? Examples of Python libraries are nltk, gensim, spacy, and scikit-learn.

ihendley commented 5 years ago

Also, how about mathematical optimization - like scipy.optimize for example, and how about signal processing - like scipy.signal?

Araq commented 5 years ago

@ihendley I think so, yes.

mantielero commented 5 years ago

Simulation

What about simulation? Something like simulink, modelica or Modia (in Julia).

It would be nice something similar to Modia in particular, given Nim's metaprogramming capabilities.

One area where I believe nim could shine is in exporting FMU models (following the FMI standard). I don't see python doing that. And even for Julia it is a struggle, because they need to export the runtime for compiled code, which is big and not straightforward (here you can see how the libraries take above 100Mb for a simple example, when compiled ahead of time).

Relevant links

FMI Code Generator
FMU SDK
Sundials: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers, in order to embed the solver in the FMU. Bindings for this would be useful even on their own.
SimulatorToFMU

mratsim commented 5 years ago

It's been a while since I updated the original post but it's done :)

brentp commented 4 years ago

having a (nearly?) fully functional jupyter kernel would be quite useful for my work and, I suspect for many people.

Vindaar commented 4 years ago

having a (nearly?) fully functional jupyter kernel would be quite useful for my work and, I suspect for many people.

@brentp: There is (or was) jupyternim: https://github.com/stisa/jupyternim I'm not sure if it's abandoned and/or still compiles (last activity Oct 2018); I have never used it. Its downside is that it was written without hot code reloading in mind of course. However, I think it'd provide a nice basis for an updated implementation, which uses HCR for the relevant parts and the socket communication of jupyternim.

I once started playing around with HCR, but wasn't very successful even implementing a trivial repl, https://github.com/vindaar/brokenrepl. Posting it here if anyone wants to give it a try.

brentp commented 4 years ago

Yes, I saw that and inim from @stisa. Now that there are ggplots and dataframes, the notebook would be a boon.

stisa commented 4 years ago

(my) jupyternim and inim are the same code; there was a naming conflict with https://github.com/AndreiRegiani/INim so I renamed it. I agree it's due an update, but I have been pretty busy this year.
Last time I looked, HCR was limited to the JS target. Looking at https://nim-lang.org/docs/hcr.html, there has been a lot of progress, so I may look into adopting it when I get some free time, if nobody starts working on it first.

jblindsay commented 4 years ago

I've just published a pure Nim k-d tree implementation here.

Vindaar commented 4 years ago

@mratsim, @brentp, @HugoGranstrom and I chatted recently about trying to unify the science related code a little more. While we didn't decide anything specific yet, we talked about creating an organization to hold related repositories in the future:

https://github.com/SciNim

I only invited a few people who, off the top of my head, use Nim for science related stuff. If you want to join, feel free to message me or just join the gitter channel here:

https://gitter.im/SciNim/community

and say hi.

mantielero commented 4 years ago

Over Easter I played around with creating a website based on Hugo for this purpose. I am happy to provide it to you.

I have uploaded it here: https://mantielero.github.io/nim4science/

Feel free to use it.

lbartoletti commented 3 years ago

I've just released a pure Nim fixed point number library here

I started working on a geometry library (mainly focused on GIS and CAD), but it is not yet presentable :)

planetis-m commented 2 years ago

My linear algebra package, https://github.com/planetis-m/manu, is still in development and I am happy to accept contributions.