mverzilli / crystalla

Crystal library for Numerical Methods. It binds to LAPACK and is unashamedly inspired by Numpy.
MIT License
45 stars 7 forks source link

Vector support #11

Open g3ortega opened 8 years ago

g3ortega commented 8 years ago

Hi, I have started to play with Crystal. For now, I'm trying to migrate some scripts I had in ruby with some algorithms implementations (about all ML algorithms). I could migrate some matrices using this library, but I couldn't find support for Vector class in Crystal and neither crystalla.

http://ruby-doc.org/stdlib-2.2.1/libdoc/matrix/rdoc/Vector.html

Can you help me? I am willing to contribute with these kind of libraries when I have enough knowledge and experience with Crystal , I think it has an enormous potential for Data Science, and I eventually want to use it in some real data oriented projects 😄

mverzilli commented 8 years ago

Hi! I'll be glad to help you, but I'd rather not just port Vector from Ruby's stdlib. When starting Crystalla, I was trying to borrow ideas and APIs mainly from Matlab and Numpy, which I think are much more mature for this particular domain.

So a couple of goals for Crystalla would be:

I guess a resulting non-goal is: to support existing Ruby scripts out of the box.

It's a bit analogous to one of Crystal's own goals: "Have a syntax similar to Ruby (but compatibility with it is not a goal)".

Let's do this, if you are still on board with that direction, why don't you share one of the Ruby code snippets you're trying to port here? We can explore together organic ways of extending Crystalla to provide all the necessary features.

WDYT?

g3ortega commented 8 years ago

Sure, thanks for your response. Here's a simple sample of k-Nearest Neighbors using vectors to represent coordinate minutes, and there are others applications for Vector class, but I totally understood what you said, I will take a look to Matlab and numpy implementations/alternatives for some libs and try to figure out how to work on a similar API in Crystal.

require "matrix"

class HouseValidator

  def initialize(data, position)
    @data = data
    @position = position
  end

  def find_nearest 
    @data.sort_by {|point,v|
      distance(point, @position)
    }.first
  end

  def find_nearest_with_k k
    @data.sort_by {|point, v|
      distance(point, @position)
    }.first(k)
  end

  private 

  def distance v1, v2
    (v1 - v2).magnitude
  end  

end

# Distance in minutes for lng and lat
neighborhoods_data = {
  Vector[56, 13] => "Safe",
  Vector[23, 9] => "Unsafe",
  Vector[45, 28] => "Safe",
  Vector[12, 3] => "Unsafe",
  Vector[15, 7] => "Unsafe",
  Vector[7, 3] => "Unsafe",
  Vector[49, 31] => "Safe"
}

house_1 = Vector[10, 10]
house_2 = Vector[40,20]
house_3 = Vector[32,10]

h1 = HouseValidator.new(neighborhoods_data, house_1)
p h1.find_nearest 

h2 = HouseValidator.new(neighborhoods_data, house_2)
p h2.find_nearest 

h2 = HouseValidator.new(neighborhoods_data, house_3)
p h2.find_nearest 
mverzilli commented 8 years ago

Thanks for sharing! Actually, we don't have a Vector abstraction but we do have some convenience methods to create row and column vectors in Matrix. For example, see:

https://github.com/mverzilli/crystalla/blob/master/spec/matrix_spec.cr#L71

That spec file should provide a lot of hints on what's currently possible with the lib.

Also, it seems you're only using two Vector operations: subtraction and "magnitude". I don't have time to test it right now, but I think my first attempt would be to:

  1. Replace Vector constructors like Vector[32,10] with Matrix.row_vector(32,10).
  2. Subtraction should work out of the box, so no prob there.
  3. Implement "magnitude" as an auxiliary function that receives a Matrix right there on the script.
  4. Check results.

If that works, the only pending issue would be to decide how to implement "magnitude". Maybe we could add a "norm" method to Matrix... I think the norm of a vector is just a special case of the norm of a matrix (but I should revisit my Linear Algebra bibliography :P).

Edit: of course, you may also need to introduce some other changes so it compiles in Crystal :)

mverzilli commented 8 years ago

This seems to exhibit the same behaviour as your script:

require "crystalla"
include Crystalla

class HouseValidator
  def initialize(data : Hash(Matrix, String), position : Matrix)
    @data = data
    @position = position
  end

  def find_nearest
    point, v, distance = @data.map { |point, v| {point, v, distance(point, @position)} }
                              .sort_by { |(point, v, distance)| distance }.first
    {point, v}
  end

  private def norm(m : Matrix)
    raise "norm is only defined for row vectors" unless m.row_vector?

    square_norm = 0
    m.each_row do |r|
      square_norm = r.map { |x| x * x }.sum
    end

    Math.sqrt(square_norm)
  end

  private def distance(v1, v2)
    norm(v1 - v2)
  end
end

# Distance in minutes for lng and lat
neighborhoods_data = {
  Matrix.row_vector [56, 13] => "Safe",
  Matrix.row_vector [23, 9]  => "Unsafe",
  Matrix.row_vector [45, 28] => "Safe",
  Matrix.row_vector [12, 3]  => "Unsafe",
  Matrix.row_vector [15, 7]  => "Unsafe",
  Matrix.row_vector [7, 3]   => "Unsafe",
  Matrix.row_vector [49, 31] => "Safe",
}

house_1 = Matrix.row_vector [10, 10]
house_2 = Matrix.row_vector [40, 20]
house_3 = Matrix.row_vector [32, 10]

h1 = HouseValidator.new(neighborhoods_data, house_1)
p h1.find_nearest

h2 = HouseValidator.new(neighborhoods_data, house_2)
p h2.find_nearest

h2 = HouseValidator.new(neighborhoods_data, house_3)
p h2.find_nearest

Note the implementation of norm is far from elegant :P. I guess that's because we lack an Enumerable implementation over rows or cols... We'll need to think about that and seek some inspiration from Numpy :)

mverzilli commented 8 years ago

I guess sooner than later we'll need to implement something like Numpy's n-dimensional array. Being able to operate with matrices and slice them into submatrices (as you generally would in Matlab) is really nice, but it's kind of limited to 2-dimensional matrices. Typical ML libs and algorithms need to work in n-dimensional structures, so matrices will only get us so far. I guess it's time to roll up our sleeves and take Crystalla in that direction!

g3ortega commented 8 years ago

Wow, thank you @mverzilli. Now I have a crystal version running and a better idea about how to migrate other scripts from ruby to crystal 🎆

time ruby ~/vector_sample.rb 0.07s user 0.05s system 52% cpu 0.240 total

time ./vector_sample 0.00s user 0.00s system 76% cpu 0.006 total

I'm falling in love with Crystal and the community 😃 and I want contribute in the future to this library and crystal itself. For now I am going to report any problem during some experiments while migrating more advanced algorithms, and take a closer look to Numpy and Matlab for ideas.

What would you recommend me to read, do or take a look, to be able to contribute in the future? Do you have a kind of roadmap for crystalla?

Thanks!

mverzilli commented 8 years ago

Hey, thanks for the kind words! I started Crystalla because I'm a pathological procrastinator. I had decided to follow this tutorial: http://deeplearning.stanford.edu/tutorial/, but when I started it I said "you know what, it'd be nice if you could do all the exercises in pure Crystal, so I'll take my time and try to implement what's necessary". And the first obvious need was to have a nice and efficient implementation of matrices.

@spalladino, another guy from Manas, suggested that we used LAPACK and BLAS so I wrote the bindings and some basic operations. Then we benchmarked against Numpy and we got these results: http://manas.com.ar/blog/2015/10/30/linear-algebra-in-crystal-from-lapack.html. It turned out to be a good idea!

As you can see, there's no roadmap. When I have some spare time and the mental energy to program, I go back to the tutorial and check what's the next thing to implement in Crystal, and just do that.

I guess one interesting next step would be to port all the functionality we already have in Matrix to an n-dimensional array implementation

mverzilli commented 8 years ago

That said, definitely keep on bringing roadblocks and concrete use cases! It's a lot nicer to shape the library based on real world problems.

mverzilli commented 8 years ago

This is super relevant to understand some of the implications of using LAPACK et al: https://scottlocklin.wordpress.com/2016/05/26/numeric-linear-algebra-code-an-appreciation/

spalladino commented 8 years ago

That said, definitely keep on bringing roadblocks and concrete use cases! It's a lot nicer to shape the library based on real world problems.

Absolutely! I'd suggest trying to pick a problem at a time, and implement the required functionality to support it. MNIST using PCA was my first (and only, so far) test subject, but it could be interesting to pick other solutions and implement them as well.

mverzilli commented 7 years ago

@g3ortega Not sure if you're still interested, but a quick update on this: @MatthiasRMS is working on an initial NdArray implementation. We're discussing the details here: https://github.com/mverzilli/crystalla/pull/15

Feel free to chip in!

pazzo83 commented 7 years ago

Hi! Is work on this package continuing, or is there a replacement package? Thanks!

mverzilli commented 7 years ago

Hi! At the moment I'm not putting time into this. You can check the discussions here to get some candidate replacements: https://github.com/crystal-community/crystal-libraries-needed/issues/15