mark-blacher / sql-algorithms

This repository contains supplementary code for the paper: Machine Learning, Linear Algebra and More: Is SQL All You Need?
MIT License

PostgreSQL COO results #1

Open R3gardless opened 11 months ago

R3gardless commented 11 months ago

https://github.com/mark-blacher/sql-algorithms/blob/357cb167516ef09543093497a100533c7d7a9114/case_study/experiments/main.py#L580-L583

I am currently reading your paper and trying to reproduce the experiment mentioned in it.

I have a question about the code.

The line results[features][samples]['pc'] is used to store the results of the experiment, and I am wondering whether this entry holds the Postgres COO result. However, the code also contains the line COO=False, and I am not sure whether that means the results are not stored in the COO format.

Could you please clarify whether results[features][samples]['pc'] is the Postgres COO result? If not, could you explain which format the results are stored in?

julien-klaus commented 10 months ago

Hello, thank you very much for your interest in the experiments. It looks like something was copied incorrectly during the cleanup of the repository. Your intuition is correct: COO=False indicates that the COO format is not used.
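
For reference, the COO (coordinate) format stores a sparse matrix as (row, column, value) triples, which map one-to-one onto a three-column SQL table. A minimal sketch with scipy.sparse (generic illustration, not code from this repository):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Generic COO illustration: every nonzero entry of the matrix
# becomes one (row, column, value) triple.
dense = np.array([[0.0, 2.0],
                  [0.0, 0.0],
                  [5.0, 0.0]])
sparse = coo_matrix(dense)

for i, j, v in zip(sparse.row, sparse.col, sparse.data):
    print(i, j, v)   # prints: 0 1 2.0  and  2 0 5.0
```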

A snippet of the dictionary could look like the following:

results = {
    '20': {
        '10': {'pc': 30.1},
        ...
    },
    ...
}

indicating that for 20 features and 10 samples we measured 30.1 s for the 'pc' algorithm (not real data, just an example).
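
Read back, that structure is accessed as in the following sketch (the keys and the 30.1 value simply mirror the example above):

```python
# Illustrative only: keys and the 30.1 value mirror the example above.
results = {'20': {'10': {'pc': 30.1}}}

features, samples = '20', '10'
runtime = results[features][samples]['pc']   # measured runtime in seconds
print(f"{features} features, {samples} samples ('pc'): {runtime} s")
```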

R3gardless commented 10 months ago

Thanks. I am working on building a softmax regression model using only SQL, as you did in the paper. When comparing the results obtained with NumPy and with Hyper (using np.allclose()), I noticed a floating-point difference of approximately 10^-7. I am wondering whether a similar difference exists for logistic regression.
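
A sketch of the kind of check described above (theta_numpy and theta_hyper are placeholder arrays, not values from the actual experiments):

```python
import numpy as np

# Placeholder arrays standing in for parameters computed with NumPy and
# with Hyper; the 1e-7 offset mimics the observed difference.
theta_numpy = np.array([0.1234567, -2.7182818, 3.1415927])
theta_hyper = theta_numpy + 1e-7

# np.allclose uses rtol=1e-5 and atol=1e-8 by default, so an absolute
# difference of ~1e-7 on values of this magnitude still counts as close.
print(np.allclose(theta_numpy, theta_hyper))       # True
print(np.max(np.abs(theta_numpy - theta_hyper)))   # ~1e-7
```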

julien-klaus commented 9 months ago

These are simply rounding errors; they occur naturally whenever the same computation is carried out in a different order or with different intermediate precision.
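
As a minimal illustration (generic NumPy sketch, not code from the repository), the same dot product evaluated in two different summation orders already disagrees slightly:

```python
import numpy as np

# The same mathematical dot product, computed with two different
# summation orders; the results typically differ by a tiny amount,
# which is exactly the kind of rounding error discussed above.
rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)

blas_order = np.dot(a, b)                               # vectorized/BLAS order
naive_order = float(sum(x * y for x, y in zip(a, b)))   # strict left-to-right

print(blas_order, naive_order)
print(abs(blas_order - naive_order))   # small nonzero difference from rounding
```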