scicloj / tablecloth

Dataset manipulation library built on top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License

left-join destroys row-sort #113

Closed: awb99 closed this issue 1 year ago

awb99 commented 1 year ago

When I call tablecloth's left-join, the rows of the left dataset that have no match in the right dataset are moved to the end of the result. I think this is a bug in tablecloth, because a left join should not change the sort order of the left dataset.

My example:
calendar: dataset sorted by :date (it has only a :date column)
ds-bars: dataset sorted by :date (columns: :open :high :low :close :volume :date)

(require '[tablecloth.api :as tc])

(defn align-to-calendar [calendar bars]
  (-> (tc/left-join calendar bars :date)
      (tc/order-by [:date] [:asc]) ; this should not be necessary!
      (tc/set-dataset-name (-> bars meta :name))))

awb99 commented 1 year ago

:date             :open   :high     :low      :volume   :close
2023-09-01T00:00  137.49  137.5000  134.8500  16185949  135.66
2023-09-05T00:00  135.37  136.4200  134.5801  14948394  135.75
2023-09-06T00:00  136.02  136.5299  133.6650  14601212  134.46
2023-09-07T00:00  133.57  135.5800  132.9500  14062620  135.27
2023-09-04T00:00                                        135.27
2023-09-08T00:00                                        135.27
2023-09-11T00:00                                        135.27
2023-09-12T00:00                                        135.27
2023-09-13T00:00                                        135.27

awb99 commented 1 year ago

The table above shows that September 4, which has no bar, is placed after the matched rows. All calendar rows that have no matching row in the right dataset end up at the end of the result.

genmeblog commented 1 year ago

I'll take a look at that today. One question: which version of TMD is on your classpath?

genmeblog commented 1 year ago

left-join on a single column delegates to tech.v3.dataset.join/left-join, so I think the problem is on the TMD side.
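
One way to check whether the row order comes from TMD rather than tablecloth is to call the TMD join directly. The sketch below is only an illustration: the two tiny datasets are made up (integers stand in for dates), and it assumes the three-argument form (left-join colname lhs rhs) of tech.v3.dataset.join/left-join.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.join :as ds-join])

;; Made-up stand-ins for the calendar and bars datasets from the report.
(def calendar (ds/->dataset {:date [1 2 3 4]}))
(def bars     (ds/->dataset {:date [1 3] :close [10.0 30.0]}))

;; Call the TMD join directly, bypassing tablecloth.
(def joined (ds-join/left-join :date calendar bars))

;; Inspect the order of the left key column in the result.
(println (vec (:date joined)))
```

If the unmatched keys are printed after the matched ones here as well, the ordering originates in TMD and not in tablecloth's wrapper.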

genmeblog commented 1 year ago

Ok, verified in the TMD source code. It looks like expected behaviour. Nothing to do here.
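
Since the join order is treated as expected behaviour, callers who need the left dataset's order have to restore it themselves. Besides re-sorting by :date as in the snippet from the report, a possible workaround (a minimal sketch, not part of tablecloth) is to tag each left row with its position before the join and sort by that tag afterwards. The helper name left-join-keep-order and the temporary column :row-idx are hypothetical and chosen only for this example.

```clojure
(require '[tablecloth.api :as tc])

(defn left-join-keep-order
  "Hypothetical helper: left-joins `right` onto `left` on `join-col` and
  returns the rows in the original order of `left`."
  [left right join-col]
  (-> left
      ;; remember each left row's original position
      (tc/add-column :row-idx (range (tc/row-count left)))
      (tc/left-join right join-col)
      ;; restore the original order, then drop the helper column
      (tc/order-by :row-idx)
      (tc/drop-columns [:row-idx])))
```

This also works when the left dataset is not sorted by the join column, which a plain order-by on :date would not preserve. It assumes the left dataset has no column named :row-idx already.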