sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License

Use parallelization in single and multi-table reports #544

Open echatzikyriakidis opened 5 months ago

echatzikyriakidis commented 5 months ago

Problem Description

Generating the Quality Report is very slow on large datasets, for both single-table and multi-table data.

Expected behavior

Each column metric and each column-pair metric is independent of the others, so they can be computed in parallel. Can't we use multiprocessing or multithreading to parallelize the report properties, such as Column Shapes and Column Pair Trends?
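
To illustrate the idea, here is a minimal sketch using only the Python standard library. It does not touch SDMetrics internals: `range_overlap_score` and `score_columns_in_parallel` are hypothetical names, the toy metric only stands in for a real per-column metric, and the tables are plain dicts of lists rather than DataFrames.

```python
from concurrent.futures import ThreadPoolExecutor


def range_overlap_score(real_values, synthetic_values):
    """Hypothetical stand-in metric (not an SDMetrics metric):
    fraction of synthetic values that fall inside the real min/max range."""
    low, high = min(real_values), max(real_values)
    inside = sum(1 for value in synthetic_values if low <= value <= high)
    return inside / len(synthetic_values)


def score_columns_in_parallel(real, synthetic, metric, max_workers=4):
    """Score every shared column independently, submitting one task per column."""
    columns = [name for name in real if name in synthetic]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Each column is its own task because no column depends on another.
        futures = {
            name: executor.submit(metric, real[name], synthetic[name])
            for name in columns
        }
        # Collect the results; .result() blocks until that task has finished.
        return {name: future.result() for name, future in futures.items()}


# Toy data, invented purely for this example.
real = {"age": [20, 30, 40, 50], "income": [10.0, 20.0, 30.0, 40.0]}
synthetic = {"age": [25, 35, 45, 60], "income": [15.0, 25.0, 35.0, 38.0]}

scores = score_columns_in_parallel(real, synthetic, range_overlap_score)
print(scores)
```

Threads are used here only to keep the sketch short. For CPU-bound metrics written in pure Python, `ProcessPoolExecutor` from the same module would probably be the better fit, provided the metric function and the column data can be pickled. Column pairs could be handled the same way by submitting one task per pair.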

Could anyone help with this? @frances-h, I have seen that you recently made changes to the library. Could you redirect me to the most appropriate person for this?

Thank you!