Running HyUCC algorithm on cli

sekruse / metanome-cli

Run Metanome algorithms from the command line

http://www.metanome.de/

Apache License 2.0


Running HyUCC algorithm on cli #17

Closed: faisal-ksolves closed this issue 1 year ago

faisal-ksolves commented 1 year ago

Hello everyone, I'm trying to run the HyUCC algorithm on the CLI with the following command:

java -cp metanome-cli-1.1.0.jar:HyUCC-1.2-SNAPSHOT.jar de.metanome.cli.App --algorithm de.metanome.algorithms.hyucc.HyUCC --file-key INPUT_GENERATOR --files load:/home/faisal/ksolves/metanome/cli/WDC_age.csv

But it throws an error:

Running de.metanome.algorithms.hyucc.HyUCC

in: [load:/home/faisal/ksolves/metanome/cli/WDC_age.csv]
out: file
configuration: []
Initializing algorithm.
Could not initialize algorithm.
de.metanome.algorithm_integration.AlgorithmConfigurationException: File not found!

What should I do now? Can anyone help me?

sekruse commented 1 year ago

Hi, the load: prefix in the --files parameter looks incorrect to me. Assuming you want to profile WDC_age.csv, it should just be:

--files /home/faisal/ksolves/metanome/cli/WDC_age.csv

The description of the --files parameter isn't super clear, admittedly, but load: should only be used if you have a file that contains a list of files to be analyzed:

input file/tables to be analyzed and/or files list input files/tables (prefixed with 'load:')
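To make the difference between the two forms of --files concrete, here is a small sketch. The function resolve_files_arg is hypothetical and is not part of metanome-cli; it only mirrors the help text quoted above, and it assumes that a load: list file holds one input path per line. Please check the metanome-cli source to confirm how list files are actually parsed.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of metanome-cli. It mirrors the --files help
# text to show how a plain value differs from a 'load:'-prefixed value.
# Assumption: a 'load:' list file contains one input path per line.
resolve_files_arg() {
  case "$1" in
    load:*)
      # Strip the prefix; the remainder names a file that lists the inputs.
      list_file=${1#load:}
      # Print every non-blank line; each one is treated as an input path.
      grep -v '^[[:space:]]*$' "$list_file"
      ;;
    *)
      # Without the prefix, the value itself is the input file.
      printf '%s\n' "$1"
      ;;
  esac
}
```

Under that assumption, passing load:/home/faisal/ksolves/metanome/cli/WDC_age.csv would make each row of the CSV show up as a "path" to open, which would fit the "File not found!" error above. To use load: as intended, you would first write the paths of the files to analyze into a separate list file (for example, with printf '%s\n' a.csv b.csv > inputs.txt) and then pass --files load:inputs.txt.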

faisal-ksolves commented 1 year ago

Thanks @sekruse, it has been resolved with the following command:

java -cp metanome-cli-1.1.0.jar:HyUCC-1.2-SNAPSHOT.jar de.metanome.cli.App --algorithm de.metanome.algorithms.hyucc.HyUCC --file-key INPUT_GENERATOR --files WDC_age.csv

faisal-ksolves commented 1 year ago

But here is another question: can we convert this project to Spark?

sekruse commented 1 year ago

Glad to hear your problem is resolved!

For your second question, the answer is unfortunately: No. But that's also more a question for HyUCC, which is the algorithm doing all the heavy lifting. I am not aware of a meaningful way to implement UCC discovery on Spark that would beat single-machine performance.