Closed AayushSameerShah closed 1 year ago
Also to give a context of what I am trying to achieve:
Dataset<Row> type, so the semantic detection should be done either on the pure CSV or on the Spark dataset. I am looking for code to do just that. If you can point me in that direction, it will be more than helpful. Thank you.
So to make it clearer I have pulled out three of the existing samples into an examples directory with their own build files. I have also added a trivial CLI (Command Line) example, which you can use as a basis.
Do the following:
Please note that the CLI example has the locale hard-wired to en-US. In general FTA is sensitive to the locale, so it will only detect items relevant to the current locale. For example, if you were hoping to detect an Aadhaar number (https://en.wikipedia.org/wiki/Aadhaar) you would need to set the locale to 'en-IN'.
There is also another example (not currently checked in) that provides a minimal Spring Boot application. You can enter the file:
and then see the output ...
I have also attached a sample file which may be useful for your testing.
This really was a great help. I've followed the steps you advised and in that way (using a CSV and CLI) the output is coming as desired and I have tested that on multiple other files as well.
Pardon my question, but when I try to run the code in Java, it still isn't able to find the FTAException class. I tried cutting and pasting the "core" package from the downloaded project files into my CWD, but that still gave errors like:
java: cannot access com.cobber.fta.core.FTAPluginException
class file for com.cobber.fta.core.FTAPluginException not found
And for result.getType(), some errors like:
java: cannot access com.cobber.fta.core.FTAType
class file for com.cobber.fta.core.FTAType not found
And thus, I am unable to run the code from Java. (Yes, I have downloaded and updated to the new jar, 12.6.5.)
So, is there any way to run FTA on the Spark Dataset<Row>? If there is, it will be wonderful, because we do not always get the data in CSV format. There can be Excel, SQL, etc., and all of them eventually get converted to a Spark dataset.
Roughly the process will be: Upload.csv → A Spark dataset → FTA_Engine → SemanticTypes for each column.
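The per-column flow described above can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class name ColumnPipeline, the detect method, and the toy detector are hypothetical and are not part of FTA or Spark. In a real pipeline the rows would presumably come from the Spark dataset and the detector would wrap the FTA analysis; that wiring is not shown in this thread, so it is left as a pluggable function here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ColumnPipeline {
    /**
     * Pivots row-oriented data into one list of samples per column, then
     * applies the supplied detector to each column's samples.
     * Assumes column names in the header are unique.
     */
    public static Map<String, String> detect(List<String> header,
                                             List<List<String>> rows,
                                             Function<List<String>, String> detector) {
        // Collect the values of each column, preserving column order.
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String name : header) {
            columns.put(name, new ArrayList<>());
        }
        for (List<String> row : rows) {
            for (int i = 0; i < header.size(); i++) {
                // Short rows contribute null for their missing cells.
                columns.get(header.get(i)).add(i < row.size() ? row.get(i) : null);
            }
        }
        // Run the detector once per column.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : columns.entrySet()) {
            result.put(e.getKey(), detector.apply(e.getValue()));
        }
        return result;
    }

    /** Toy stand-in detector (hypothetical): "LONG" if every non-null sample is an integer, else "STRING". */
    public static String toyDetector(List<String> samples) {
        for (String s : samples) {
            if (s != null && !s.matches("-?\\d+")) {
                return "STRING";
            }
        }
        return "LONG";
    }

    public static void main(String[] args) {
        List<String> header = List.of("name", "age");
        List<List<String>> rows = List.of(List.of("Asha", "31"), List.of("Ben", "47"));
        // Prints {name=STRING, age=LONG}
        System.out.println(detect(header, rows, ColumnPipeline::toyDetector));
    }
}
```

Keeping the detector as a function means the same pivoting code can be reused whether the rows originate from a CSV file or from another source.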
I am clearly missing something here, and I apologize for that. But if possible, would you please guide me through this confusion? It might help a future developer who faces the same problem.
Thank you very much for your response.
You are not pulling in the fta-core library, which is a dependency that fta requires. If you are in the cli example directory and run 'gradle dependencies --configuration runtimeClasspath' you will see the list of required dependencies, which is the same as the set of files in the build/install/cli/lib subdirectory. Hence the following simple java command will execute the CLI:
java -cp "build/install/cli/lib/commons-logging-1.2.jar:\
build/install/cli/lib/jackson-core-2.14.2.jar:\
build/install/cli/lib/jakarta.activation-2.0.1.jar:\
build/install/cli/lib/checker-qual-3.12.0.jar:\
build/install/cli/lib/fta-12.6.4.jar:\
build/install/cli/lib/libphonenumber-8.13.5.jar:\
build/install/cli/lib/error_prone_annotations-2.11.0.jar:\
build/install/cli/lib/xeger-1.0.0-RELEASE.jar:\
build/install/cli/lib/automaton-1.12-4.jar:\
build/install/cli/lib/jakarta.mail-2.0.1.jar:\
build/install/cli/lib/cli.jar:\
build/install/cli/lib/sketches-java-0.8.2.jar:\
build/install/cli/lib/failureaccess-1.0.1.jar:\
build/install/cli/lib/commons-validator-1.7.jar:\
build/install/cli/lib/slf4j-api-2.0.6.jar:\
build/install/cli/lib/univocity-parsers-2.9.1.jar:\
build/install/cli/lib/fta-core-12.6.4.jar:\
build/install/cli/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:\
build/install/cli/lib/commons-collections-3.2.2.jar:\
build/install/cli/lib/j2objc-annotations-1.3.jar:\
build/install/cli/lib/commons-beanutils-1.9.4.jar:\
build/install/cli/lib/commons-digester-2.1.jar:\
build/install/cli/lib/commons-lang3-3.12.0.jar:\
build/install/cli/lib/jackson-annotations-2.14.2.jar:\
build/install/cli/lib/jackson-databind-2.14.2.jar:\
build/install/cli/lib/guava-31.1-jre.jar:\
build/install/cli/lib/jackson-datatype-jsr310-2.14.2.jar:\
build/install/cli/lib/jsr305-3.0.2.jar:\
build/install/cli/lib/automaton-1.11-8.jar:\
build/install/cli/lib/commons-text-1.10.0.jar" cli.Cli
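Typing every jar by hand is error-prone, so the classpath string can also be generated. The sketch below is a hypothetical helper (the class name ClasspathBuilder is not part of FTA) that joins all jars found directly in a directory, defaulting to the build/install/cli/lib path used above. The java launcher also accepts a directory wildcard in -cp, such as "build/install/cli/lib/*", which may be simpler when no filtering is needed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ClasspathBuilder {
    /**
     * Returns the jar files directly inside libDir, sorted by path,
     * joined with the platform path separator (':' on Unix, ';' on Windows).
     */
    public static String fromDirectory(Path libDir) throws IOException {
        try (Stream<Path> entries = Files.list(libDir)) {
            return entries
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .map(Path::toString)
                    .collect(Collectors.joining(File.pathSeparator));
        }
    }

    public static void main(String[] args) throws IOException {
        // Default to the directory from the command above; override with the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "build/install/cli/lib");
        System.out.println(fromDirectory(libDir));
    }
}
```

The printed string can be passed to java -cp in place of the hand-written list.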
Thanks, that has worked for me and now the examples are running smoothly. All I needed was these libraries in my working project, which weren't there before but were available in build/install/cli/lib.
Thanks for your support!
In general you should be using a tool like Gradle or Maven to manage these dependencies. However, it sounds like you are all set, so I will close this issue.
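However the dependencies are managed, a small runtime check can confirm that the classes named in the compile errors earlier in this thread are actually on the classpath. The following is a minimal sketch: the class name DependencyCheck and its findMissing method are hypothetical, while the two FTA class names are taken from the error messages above.

```java
import java.util.ArrayList;
import java.util.List;

public class DependencyCheck {
    /** Returns the subset of classNames that cannot be loaded from the current classpath. */
    public static List<String> findMissing(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run its static initializers.
                Class.forName(name, false, DependencyCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers the case where the class exists but one of its own dependencies does not.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = findMissing(
                "com.cobber.fta.core.FTAPluginException",
                "com.cobber.fta.core.FTAType");
        System.out.println(missing.isEmpty() ? "All classes found" : "Missing: " + missing);
    }
}
```

Running it with the same -cp value as the application shows whether the fta-core jar is being picked up.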
Hello tsegall, I am impressed that I have found this amazing work from you. Actually, I was getting started with this project and have downloaded the dependencies as follows:
I created a separate project and, as given above, have downloaded all 13 (1 main + 12 runtime) jars related to this project. Then I went ahead with the basic running of the example use cases given in the readme.
Code snippet I used
I have checked the jar, and the core package isn't there. Maybe that is causing the problem. Or maybe I am doing something wrong; please correct me if that is the case.
Thank you.