`FTAException` is not found

tsegall / fta

Metadata/data identification Java library. Identifies Semantic Type information (e.g. Gender, Age, Color, Country,...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support.

Apache License 2.0


`FTAException` is not found #21

Closed AayushSameerShah closed 1 year ago

AayushSameerShah commented 1 year ago

Hello tsegall, I am impressed by this amazing work. I am getting started with the project and have downloaded the dependencies as follows:

[Screenshot of the mvnrepository page for com.cobber.fta:fta 12.6.4, captured 2023-02-03]

I created a separate project and as given above, have downloaded all 13 (1 main + 12 runtime) jars related to this project. Then I went ahead with the basic running of the example use cases given in the readme.

Code snippet I used


import ...
import com.cobber.fta.TextAnalysisResult;
import com.cobber.fta.TextAnalyzer;
import com.cobber.fta.core.FTAException; // this isn't found

public class detect_types {
    public static void main(final String[] args) throws FTAException {

        final TextAnalyzer analysis = new TextAnalyzer("Gender");
        final HashMap<String, Long> basic = new HashMap<>();

        basic.put("Male", 2_000_000L);
        basic.put("Female", 1_000_000L);
        basic.put("Unknown", 10_000L);

        analysis.trainBulk(basic);

        final TextAnalysisResult result = analysis.getResult();

        // result.getTypeQualifier() isn't found as well.
        System.err.printf("Semantic Type: %s (%s)%n", result.getTypeQualifier(), result.getType());
        System.err.println("Detail: " + result.asJSON(true, 1));
    }
}

I checked inside the jar, and the package core isn't there, which may be what is causing the problem. Maybe I am doing something wrong; please correct me if that is the case.

Thank you.

AayushSameerShah commented 1 year ago

Also to give a context of what I am trying to achieve:

Context

I am working on a project where I want to detect the semantic data types of a file uploaded by the user (generally in CSV format).
As soon as the file is uploaded, we would propose a semantic data type for each column (using this library).
The CSV is generally converted into a Spark Dataset<Row>, so the semantic detection should be done either on the raw CSV or on the Spark dataset.

I am looking for code that does just that. If you can point me in that direction, it would be more than helpful. Thank you.
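As a rough illustration of the per-column preparation described above, the following sketch counts how often each distinct value occurs in each CSV column, using only the Java standard library. The class and method names (`ColumnValueCounts`, `countByColumn`) are hypothetical, and the comma splitting is deliberately naive (no quoted fields). The result has the same value-to-count shape that the snippet earlier in this thread passes to `trainBulk`; handing it to FTA is not shown and would need the FTA jars on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnValueCounts {

    /**
     * Counts the occurrences of each distinct value per column.
     * Assumptions: the first line is the header, fields are separated by
     * plain commas, and no field contains a quoted comma.
     */
    static Map<String, Map<String, Long>> countByColumn(List<String> lines) {
        Map<String, Map<String, Long>> counts = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return counts;
        }

        // One (initially empty) value-count map per header name.
        String[] header = lines.get(0).split(",", -1);
        for (String name : header) {
            counts.put(name.trim(), new LinkedHashMap<>());
        }

        // Tally every cell under its column; short rows are simply truncated.
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            for (int i = 0; i < header.length && i < cells.length; i++) {
                counts.get(header[i].trim()).merge(cells[i].trim(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Gender,Country",
            "Male,India",
            "Female,India",
            "Male,France");

        // Each inner map could then be given to one analyzer per column.
        countByColumn(lines).forEach(
            (column, values) -> System.out.println(column + " -> " + values));
    }
}
```

For a Spark dataset the same per-column counts could presumably be produced with a group-by on each column instead of the loop above; that variant is not sketched here.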

tsegall commented 1 year ago

So to make it clearer I have pulled out three of the existing samples into an examples directory with their own build files. I have also added a trivial CLI (Command Line) example, which you can use as a basis.

Do the following:

1. Clone the repository - 'git clone https://github.com/tsegall/fta.git'
2. Build the CLI example
   - cd fta/examples/cli
   - gradle clean installDist
3. Run the CLI on your sample CSV
   - build/install/cli/bin/cli sample.csv

tsegall commented 1 year ago

Please note that the CLI example has the locale hard-wired to en-US. In general FTA is sensitive to the locale, so it will only detect items relevant to the current locale. For example, if you were hoping to detect an Aadhaar (https://en.wikipedia.org/wiki/Aadhaar) you would need to set the locale to 'en-IN'.

There is also another example (not currently checked in) that provides a minimal Spring Boot application, where you can enter the file:

and then see the output ...

I have also attached a sample file which may be useful for your testing.

sample100.csv

AayushSameerShah commented 1 year ago

This really was a great help. I've followed the steps you advised and in that way (using a CSV and CLI) the output is coming as desired and I have tested that on multiple other files as well.

But,

Pardon my question, but when I try to run the code in Java, it still isn't able to find the FTAException class. I tried to cut and paste the "core" package from the downloaded project files into my CWD, but that still gave errors like:

java: cannot access com.cobber.fta.core.FTAPluginException
  class file for com.cobber.fta.core.FTAPluginException not found

And for result.getType() some errors like:

java: cannot access com.cobber.fta.core.FTAType
  class file for com.cobber.fta.core.FTAType not found

And thus, I am unable to run the code from java. (Yes, I have downloaded and updated the new jar 12.6.5)

So, is there any way to run FTA on a Spark Dataset<Row>? If so, that would be wonderful, because we do not always get the data in CSV format; it can also be Excel, SQL, etc., and all of them eventually get converted to a Spark dataset.

Roughly, the process will be: Upload.csv → a Spark dataset → FTA_Engine → SemanticTypes for each column.

I am clearly missing something here, and I apologize for that. If possible, would you please guide me through this confusion? It might also help a future developer who faces the same problem.

Thank you very much for your response.
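The "class file ... not found" messages quoted above point at classes that the compiler cannot locate on the classpath. A minimal sketch for checking the same thing at runtime is shown below. The class name `ClasspathCheck` and the method `isOnClasspath` are hypothetical; the FTA class names in `main` are the ones taken from the error messages and the snippet in this thread.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or something it links against is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.cobber.fta.TextAnalyzer",
            "com.cobber.fta.core.FTAException",
            "com.cobber.fta.core.FTAPluginException",
            "com.cobber.fta.core.FTAType"
        };
        for (String name : names) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this with the same `-cp` value used for the real program shows which of the classes are reachable; a "MISSING" line for a `com.cobber.fta.core` class suggests that the jar providing that package is not on the classpath.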

tsegall commented 1 year ago

You are not pulling in the fta-core library, which is a dependency that fta requires. If you are in the cli example directory and run 'gradle dependencies --configuration runtimeClasspath' you will see the list of required dependencies, which is the same as the set of files in the build/install/cli/lib subdirectory. Hence the following simple java command will execute the CLI:

java -cp "build/install/cli/lib/commons-logging-1.2.jar:\
build/install/cli/lib/jackson-core-2.14.2.jar:\
build/install/cli/lib/jakarta.activation-2.0.1.jar:\
build/install/cli/lib/checker-qual-3.12.0.jar:\
build/install/cli/lib/fta-12.6.4.jar:\
build/install/cli/lib/libphonenumber-8.13.5.jar:\
build/install/cli/lib/error_prone_annotations-2.11.0.jar:\
build/install/cli/lib/xeger-1.0.0-RELEASE.jar:\
build/install/cli/lib/automaton-1.12-4.jar:\
build/install/cli/lib/jakarta.mail-2.0.1.jar:\
build/install/cli/lib/cli.jar:\
build/install/cli/lib/sketches-java-0.8.2.jar:\
build/install/cli/lib/failureaccess-1.0.1.jar:\
build/install/cli/lib/commons-validator-1.7.jar:\
build/install/cli/lib/slf4j-api-2.0.6.jar:\
build/install/cli/lib/univocity-parsers-2.9.1.jar:\
build/install/cli/lib/fta-core-12.6.4.jar:\
build/install/cli/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:\
build/install/cli/lib/commons-collections-3.2.2.jar:\
build/install/cli/lib/j2objc-annotations-1.3.jar:\
build/install/cli/lib/commons-beanutils-1.9.4.jar:\
build/install/cli/lib/commons-digester-2.1.jar:\
build/install/cli/lib/commons-lang3-3.12.0.jar:\
build/install/cli/lib/jackson-annotations-2.14.2.jar:\
build/install/cli/lib/jackson-databind-2.14.2.jar:\
build/install/cli/lib/guava-31.1-jre.jar:\
build/install/cli/lib/jackson-datatype-jsr310-2.14.2.jar:\
build/install/cli/lib/jsr305-3.0.2.jar:\
build/install/cli/lib/automaton-1.11-8.jar:\
build/install/cli/lib/commons-text-1.10.0.jar" cli.Cli
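Typing a classpath like the one above by hand is error-prone, so here is a small sketch that assembles it from whatever jars are in a lib directory and reports whether an `fta-core` jar is among them. The class and method names (`LibClasspath`, `jarsIn`, `classpathOf`, `containsArtifact`) are hypothetical; the default directory is the `build/install/cli/lib` path mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LibClasspath {

    /** Lists the *.jar files directly inside libDir, sorted by path. */
    static List<Path> jarsIn(Path libDir) throws IOException {
        try (Stream<Path> entries = Files.list(libDir)) {
            return entries
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Joins the jars with the platform path separator (':' on Unix, ';' on Windows). */
    static String classpathOf(List<Path> jars) {
        return jars.stream()
            .map(Path::toString)
            .collect(Collectors.joining(File.pathSeparator));
    }

    /** True if some jar's file name starts with the given prefix, e.g. "fta-core-". */
    static boolean containsArtifact(List<Path> jars, String prefix) {
        return jars.stream()
            .anyMatch(p -> p.getFileName().toString().startsWith(prefix));
    }

    public static void main(String[] args) throws IOException {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "build/install/cli/lib");
        List<Path> jars = jarsIn(libDir);
        System.out.println("fta-core present: " + containsArtifact(jars, "fta-core-"));
        System.out.println(classpathOf(jars));
    }
}
```

The `java` launcher also accepts a wildcard classpath entry, so `java -cp "build/install/cli/lib/*" cli.Cli` should be equivalent to listing every jar in that directory individually.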

AayushSameerShah commented 1 year ago

Thanks that has worked for me and now the examples are running smoothly. All I needed was these libraries in my working project which weren't there before but were available in build/install/cli/lib.

Thanks for your support 🙏🏻

tsegall commented 1 year ago

In general you should be using a tool like gradle or maven to manage these dependencies. However, it sounds like you are all set so I will close this issue.