redpanda-ai / Meerkat

Used for the Meerkat project

Definition of Done Missing #949

Open redpanda-ai opened 8 years ago

redpanda-ai commented 8 years ago

@vnagarajY

We need a _Definition of Done_. Specifically, what are we trying to predict, and with what accuracy? We need solid goals here for precision, recall, and F1-measure.

If we have multiple objectives, let's set accuracy goals for each. We need this promptly, otherwise there's no point to building anything. We'll do what we can in the meantime, but nothing works without this!

speakerjohnash commented 8 years ago

Agreed, but there's no doubt that the goal is attributes we can only get via ES. Specifically we need an aggData index. The sooner we have that the better. The accuracy goals will not change the approach. If we're trying to enrich these transactions with fields not actually in the transaction itself we'll need to be using ES.

vnagarajY commented 8 years ago

Bank Debit ACH transactions:

Precision: 99%

Recall: 40% (among non-null merchants)

This applies primarily to identification of the merchant name. Any additional info is a bonus.

Bank Debit POS (Debit Card transactions):

Precision: Merchant Name 99%+, Exact Geo 98%

Recall: Merchant Name 95%, Exact Geo 90%

Population: Merchant Name 90% of all transactions, Exact Geo 90% of physical transactions in the US

Card Debit (Credit Card transactions):

Precision: Merchant Name 99%+, Exact Geo 98%

Recall: Merchant Name 95%, Exact Geo 90%

Population: Merchant Name 90%, Exact Geo 90% of physical transactions in the US

Exact Geo:

Implies population of street address, longitude, latitude, city, state, zip

If available, add URL, phone number, Yodlee normalized category, and business category from factual/agg.
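
To make the targets above checkable, here is a minimal sketch of how one field (for example, merchant name) could be scored against them. The metric definitions are assumptions, since the thread does not spell them out: precision is taken as correct predictions over non-null predictions, recall as correct predictions over non-null ground-truth labels, and population as non-null predictions over all transactions. The names `Metrics`, `score_field`, `meets_targets`, and `POS_MERCHANT_TARGETS` are hypothetical and not part of Meerkat.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Metrics:
    precision: float
    recall: float
    population: float


def score_field(predicted: Sequence[Optional[str]],
                truth: Sequence[Optional[str]]) -> Metrics:
    """Score one field over aligned predictions and ground-truth labels.

    Assumed definitions (not confirmed in this thread):
      precision  = correct predictions / non-null predictions
      recall     = correct predictions / non-null ground-truth labels
      population = non-null predictions / all transactions
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    total = len(predicted)
    populated = sum(p is not None for p in predicted)
    labeled = sum(t is not None for t in truth)
    correct = sum(p is not None and p == t for p, t in zip(predicted, truth))

    return Metrics(
        precision=correct / populated if populated else 0.0,
        recall=correct / labeled if labeled else 0.0,
        population=populated / total if total else 0.0,
    )


# Merchant-name targets for Bank Debit POS, as quoted in the comment above.
POS_MERCHANT_TARGETS = Metrics(precision=0.99, recall=0.95, population=0.90)


def meets_targets(actual: Metrics, target: Metrics) -> bool:
    """Return True only if every metric reaches its target."""
    return (actual.precision >= target.precision
            and actual.recall >= target.recall
            and actual.population >= target.population)
```

Calling `score_field` once per field (merchant name, exact geo) and per transaction type, then passing the result to `meets_targets`, would give a pass/fail answer for each goal listed above.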

redpanda-ai commented 8 years ago

@msevrens I like ES too, but keep this issue focused on the definition of done. Make tech recommendations on another issue.

vnagarajY commented 8 years ago

I missed a key part of the definition of done.

To complete POC:

We need a process such that, when they give us the file at the end of the 4 weeks for POC completion, we can turn it around in 4 hours:

1) Merge files across multiple groups and create an input file.

2) Run each type of file (3 types in total) through the necessary processes/services and create a corresponding output file for each type with all data.
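
The two steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Meerkat pipeline: it assumes the input files are CSVs with a header row, that a column (here called `type`) identifies which of the three transaction types a row belongs to, and that each type has its own processing function. All names (`merge_files`, `run_poc`, `type_field`, the output file naming) are hypothetical.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]

# The 4-hour turnaround budget from the comment above, in seconds.
TURNAROUND_SECONDS = 4 * 60 * 60


def merge_files(paths: Iterable[Path]) -> List[Row]:
    """Step 1: merge files from multiple groups into one list of rows."""
    rows: List[Row] = []
    for path in paths:
        with open(path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def run_poc(group_files: Iterable[Path],
            processors: Dict[str, Callable[[Row], Row]],
            out_dir: Path,
            type_field: str = "type") -> float:
    """Step 2: run each transaction type through its processor and write
    one output file per type. Returns the elapsed wall-clock seconds."""
    start = time.monotonic()
    merged = merge_files(group_files)
    out_dir.mkdir(parents=True, exist_ok=True)

    for tx_type, process in processors.items():
        results = [process(row) for row in merged
                   if row.get(type_field) == tx_type]
        if not results:
            continue  # no rows of this type, so no output file is written
        out_path = out_dir / f"{tx_type}_output.csv"
        with open(out_path, "w", newline="") as handle:
            # Assumes each processor returns rows with a consistent set of keys.
            writer = csv.DictWriter(handle, fieldnames=list(results[0].keys()))
            writer.writeheader()
            writer.writerows(results)

    return time.monotonic() - start
```

Comparing the returned elapsed time against `TURNAROUND_SECONDS` on a realistically sized file would show whether the 4-hour goal is being met.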