mitdbg / aurum-datadiscovery

MIT License
74 stars, 49 forks

Aurum CLI, Neo4J Improvements, Minor api & bug fixes #129

Closed: Florents-Tselai closed this pull request 5 years ago.

Florents-Tselai commented 5 years ago

This PR includes the following:

Aurum CLI

A new CLI module, aurum_cli.py, that aims to make the Aurum workflow easy and straightforward, especially for newcomers. Through the CLI module, one can:

CLI Documentation

Detailed documentation for the CLI can be found in aurum-cli.md.

Neo4J Export

A refactored Neo4J export module that is cleaner, faster, simpler, and more complete: it exports all Relation types, not only CONTENT_SIM. Export progress is shown through a tqdm-powered progress bar. The module could also serve as a template for exporting to other backends.
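A minimal sketch of what an "export every relation type" loop can look like, assuming the edges are already available as (nid, nid) pairs grouped by relation type. The Relation members, the :Column label, the nid property and build_export_statements are hypothetical names chosen for illustration, not the module's actual code. The sketch only builds parameterized Cypher statements; it does not connect to a database.

```python
from enum import Enum

try:
    from tqdm import tqdm  # progress bar library named in the PR
except ImportError:
    def tqdm(iterable, **kwargs):
        """No-op fallback so the sketch still runs without tqdm installed."""
        return iterable


class Relation(Enum):
    # Hypothetical subset of Aurum relation types, for illustration only.
    SCHEMA_SIM = "SCHEMA_SIM"
    CONTENT_SIM = "CONTENT_SIM"
    PKFK = "PKFK"


# Cypher template; the :Column label and nid property are assumptions.
# A relationship type cannot be written as a plain $parameter, so the type name
# is filled in from the trusted enum value while node ids stay parameterized.
MERGE_EDGE = (
    "MERGE (a:Column {nid: $a_nid}) "
    "MERGE (b:Column {nid: $b_nid}) "
    "MERGE (a)-[:%s]->(b)"
)


def build_export_statements(edges_by_relation):
    """Turn {Relation: [(nid_a, nid_b), ...]} into a list of (cypher, params) pairs.

    Every Relation member is visited, so all relation types are exported,
    not only CONTENT_SIM; tqdm shows progress per relation type.
    """
    statements = []
    for relation in Relation:
        pairs = edges_by_relation.get(relation, [])
        for a_nid, b_nid in tqdm(pairs, desc=relation.name):
            params = {"a_nid": a_nid, "b_nid": b_nid}
            statements.append((MERGE_EDGE % relation.value, params))
    return statements


# Usage with made-up node ids.
edges = {
    Relation.CONTENT_SIM: [("1", "2")],
    Relation.PKFK: [("2", "3"), ("4", "5")],
}
statements = build_export_statements(edges)
```

Each (query, params) pair could then be sent to a running database, for example with the official neo4j Python driver's session.run(query, params). Using MERGE rather than CREATE means re-running the export does not duplicate nodes or edges.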

API Refactors

Adds a keyword argument as_str to knowledgerepr/fieldnetwork.py:enumerate_relation(self, relation, as_str=True) so that method clients can get tuples of Hit pairs by passing as_str=False. Previously, clients could only get concatenated strings and had to rely on regexes to extract the tuples, which made the prior Neo4J module too slow. Because as_str defaults to True, this change is backwards compatible.
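A minimal sketch of the as_str pattern, using a toy in-memory network. ToyFieldNetwork, the Relation members, the string format and the Hit field names are illustrative assumptions, not Aurum's actual definitions. Defaulting as_str to True is what keeps existing callers working, while as_str=False hands back the Hit pairs directly, with no regex parsing needed.

```python
from collections import namedtuple
from enum import Enum

# Assumed Hit shape for illustration; Aurum's real Hit may define different fields.
Hit = namedtuple("Hit", ["nid", "db_name", "source_name", "field_name", "score"])


class Relation(Enum):
    # Hypothetical subset of relation types.
    SCHEMA_SIM = 1
    CONTENT_SIM = 2


class ToyFieldNetwork:
    """Hypothetical stand-in for Aurum's field network, storing edges per relation."""

    def __init__(self):
        self._edges = {}

    def add_edge(self, relation, hit_a, hit_b):
        self._edges.setdefault(relation, []).append((hit_a, hit_b))

    def enumerate_relation(self, relation, as_str=True):
        """Yield every edge of `relation`.

        as_str=True (default): one concatenated string per edge, the old behaviour,
        so existing callers keep working. The string format here is an assumption.
        as_str=False: (Hit, Hit) tuples, so callers need no regex parsing.
        """
        for hit_a, hit_b in self._edges.get(relation, []):
            if as_str:
                yield str(hit_a) + " - " + str(hit_b)
            else:
                yield hit_a, hit_b


# Usage: build a one-edge network and read it back both ways.
net = ToyFieldNetwork()
orders = Hit("1", "sales", "orders.csv", "customer_id", 1.0)
customers = Hit("2", "sales", "customers.csv", "id", 1.0)
net.add_edge(Relation.CONTENT_SIM, orders, customers)

as_strings = list(net.enumerate_relation(Relation.CONTENT_SIM))
as_pairs = list(net.enumerate_relation(Relation.CONTENT_SIM, as_str=False))
for left, right in as_pairs:
    # prints: orders.csv customer_id -> customers.csv id
    print(left.source_name, left.field_name, "->", right.source_name, right.field_name)
```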

Bug Fixes

Fixes a minor bug, introduced by https://github.com/mitdbg/aurum-datadiscovery/pull/126, that occurs when calling init_system(<path_to_serialized_model>, create_reporting=False). Also updates the relevant documentation in quickstart.md.

raulcf commented 5 years ago

This looks fantastic. I'm directly merging. Thanks!