typedb / typedb-driver

TypeDB Drivers for Rust, Python, Java, Node.js, C, C++, and C#.
https://typedb.com
Apache License 2.0

Easier Value Retrieval #474

Closed: suciokhan closed this issue 1 year ago

suciokhan commented 2 years ago

Problem to Solve

Currently, retrieving the values of queried attributes is quite cumbersome. For example, if I want to get a series of entities that meet specific criteria, along with the values of their attributes, I have to call .get(var) separately for each variable in the query. A simple way to return the results as a dictionary (or a list of dictionaries) containing all the queried key:value pairs would greatly improve productivity. As it stands, I spend most of my time transforming the data into a usable format rather than exploiting the results of these queries.

Current Workaround

I make a separate list for every variable I want information on, which means a different set of lists for each query, depending on how many variables it contains. I then create an empty list and, iterating over the indices of one of the lists, append a dictionary holding the values for a single result, typing each key name by hand.

For example:

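# aid, aname, wid, wtitle, cbc, iname and icity are lists filled in earlier,
# one list per query variable, all aligned by index
items = []
for i in range(len(wid)):
    items.append({
        'researcher_id': aid[i],
        'researcher_name': aname[i],
        'work_id': wid[i],
        'work_title': wtitle[i],
        'cited_by_count': cbc[i],
        'institution_name': iname[i],
        'institution_city': icity[i]
    })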
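For comparison, a shorter form of the same workaround, assuming the per-variable lists are already populated and aligned by index: zip() walks the lists in lockstep, so the index bookkeeping disappears. The sample values below are made up for illustration, and only three of the seven lists are shown.

```python
# Hypothetical per-variable lists, aligned by index (one entry per query answer).
aid = ["a1", "a2"]
aname = ["Bob", "Alice"]
wid = ["w1", "w2"]

# Output keys, listed in the same order as the lists passed to zip() below.
keys = ["researcher_id", "researcher_name", "work_id"]

# zip(aid, aname, wid) yields one tuple per answer;
# dict(zip(keys, row)) pairs each key with the matching value in that tuple.
items = [dict(zip(keys, row)) for row in zip(aid, aname, wid)]
print(items)
```

Note that zip() stops at the shortest list, so a misaligned list silently truncates the output instead of raising an IndexError.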

Proposed Solution

Add a function that puts the results directly into this kind of format, so the user doesn't have to create lists and dictionaries by hand to hold the values and then transform them.

Something like:

with session.transaction(TransactionType.READ) as read_transaction:
    # get_keyvalues() is the proposed method; it does not exist yet
    results_json = read_transaction.query().match(query).get_keyvalues()

# results_json would then look like:
# [{'researcher_id': 'abcd', 'researcher_name': 'Bob', 'work_id': '12324', ... }, {...}]
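Until a built-in method exists, a small helper can produce this shape. This is a sketch assuming the Client Python 2.x answer API that appears later in this thread (ConceptMap.map(), Concept.is_attribute(), Attribute.get_value()); the function name answers_to_records and its rename parameter are hypothetical. The Fake* classes are stand-ins for real driver objects so the sketch runs without a server.

```python
from typing import Any, Dict, Iterable, List, Optional


def answers_to_records(answers: Iterable[Any],
                       rename: Optional[Dict[str, str]] = None) -> List[Dict[str, Any]]:
    """Turn match answers into a list of {key: value} dicts (hypothetical helper).

    Only attribute concepts are kept; entities and relations are skipped.
    `rename` optionally maps query variable names (e.g. 'aid') to output keys
    (e.g. 'researcher_id'); variables without a mapping keep their own name.
    """
    rename = rename or {}
    records = []
    for answer in answers:
        record = {}
        # answer.map() is a dict of variable name -> concept, so use .items()
        for var_name, concept in answer.map().items():
            if concept.is_attribute():
                record[rename.get(var_name, var_name)] = concept.get_value()
        records.append(record)
    return records


# --- Stand-ins for driver objects, for illustration only ---
class FakeConcept:
    def __init__(self, value=None, attribute=True):
        self._value = value
        self._attribute = attribute

    def is_attribute(self):
        return self._attribute

    def get_value(self):
        return self._value


class FakeConceptMap:
    def __init__(self, concepts):
        self._concepts = concepts

    def map(self):
        return self._concepts


answers = [FakeConceptMap({
    "a": FakeConcept(attribute=False),  # an entity: dropped from the output
    "aid": FakeConcept("abcd"),
    "aname": FakeConcept("Bob"),
})]
print(answers_to_records(answers, rename={"aid": "researcher_id", "aname": "researcher_name"}))
```

With a real transaction, the stand-ins would be replaced by the match results, e.g. answers_to_records(read_transaction.query().match(query), rename={...}). The rename table replaces the hand-typed key names from the workaround.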

Additional Information

NA

alexjpwalker commented 2 years ago

@suciokhan: It sounds like you're looking for concept_map.map():


results_maps = [answer.map() for answer in read_transaction.query().match(query)]

suciokhan commented 2 years ago

@alexjpwalker Hmm... that's close, and a nice structure, but it doesn't give me the values. This:

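from typedb.client import TypeDB, SessionType, TransactionType

# `query` and `options` (a TypeDBOptions object) are defined elsewhere, not shown here
with TypeDB.core_client("127.0.0.1:1729") as client:
    with client.session("database", SessionType.DATA) as session:
        with session.transaction(TransactionType.READ, options) as read_transaction:
            # Do a query
            results_maps = [answer.map() for answer in read_transaction.query().match(query)]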

gives me:

[{'a': <typedb.concept.thing.entity._Entity at 0x7f04945611f0>,
 'wtitle': <typedb.concept.thing.attribute._StringAttribute at 0x7f04ac6bc550>,
 'iname': <typedb.concept.thing.attribute._StringAttribute at 0x7f04ac6bc610>,
 'wid': <typedb.concept.thing.attribute._StringAttribute at 0x7f04ac6bc760>,
 'cid': <typedb.concept.thing.attribute._StringAttribute at 0x7f04ac6bc8b0>,
 'icity': <typedb.concept.thing.attribute._StringAttribute at 0x7f04ac6bca00>,
 'c': <typedb.concept.thing.entity._Entity at 0x7f04ac6bcb50>,
 'inst': <typedb.concept.thing.entity._Entity at 0x7f04ac6bcca0>,
 'aid': <typedb.concept.thing.attribute._StringAttribute at 0x7f04ac6bcdf0>,
 'aname': <typedb.concept.thing.attribute._StringAttribute at 0x7f04ac6bcf40>,
 'auth': <typedb.concept.thing.relation._Relation at 0x7f049456f0a0>,
 'cbc': <typedb.concept.thing.attribute._LongAttribute at 0x7f049456f1f0>,
 'w': <typedb.concept.thing.entity._Entity at 0x7f049456f340>,
 'adj': <typedb.concept.thing.relation._Relation at 0x7f049456f490>,
 'aff': <typedb.concept.thing.relation._Relation at 0x7f049456f5e0>},  ...]
alexjpwalker commented 2 years ago

Filter and project the answers further to extract attribute values.

results_maps = [{var_name: attr.get_value() for (var_name, attr) in answer.map().items() if attr.is_attribute()} for answer in read_transaction.query().match(query)]
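A note on the iteration in this one-liner: answer.map() returns a plain dict, and iterating a dict directly yields only its keys. Unpacking each key into (var_name, attr) therefore raises ValueError for most variable names (a two-character name would even be split silently into two characters), so the pairs have to come from answer.map().items(). A plain-Python illustration, with an ordinary dict and made-up values standing in for answer.map():

```python
# A plain dict standing in for answer.map(); the values are made up.
concept_map = {"a": "<entity>", "aid": "abcd"}

# Iterating a dict directly yields keys only, so unpacking the key "a"
# into two names fails with ValueError.
try:
    for var_name, concept in concept_map:
        pass
except ValueError as err:
    print("ValueError:", err)

# .items() yields (key, value) pairs, which is what the unpacking expects.
pairs = [(var_name, concept) for var_name, concept in concept_map.items()]
print(pairs)
```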
suciokhan commented 2 years ago

This fails when I try it in a Jupyter Notebook, as well as in a standard .py file in the conda environment I've been using. I've never seen this happen before.

query = [
    'match',
    '$c isa concept, has id $cid, has name "biological pest control";',
    '$w isa work, has id $wid, has title $wtitle, has pub_date > 2017-01-01;',
    '$adj ($w, $c) isa adjacent;',
    '$a isa researcher, has id $aid, has name $aname, has cited_by_count $cbc;',
    '$auth ($w, $a) isa authorship;',
    '$inst isa institution, has name $iname, has country_code "us", has city $icity;',
    '$aff ($inst, $a) isa affiliation;'
    ]

query = " ".join(query)

with TypeDB.core_client("127.0.0.1:1729") as client:
    with client.session("open_alex_0", SessionType.DATA) as session:
        with session.transaction(TransactionType.READ, options) as read_transaction:
            # Do a query
            #answer_iterator = read_transaction.query().match(query)
            #results_maps = [answer.map() for answer in read_transaction.query().match(query)]
            results_maps = [{var_name: attr.get_value() for (var_name, attr) in answer.map() if attr.is_attribute()} for answer in read_transaction.query().match(query)]

Error in Jupyter:

Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here (https://aka.ms/vscodeJupyterKernelCrash) for more info. View Jupyter log for further details.

Error in terminal:

Traceback (most recent call last):
E1014 08:21:07.116613373   18428 metadata.cc:253]            WARNING: 1 metadata elements were leaked
E1014 08:21:07.116654999   18428 metadata.cc:260]            mdelem ':authority' = '127.0.0.1:1729'
Segmentation fault (core dumped)

I thought it might be a VS Code issue, so I ran the test.py version in a regular terminal and got the same failure. I also tried restarting the TypeDB server, with no effect.

alexjpwalker commented 2 years ago

Thanks for reporting this issue, @suciokhan. We've seen it before, sporadically, on our CI machines.

It may be an issue in gRPC Python, or our usage of it - we're not quite sure at the moment.

alexjpwalker commented 1 year ago

@suciokhan The issue where the client logs "WARNING: 1 metadata elements were leaked" and then segfaults should be fixed as of Client Python v2.11.2.

Could you retest this and check whether the code snippet

results_maps = [{var_name: attr.get_value() for (var_name, attr) in answer.map().items() if attr.is_attribute()} for answer in read_transaction.query().match(query)]

now works?

I do see the value in this proposal. However, if it can be accomplished with a straightforward one-liner (as many things can in Python!), we'd be wary of adding new methods to the client that would increase the complexity of the ConceptMap source code.

james-whiteside commented 1 year ago

Hi @suciokhan, not sure if you are still working on this, but you might be interested to know we've recently released a Jupyter connection for TypeDB. It includes JSON-style output for all queries so you should find it easier to address the relevant data now. If you do try it out, please let me know how you get on.

flyingsilverfin commented 1 year ago

This will be addressed by https://github.com/vaticle/typedb/pull/6888

flyingsilverfin commented 1 year ago

We've implemented the 'Fetch' query in https://github.com/vaticle/typedb/pull/6888 so we will close this issue. Expect it to be released and usable in 2.25.x TypeDB Core and Enterprise!