vaticle / typedb-ml

TypeDB-ML is the Machine Learning integrations library for TypeDB
https://vaticle.com
Apache License 2.0

Example Diagnosis does not work properly. #168

Closed kaad01 closed 1 year ago

kaad01 commented 1 year ago

Description

I cannot run the diagnosis example without problems and had to make several fixes myself. It may be an environment problem, but there are still some things worth reviewing. I will go through the steps and show how I fixed each one. I cannot get past step 3 and need your help there.

1. load_typeql_file()

Reproducible Steps: Start the program with: python3.9 -m examples.diagnosis.diagnosis "D:\UNI\Master\Masterarbeit\typedb-all-windows-2.14.2\typedb.bat"

Output: a FileNotFoundError: [image]

Expected Output: The line load_typeql_file(typedb_binary_directory, database, schema_file_path, FileType.Schema) should load the example's schema and commit it in a transaction. The same problem occurs with the data (next line).

My Fix: I loaded the schema and data manually and commented out the following lines in diagnosis.py so the program could continue:

93 create_database(client, database)
95 load_typeql_file(typedb_binary_directory, database, schema_file_path, FileType.Schema)
96 load_typeql_file(typedb_binary_directory, database, seed_data_file_path, FileType.Data)
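
Since the traceback screenshot is not preserved here, it is unclear which path triggered the FileNotFoundError. One way to narrow it down is a small pre-flight check before calling load_typeql_file(). This is a minimal sketch using only the standard library. The helper name missing_paths is hypothetical and not part of typedb-ml, and how load_typeql_file() uses these paths internally is not shown in this issue.

```python
from pathlib import Path


def missing_paths(*paths):
    """Return the given paths (as strings) that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


# Possible use in diagnosis.py (variable names as quoted above):
# problems = missing_paths(typedb_binary_directory, schema_file_path, seed_data_file_path)
# if problems:
#     raise FileNotFoundError(f"Not found: {problems}")
```

If the list comes back empty, all three paths exist and the error probably originates elsewhere, for example in how the binary is invoked.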

2. generate_example_data()

This works sometimes, but not always. The problem is in these lines of generate.py:

for example_id in range(0, num_examples):
    tx = session.transaction(TransactionType.WRITE)
    for query in get_example_queries(pmf, example_id):
        tx.query().insert(query)
    tx.commit()

session.close()

We want to insert queries about persons and their diagnoses. But sometimes the program inserts only the persons, without matching and inserting the relations, even though the generated queries are correct.

My Fix

for example_id in range(0, num_examples):
    for query in get_example_queries(pmf, example_id):
        tx = session.transaction(TransactionType.WRITE)
        tx.query().insert(query)
        tx.commit()

session.close()

This fixes the problem described above, but it is less elegant because it uses far more transactions and commits than before. If the program gets this far, you should have a database that looks like this: [image]
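
The one-transaction-per-query workaround can also be wrapped in a small helper that closes a transaction if an insert or commit fails. This is only a sketch: the function name is hypothetical, and it assumes the session and transaction objects expose transaction(), query().insert(), commit() and close() the way the snippets above use them (close() on a transaction is an assumption not shown in this issue).

```python
def insert_each_in_own_transaction(session, write_type, queries):
    """Insert each query in its own write transaction.

    `session` is assumed to behave like a typedb-client session, and
    `write_type` is whatever is passed to session.transaction(),
    e.g. TransactionType.WRITE. Returns the number of committed queries.
    """
    committed = 0
    for query in queries:
        tx = session.transaction(write_type)
        try:
            tx.query().insert(query)
            tx.commit()
            committed += 1
        except Exception:
            # Release the failed transaction before re-raising.
            tx.close()
            raise
    return committed
```

Returning the commit count makes it easy to compare against the number of generated queries and notice when something was skipped.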

For now I leave it at this.

3. writer.add_histogram()

Once the above problem is fixed, the program runs until this happens: [image]

This happens because, at line 224 of diagnosis.py, edge_store["edge_attr"] is sometimes empty:

{'edge_index': tensor([], size=(2, 0), dtype=torch.int64), 'edge_attr': tensor([], size=(0, 32)), 'y_edge': tensor([], dtype=torch.int64)}, {'edge_index': tensor([], size=(2, 0), dtype=torch.int64), 'edge_attr': tensor([], size=(0, 32)), 'y_edge': tensor([], dtype=torch.int64)}

This is the case because the query returns no results: [image]

The program does not handle this case. This is where I am stuck right now. I hope someone can help me.
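
To see which edge types are affected, a small diagnostic can list every edge store whose tensor has zero elements. This is a sketch under assumptions: the helper name is hypothetical, and it relies only on each store being indexable by key and on the stored value having a numel() method, as torch tensors do.

```python
def empty_edge_types(edge_types, edge_stores, key="edge_attr"):
    """Return the edge types whose tensor under `key` has zero elements."""
    return [
        edge_type
        for edge_type, store in zip(edge_types, edge_stores)
        if store[key].numel() == 0
    ]


# Possible use before the histogram logging in diagnosis.py:
# print(empty_edge_types(data.edge_types, data.edge_stores))
```

Any edge type it reports corresponds to a relation for which the query returned nothing, which points back at the data in the database rather than at the logging code.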

Environment

  1. OS (where TypeDB server runs): Windows 11
  2. TypeDB version (and platform): TypeDB 2.14.2
  3. TypeDB, typedb-ml and client-python version: typedb-ml 0.3.0 and client-python 2.9.0
  4. Python version: 3.9
  5. Other environment details:
    D:\UNI\Master\Masterarbeit\typedb-ml [(master)]> pip3.9 freeze
    absl-py==1.2.0
    cachetools==5.2.0
    certifi==2022.6.15
    charset-normalizer==2.1.0
    colorama==0.4.6
    decorator==5.1.1
    google-auth==2.9.1
    google-auth-oauthlib==0.4.6
    grpcio==1.43.0
    idna==3.3
    importlib-metadata==4.12.0
    Jinja2==3.1.2
    joblib==1.1.0
    Markdown==3.4.1
    MarkupSafe==2.1.1
    networkx==2.5
    numpy==1.21.6
    oauthlib==3.2.0
    pandas==1.3.5
    Pillow==9.4.0
    protobuf==3.15.5
    pyasn1==0.4.8
    pyasn1-modules==0.2.8
    pyparsing==3.0.9
    python-dateutil==2.8.2
    pytz==2022.1
    requests==2.28.1
    requests-oauthlib==1.3.1
    rsa==4.9
    scikit-learn==1.0.2
    scipy==1.7.3
    six==1.16.0
    tensorboard==2.9.1
    tensorboard-data-server==0.6.1
    tensorboard-plugin-wit==1.8.1
    threadpoolctl==3.1.0
    torch==1.11.0+cpu
    torch-geometric==2.0.4
    torch-scatter==2.0.9
    torch-sparse==0.6.14
    torch-tb-profiler==0.4.1
    torchaudio==0.13.1
    torchvision==0.14.1
    tqdm==4.64.0
    typedb-client==2.9.0
    typedb-protocol==2.9.0
    typing_extensions==4.3.0
    urllib3==1.26.10
    Werkzeug==2.1.2
    zipp==3.8.1
kaad01 commented 1 year ago

@jmsfltchr Could you assign someone to help?

haikalpribadi commented 1 year ago

@kaad01 James is no longer with the team. @flyingsilverfin can you help?

kaad01 commented 1 year ago

> @kaad01 James is no longer with the team. @flyingsilverfin can you help?

Thank you for your answer. I fixed it. In some cases not every relation is present in the database, so the writer receives an empty tensor. You should just check whether the tensor is empty with .numel():

for edge_type, edge_store in zip(data.edge_types, data.edge_stores):
    if edge_store["edge_attr"].numel():
        writer.add_histogram('('+', '.join(edge_type) + ')/edge_attr', edge_store["edge_attr"])
        writer.add_histogram('('+', '.join(edge_type) + ')/y_edge', edge_store["y_edge"])

for node_type, node_store in zip(data.node_types, data.node_stores):
    if node_store["x"].numel():
        writer.add_histogram(node_type + '/x', node_store["x"])