laminlabs / lamindb

A data framework for biology.
https://docs.lamin.ai
Apache License 2.0

Curator validated logging not working? #2099

Open · Zethson opened 1 day ago

Zethson commented 1 day ago

Report

!lamin init --storage ./run-tests --name run-tests --schema bionty

import lamindb as ln
import bionty as bt
import pandas as pd

mondo_diseases = [
    "Alzheimer's disease", "Parkinson's disease", "Breast cancer", "Colorectal cancer", "Asthma", 
    "Diabetes mellitus type 2", "Hypertension", "Multiple sclerosis", "Osteoporosis", "Cystic fibrosis", 
    "Huntington's disease", "Leukemia", "Lung cancer", "Prostate cancer", "Rheumatoid arthritis", 
    "Sickle cell anemia", "Thalassemia", "Schizophrenia", "Bipolar disorder", "Liver cirrhosis",
    "Chronic obstructive pulmonary disease", "Pancreatic cancer", "Psoriasis", "Amyotrophic lateral sclerosis", "Epilepsy"
]

fake_diseases = [f"FakeDisease_{i}" for i in range(1, 26)]

df = pd.DataFrame({
    "Disease": mondo_diseases + fake_diseases,
})

curator = ln.Curator.from_df(df, categoricals={"Disease": bt.Disease.name}, verbosity="hint")
curator.validate()

I expected `curator.validate()` to also print the terms that were successfully validated.

Version information

No response

Zethson commented 1 day ago

In such cases, this code path should log the validated terms:

# inspect from public (bionty only)
if hasattr(registry, "public"):
    verbosity = settings.verbosity
    try:
        settings.verbosity = "error"
        public_records = registry.from_values(
            non_validated,
            field=field,
            **kwargs_current,
        )
        values_validated += [getattr(r, field.field.name) for r in public_records]
    finally:
        settings.verbosity = verbosity

validated_hint_print = validated_hint_print or f".add_validated_from('{key}')"
n_validated = len(values_validated)
if n_validated > 0:
    _log_mapping_info()
    logger.warning(
        f"found {colors.yellow(n_validated)} validated terms: "
        f"{colors.yellow(', '.join(values_validated[:10]) + ', ...' if len(values_validated) > 10 else ', '.join(values_validated))}\n      → save terms via "
        f"{colors.yellow(validated_hint_print)}"
    )
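The warning message in the snippet above shows at most the first 10 validated terms and appends ", ..." when there are more. A minimal sketch of just that formatting step, with terminal colors omitted, makes the conditional expression easier to check in isolation. `format_validated_terms` is a hypothetical helper for illustration, not part of lamindb.

```python
def format_validated_terms(values_validated: list[str], hint: str, max_shown: int = 10) -> str:
    """Hypothetical helper: rebuild the warning text from the snippet, without colors."""
    n_validated = len(values_validated)
    # Same logic as the conditional expression in the snippet:
    # show the first `max_shown` terms, and add ", ..." only if some were cut off.
    shown = ", ".join(values_validated[:max_shown])
    if n_validated > max_shown:
        shown += ", ..."
    return (
        f"found {n_validated} validated terms: {shown}\n"
        f"      → save terms via {hint}"
    )


print(format_validated_terms(["asthma", "epilepsy"], ".add_validated_from('Disease')"))
```

Because the whole message sits behind `if n_validated > 0`, nothing is printed at all when the count is 0, regardless of how the text would be formatted.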

`n_validated` apparently always ends up as 0. Hmm.
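To narrow this down, the control flow of the snippet can be mimicked with stand-ins. This is a sketch under stated assumptions: `StubSettings`, `count_validated`, and the lambda lookups are hypothetical substitutes for lamindb's `settings` and `registry.from_values`, not real API. It only shows that `n_validated` stays 0 whenever the public lookup returns no records, and that the temporary `"error"` verbosity is restored in the `finally` block before the warning branch would run.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class StubSettings:
    """Hypothetical stand-in for lamindb's settings object."""
    verbosity: str = "hint"


def count_validated(non_validated, from_values, settings, field_name="name"):
    """Hypothetical mimic of the snippet: returns (n_validated, verbosity afterwards)."""
    values_validated = []
    previous = settings.verbosity
    try:
        settings.verbosity = "error"  # silence logging during the public lookup
        public_records = from_values(non_validated)
        values_validated += [getattr(r, field_name) for r in public_records]
    finally:
        settings.verbosity = previous  # restored before any warning is emitted
    return len(values_validated), settings.verbosity


settings = StubSettings()
# A lookup that matches nothing: the `n_validated > 0` branch is never taken.
print(count_validated(["FakeDisease_1"], lambda values: [], settings))  # → (0, 'hint')
# A lookup that returns records: each one is counted via its `name` attribute.
print(count_validated(["asthma"], lambda values: [SimpleNamespace(name=v) for v in values], settings))  # → (1, 'hint')
```

If the real `registry.from_values` call returns an empty list for the Mondo names in the example, that alone would explain the missing message.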