laminlabs / lamindb

A data framework for biology.
https://docs.lamin.ai
Apache License 2.0

Curator validated logging not working? #2099

Open Zethson opened 1 month ago

Zethson commented 1 month ago

Report

!lamin init --storage ./run-tests --name run-tests --schema bionty

import lamindb as ln
import bionty as bt
import pandas as pd

mondo_diseases = [
    "Alzheimer's disease", "Parkinson's disease", "Breast cancer", "Colorectal cancer", "Asthma", 
    "Diabetes mellitus type 2", "Hypertension", "Multiple sclerosis", "Osteoporosis", "Cystic fibrosis", 
    "Huntington's disease", "Leukemia", "Lung cancer", "Prostate cancer", "Rheumatoid arthritis", 
    "Sickle cell anemia", "Thalassemia", "Schizophrenia", "Bipolar disorder", "Liver cirrhosis",
    "Chronic obstructive pulmonary disease", "Pancreatic cancer", "Psoriasis", "Amyotrophic lateral sclerosis", "Epilepsy"
]

fake_diseases = [f"FakeDisease_{i}" for i in range(1, 26)]

df = pd.DataFrame({
    "Disease": mondo_diseases + fake_diseases,
})

curator = ln.Curator.from_df(df, categoricals={"Disease": bt.Disease.name}, verbosity="hint")
curator.validate()

I expected this to also print the successfully validated terms.

Version information

No response

Zethson commented 1 month ago

In such cases, this is the code that should log the validated terms:

    # inspect from public (bionty only)
    if hasattr(registry, "public"):
        verbosity = settings.verbosity
        try:
            settings.verbosity = "error"
            public_records = registry.from_values(
                non_validated,
                field=field,
                **kwargs_current,
            )
            values_validated += [getattr(r, field.field.name) for r in public_records]
        finally:
            settings.verbosity = verbosity

    validated_hint_print = validated_hint_print or f".add_validated_from('{key}')"
    n_validated = len(values_validated)
    if n_validated > 0:
        _log_mapping_info()
        logger.warning(
            f"found {colors.yellow(n_validated)} validated terms: "
            f"{colors.yellow(', '.join(values_validated[:10]) + ', ...' if len(values_validated) > 10 else ', '.join(values_validated))}\n      → save terms via "
            f"{colors.yellow(validated_hint_print)}"
        )

n_validated is apparently always 0, so the warning above is never logged. Hmm.
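
To make the symptom concrete, here is a minimal standalone sketch of the logging branch quoted above. It does not import lamindb. The helper name format_validated_terms and its limit parameter are hypothetical, and the colors and save hint are left out. It only shows that an empty values_validated list produces no message at all, which would match what the report describes.

```python
def format_validated_terms(values_validated, limit=10):
    """Return the hint text for validated terms, or None if there are none.

    Hypothetical helper: it mirrors the truncation logic of the quoted
    snippet but is not part of lamindb.
    """
    n_validated = len(values_validated)
    if n_validated == 0:
        # Nothing validated: the quoted code skips logging entirely.
        return None
    shown = ", ".join(values_validated[:limit])
    if n_validated > limit:
        # Only the first `limit` terms are listed, followed by an ellipsis.
        shown += ", ..."
    return f"found {n_validated} validated terms: {shown}"


print(format_validated_terms([]))
print(format_validated_terms(["Asthma", "Epilepsy"]))
```

If the first call reflects the real situation, the next thing to check is why the public lookup returns no records for the Mondo names.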

sunnyosun commented 2 weeks ago

Could you investigate why n_validated is 0? Is it because the default source isn't mondo?
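
One possible way to start that investigation, as a sketch only: compare the input values with the names from whichever source is actually in use, and separate exact matches from matches that differ only in letter case. The helper partition_by_reference is hypothetical and calls neither lamindb nor bionty. The reference list below is a placeholder, not taken from any ontology release, and would have to be replaced with names exported from the real source.

```python
def partition_by_reference(values, reference_names):
    """Split values into exact matches, case-only matches, and misses.

    Hypothetical diagnostic helper, independent of lamindb and bionty.
    """
    exact = set(reference_names)
    folded = {name.casefold() for name in reference_names}
    result = {"exact": [], "case_only": [], "missing": []}
    for value in values:
        if value in exact:
            result["exact"].append(value)
        elif value.casefold() in folded:
            # Present in the reference, but with different capitalization.
            result["case_only"].append(value)
        else:
            result["missing"].append(value)
    return result


# Placeholder reference names, NOT real ontology content.
reference = ["asthma", "Epilepsy"]
report = partition_by_reference(["Asthma", "Epilepsy", "FakeDisease_1"], reference)
print(report)
```

If most of the expected terms land in "missing", the configured source is a likely suspect. If they land in "case_only", exact-name matching would be the more likely explanation.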