promised-ai / lace

A probabilistic ML tool for science

Cannot insert boolean categorical data #161

Closed schmidmt closed 10 months ago

schmidmt commented 10 months ago

Describe the bug

When inserting a Boolean Categorical column, the following error is returned:

DatumIncompatibleWithColumn { col: "bool_col", ftype_req: Categorical, ftype: Categorical }

The error reports that both the required column type (ftype_req) and the type of the given datum (ftype) are Categorical. Since the two match, this should not be an error.
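
The root cause is not stated in this report. As a minimal sketch of one way such a self-contradictory message could arise, assume the compatibility check looks at the fine-grained value kind (boolean vs. integer categories) while the error only records the coarse feature type. Every type and function below (`FType`, `ColumnKind`, `ValueKind`, `Incompatible`, `check`) is hypothetical and written for illustration; none of it is lace's actual code.

```rust
// Hypothetical stand-ins for illustration only; NOT lace's definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FType {
    Categorical,
    Continuous,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ValueKind {
    Bool,
    U8,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKind {
    Categorical(ValueKind),
    Continuous,
}

#[derive(Debug, Clone, PartialEq)]
enum Category {
    Bool(bool),
    U8(u8),
}

#[derive(Debug, Clone, PartialEq)]
enum Datum {
    Categorical(Category),
    Continuous(f64),
}

// The error keeps only the coarse types, so a fine-grained mismatch is invisible.
#[derive(Debug, PartialEq)]
struct Incompatible {
    ftype_req: FType,
    ftype: FType,
}

fn column_ftype(col: ColumnKind) -> FType {
    match col {
        ColumnKind::Categorical(_) => FType::Categorical,
        ColumnKind::Continuous => FType::Continuous,
    }
}

fn datum_ftype(datum: &Datum) -> FType {
    match datum {
        Datum::Categorical(_) => FType::Categorical,
        Datum::Continuous(_) => FType::Continuous,
    }
}

// Assumed bug for this sketch: the Bool case has no matching arm, so a
// boolean datum falls through to the error even though the coarse types agree.
fn check(col: ColumnKind, datum: &Datum) -> Result<(), Incompatible> {
    let compatible = match (col, datum) {
        (ColumnKind::Categorical(ValueKind::U8), Datum::Categorical(Category::U8(_))) => true,
        (ColumnKind::Continuous, Datum::Continuous(_)) => true,
        _ => false,
    };
    if compatible {
        Ok(())
    } else {
        Err(Incompatible {
            ftype_req: column_ftype(col),
            ftype: datum_ftype(datum),
        })
    }
}

fn main() {
    let col = ColumnKind::Categorical(ValueKind::Bool);
    let datum = Datum::Categorical(Category::Bool(false));
    // Both fields of the resulting error read `Categorical`.
    println!("{:?}", check(col, &datum));
}
```

In this sketch the check is stricter than the error it reports. Including the value kind in the error, or handling the boolean case in the check, would remove the contradiction; whether either applies to lace is an assumption.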

Is this a regression? No, I am not aware that this has ever worked.

To Reproduce

    #[test]
    fn append_bool() {
        let coltype = ColType::Categorical {
            k: 2,
            hyper: Some(CsdHyper::default()),
            prior: None,
            value_map: ValueMap::Bool,
        };
        let md0 = ColMetadata {
            name: "bool_col".to_string(),
            coltype: coltype.clone(),
            notes: None,
            missing_not_at_random: false,
        };

        let mut engine = Engine::new(
            1,
            Codebook::new(
                "test".to_string(),
                ColMetadataList::new(vec![]).unwrap(),
            ),
            data_source::DataSource::Empty,
            0,
            rand_xoshiro::Xoshiro256Plus::seed_from_u64(0x1234),
        )
        .unwrap();

        // Insert once with specific metadata.
        engine
            .insert_data(
                vec![(
                    "abc",
                    vec![(
                        "bool_col",
                        Datum::Categorical(Category::Bool(false)),
                    )],
                )
                    .into()],
                Some(ColMetadataList::new(vec![md0]).unwrap()),
                WriteMode::unrestricted(),
            )
            .unwrap();

        // Insert again without metadata for the bool column.
        engine
            .insert_data(
                vec![(
                    "def",
                    vec![(
                        "bool_col",
                        Datum::Categorical(Category::Bool(false)),
                    )],
                )
                    .into()],
                None,
                WriteMode::unrestricted(),
            )
            .unwrap();
    }

This panics with the following error:

called `Result::unwrap()` on an `Err` value: DatumIncompatibleWithColumn { col: "bool_col", ftype_req: Categorical, ftype: Categorical }

In pylace, this manifests as ValueError: Provided a Categorical data for '_synthetic' but '_synthetic' is Categorical.

Expected behavior

Appending to a column of boolean values should be possible.