vanderschaarlab / synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
https://www.vanderschaar-lab.com/
Apache License 2.0
414 stars, 54 forks

Benchmark problem on survivalgan #261

Open BioinfoLabImmuno opened 5 months ago

BioinfoLabImmuno commented 5 months ago

What?

I want to evaluate the generated synthetic data, and I am using this code:

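from synthcity.benchmark import Benchmarks
from synthcity.plugins.core.dataloader import SurvivalAnalysisDataLoader

# df is a pandas DataFrame loaded earlier; keep only rows with a positive time-to-event
df = df.loc[df["OS_time"] > 0, :]

# Wrap the data for survival analysis: event indicator plus time-to-event column
loader = SurvivalAnalysisDataLoader(
    df,
    target_column="os_event",
    time_to_event_column="OS_time",
)

# Benchmark the survival_gan plugin on the loader
score = Benchmarks.evaluate(
    [(f"test_{model}", model, {}) for model in ["survival_gan"]],
    loader,
    synthetic_size=100,
    repeats=2,
    task_type="survival_analysis",
)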

This produces the following error:

)     time_horizons = np.linspace(T.min(), T.max(), num=5)[1:-1].tolist()

ValueError: The time_to_event_column contains 1 values less than or equal to zero. Please remove them.

I have checked the loader: the time_to_event column is numeric and contains no zeros. Could you please help me?

robsdavis commented 4 months ago

Hi @BioinfoLabImmuno,

This error occurs if applying a filter like data[data[time_to_event_column] > 0] changes the first dimension of the data's shape, that is, the number of rows. Can you see why your data would cause this to happen? For example, the time-to-event value cannot be negative.
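One way to investigate is to reproduce the row-count comparison described above on your own DataFrame and then break the dropped rows down by cause. The sketch below is not synthcity's internal code: the helper names and the toy data are hypothetical, and it only assumes the check behaves as described in this comment. Note that a missing value (NaN) also fails a `> 0` comparison, so it would be counted even though it is neither zero nor negative. Whether that applies to your data is an assumption to verify.

```python
import numpy as np
import pandas as pd


def count_rows_dropped_by_positive_filter(data: pd.DataFrame, time_col: str) -> int:
    """Number of rows removed by a filter like data[data[time_col] > 0]."""
    kept = data[data[time_col] > 0]
    return data.shape[0] - kept.shape[0]


def explain_dropped_rows(data: pd.DataFrame, time_col: str) -> dict:
    """Break the offending rows down by likely cause (hypothetical helper)."""
    # Coerce to numeric so that non-numeric entries show up as NaN
    col = pd.to_numeric(data[time_col], errors="coerce")
    return {
        "missing_or_non_numeric": int(col.isna().sum()),
        "zero": int((col == 0).sum()),
        "negative": int((col < 0).sum()),
    }


# Toy data (made up for illustration): one zero, one negative, one missing value
toy = pd.DataFrame(
    {
        "OS_time": [12.0, 0.0, -3.0, np.nan, 7.5],
        "os_event": [1, 0, 1, 0, 1],
    }
)

dropped = count_rows_dropped_by_positive_filter(toy, "OS_time")
reasons = explain_dropped_rows(toy, "OS_time")
print(dropped, reasons)
```

Running the same two helpers on the DataFrame that is actually passed to SurvivalAnalysisDataLoader (after your own filtering step) should show whether any rows still fail the check and why.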