Questions about the data

m-Just / OoD-Bench

MIT License

Questions about the data #7

Open renxwang opened 9 months ago

renxwang commented 9 months ago

Hello, this is a very nice and inspiring paper. I would like to try your two proposed metrics on my own dataset and well-trained model.

However, I have a question about this code in the file "quantify.py":

    data = np.load(Path(args.feature_dir, 'data.npz'))
    y_p, z_p, y_q, z_q = data['y_p'], data['z_p'], data['y_q'], data['z_q']

How should I calculate y_p, z_p and y_q, z_q based on my own dataset? Thank you very much.

m-Just commented 9 months ago

Hi, thank you for your interest in our work!

To quantify the shifts, you first need to train an environment classifier and then use it to extract the required features. The code for both steps is provided in the repository (see train.py and extract.py, in the same folder as quantify.py). There is also main.py, which runs the whole procedure for you. Please see the README for how to launch the main script.
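For readers who want to see what quantify.py expects on disk, here is a minimal sketch of writing and reading a data.npz file with the four keys quoted above. It is not the repository's extract.py. The helper names `save_features` and `load_features` are hypothetical, and the array shapes (2-D features for `z_*`, 1-D integer labels for `y_*`) and the reading of `p` and `q` as the two environments being compared are assumptions to check against the repository.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_features(feature_dir, y_p, z_p, y_q, z_q):
    """Hypothetical helper: store the four arrays under the keys quantify.py reads."""
    feature_dir = Path(feature_dir)
    feature_dir.mkdir(parents=True, exist_ok=True)
    out_path = feature_dir / "data.npz"
    np.savez(out_path, y_p=y_p, z_p=z_p, y_q=y_q, z_q=z_q)
    return out_path


def load_features(feature_dir):
    """Hypothetical helper: read back the same keys as the snippet from quantify.py."""
    with np.load(Path(feature_dir, "data.npz")) as data:
        return data["y_p"], data["z_p"], data["y_q"], data["z_q"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed layout: one feature row per sample, one integer label per sample.
    z_p, y_p = rng.normal(size=(100, 8)), rng.integers(0, 2, size=100)
    z_q, y_q = rng.normal(size=(80, 8)), rng.integers(0, 2, size=80)
    with tempfile.TemporaryDirectory() as tmp:
        save_features(tmp, y_p, z_p, y_q, z_q)
        loaded = load_features(tmp)
        print([a.shape for a in loaded])
```

In practice the `z_*` arrays should come from the trained environment classifier rather than from random numbers; the random data here only demonstrates the file layout.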

renxwang commented 9 months ago

Thanks for your reply. One more question: if I don't have ground-truth labels for my test data because it is in vivo, can I still quantify the domain shift of the test dataset using your method? Thank you very much!

m-Just commented 9 months ago

Without labels, only diversity shift can be quantified. You need labels for both training and test data to quantify correlation shift.
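The reason shows up in the definitions. The formulas below paraphrase the definitions in the OoD-Bench paper and should be checked against the paper itself; here $p$ and $q$ are the distributions of the two environments, $z$ is the extracted feature, $y$ is the label, $\mathcal{S}$ is assumed to be the region where $p(z)\,q(z) = 0$, and $\mathcal{T}$ the region where $p(z)\,q(z) \neq 0$.

```latex
\begin{align}
D_{\mathrm{div}}(p, q) &= \frac{1}{2} \int_{\mathcal{S}} \lvert p(z) - q(z) \rvert \, dz, \\
D_{\mathrm{cor}}(p, q) &= \frac{1}{2} \int_{\mathcal{T}} \sqrt{p(z)\, q(z)}
    \sum_{y \in \mathcal{Y}} \lvert p(y \mid z) - q(y \mid z) \rvert \, dz.
\end{align}
```

Diversity shift depends only on the feature densities $p(z)$ and $q(z)$, so it can be estimated from unlabeled data. Correlation shift contains the conditional label distributions $p(y \mid z)$ and $q(y \mid z)$, which cannot be estimated without labels in both environments.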

If you have reason to believe that your model is accurate on the test data, you could generate pseudo labels for it.
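One common way to produce pseudo labels is to take the most probable class from the model's predicted probabilities, optionally flagging low-confidence samples. The sketch below illustrates that general technique only; it is not part of the OoD-Bench repository, and the function name `pseudo_labels` and the `min_confidence` parameter are hypothetical.

```python
import numpy as np


def pseudo_labels(probs, min_confidence=0.0):
    """Return argmax labels and a mask of samples whose top probability
    is at least min_confidence.

    probs: array of shape (n_samples, n_classes) with predicted class
    probabilities from an already trained model (assumed input format).
    """
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)                   # most probable class per sample
    keep = probs.max(axis=1) >= min_confidence      # confidence filter
    return labels, keep


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
    labels, keep = pseudo_labels(probs, min_confidence=0.6)
    print(labels, keep)
```

Pseudo labels inherit the model's mistakes, so a correlation shift estimated from them is only as trustworthy as the model's accuracy on the test data.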

renxwang commented 9 months ago

Thank you very much, I will try it.