Example for custom non-image dataset (classification or regression)

davidshumway commented 1 year ago

This is similar to the earlier question about adding a custom dataset: https://github.com/thuml/Transfer-Learning-Library/issues/188

Are there any examples showing use of this library with non-image data, or could someone help by showing how to set up a simple non-image example? For example, I'd like to use the library in the SDA, UDA, or SSL scenarios with simple tabular data such as:

Labeled data
  Source 1: Xs1 = Feature1, Feature2, Feature3; ys1 = Feature4
  Source 2: Xs2 = Feature1, Feature2, Feature3; ys2 = Feature4
  Target 1: Xt1 = Feature1, Feature2, Feature3; yt1 = Feature4

Unlabeled target data
  Target 1: Xt1_unlabeled = Feature1, Feature2, Feature3

For example, starting from pandas, and assuming a two-class classification problem:

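import pandas as pd

# Features for labeled source 1 (three samples, three features)
Xs1 = pd.DataFrame({
  'Feature1': [0, 1, 2],
  'Feature2': [10, 11, 22],
  'Feature3': [100, 111, 222],
})
# Binary class labels, one per row of Xs1
ys1 = [0, 1, 0]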

Perhaps using regression is also possible? For example:

ys1 = [0.01, 0.05, 0.10]

Thanks!

thucbx99 commented 1 year ago

Our library has no strict requirements for datasets (such as data format). So I think the problem here is how to define a dataset in PyTorch.
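As a rough illustration of what "defining a dataset in PyTorch" could look like for the tabular data in the question, here is a minimal sketch. `TabularDataset` is a hypothetical helper written for this example and is not part of the library; the `Xt1_unlabeled` values are made up, and the `regression` flag is an assumed convention for switching the label dtype between class indices and floats.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class TabularDataset(Dataset):
    """Hypothetical helper (not part of the library): wraps a DataFrame
    of features and an optional list of labels as a PyTorch Dataset."""

    def __init__(self, features, labels=None, regression=False):
        # Features are stored as one float32 tensor of shape (n_samples, n_features).
        self.x = torch.tensor(features.to_numpy(), dtype=torch.float32)
        if labels is None:
            # Unlabeled data (e.g. the unlabeled target domain).
            self.y = None
        else:
            # Class indices are int64; regression targets are float32.
            dtype = torch.float32 if regression else torch.long
            self.y = torch.tensor(labels, dtype=dtype)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        if self.y is None:
            return self.x[idx]
        return self.x[idx], self.y[idx]


# Labeled source data, taken from the question.
Xs1 = pd.DataFrame({
    'Feature1': [0, 1, 2],
    'Feature2': [10, 11, 22],
    'Feature3': [100, 111, 222],
})
ys1 = [0, 1, 0]

# Unlabeled target data; these values are invented for illustration.
Xt1_unlabeled = pd.DataFrame({
    'Feature1': [3, 4, 5],
    'Feature2': [12, 14, 25],
    'Feature3': [120, 140, 250],
})

source_loader = DataLoader(TabularDataset(Xs1, ys1), batch_size=2, shuffle=True)
target_loader = DataLoader(TabularDataset(Xt1_unlabeled), batch_size=2, shuffle=True)

# One labeled source batch and one unlabeled target batch.
x_s, y_s = next(iter(source_loader))
x_t = next(iter(target_loader))
```

For the regression case from the question, the same class could be built as `TabularDataset(Xs1, [0.01, 0.05, 0.10], regression=True)`. How these loaders are then passed to a particular SDA, UDA, or SSL training script depends on that script and is not covered by this sketch.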

For transfer learning, I recommend looking at the dataset implementations in Wilds, for example https://github.com/p-lambda/wilds/blob/main/wilds/datasets/amazon_dataset.py

For tabular data, I think it will be helpful to refer to the open-source code of related research, as I'm not very familiar with this field.

davidshumway commented 1 year ago

Thank you, @thucbx99! I will look at Wilds and consider related research.

mashaan14 commented 1 year ago

I implemented one method on non-image data, using 2D points. Please check it out at: https://github.com/mashaan14/ADDA-toy

Example for custom non-image dataset (classification or regression) #196