tony-lind / SyntheticDataGeneration


A few questions about the paper and the code #1

Open fengxy369 opened 1 year ago

fengxy369 commented 1 year ago

After reading your paper and code, I have the following questions:

(1) I don't understand the purpose of using TapNet that you proposed in your paper. Dimensionality reduction? Or classification? Also, I didn't find TapNet in your code. Where does it come into play?

(2) What is contained in the file embeddings.txt? Is it raw data? Or is it processed data?

(3) You compared the performance of the various methods in the table, but what classifier did you use?

I would be very grateful if you could answer my questions.

tony-lind commented 1 year ago

Hi fengxy,

1) Exactly, I use TapNet to reduce dimensionality; you can also think of it as a distiller that refines the relevant patterns into a new vector/embedding. You can find TapNet's base code here: https://github.com/xuczhang/tapnet. Basically, I use TapNet with standard settings on our proprietary data, which has 6000 trucks in it, of which 5759 belong to the negative class and 222 to the positive class. The feature vector is quite large (~1300 features) and we have at least 15 snapshots, so the observations for each truck are of size (1300 * 15). TapNet reduces this to 300 features.

2) The file embeddings.txt contains this data as produced by TapNet. Note that the first two examples in the file correspond to TapNet prototype classes; all 6000 trucks follow after them. The first column is the class (y), 0 or 1, followed by the embedding (x). So it is data processed by TapNet.

3) I used the Python library scikit-learn and its random forest (RF) implementation with standard settings. I was interested in the impact of the different synthetic data generation methods on the final prediction model. Hence the generation method, together with the number of synthetic examples created, is the only thing I vary in my experiment.
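For readers who want to load the file, the layout described in 2) could be parsed roughly as follows. This is only a sketch under assumptions: the helper name `load_embeddings` is hypothetical, the values are assumed to be whitespace-separated (pass `sep=","` if the file turns out to be comma-separated), and the two prototype rows are simply set aside without interpreting them. The tiny in-memory sample is made up for illustration and is not real data.

```python
import io


def load_embeddings(lines, n_prototypes=2, sep=None):
    """Split rows into (prototype rows, class labels y, embeddings X).

    Assumed layout: the first `n_prototypes` rows are the prototype rows,
    every later row is `<class> <embedding values...>`.
    `sep=None` splits on any whitespace (an assumption about the file).
    """
    rows = [line.split(sep) for line in lines if line.strip()]
    # Prototype rows are kept as plain float lists and not interpreted further.
    prototypes = [[float(v) for v in row] for row in rows[:n_prototypes]]
    labels, vectors = [], []
    for row in rows[n_prototypes:]:
        labels.append(int(float(row[0])))             # first column: class (0 or 1)
        vectors.append([float(v) for v in row[1:]])   # remaining columns: embedding
    return prototypes, labels, vectors


# Made-up stand-in for embeddings.txt: 2 prototype rows, then 3 examples
# with 3-dimensional embeddings (the real file would have 300 dimensions).
sample = io.StringIO(
    "0 0.1 0.2 0.3\n"
    "1 0.9 0.8 0.7\n"
    "0 0.2 0.1 0.4\n"
    "1 0.8 0.9 0.6\n"
    "0 0.3 0.2 0.2\n"
)
prototypes, y, X = load_embeddings(sample)
print(len(prototypes), y, len(X[0]))
```

With the real file, the same function can be called on an open file handle, for example `with open("embeddings.txt") as f: prototypes, y, X = load_embeddings(f)`. The resulting `X` and `y` can then be passed to scikit-learn's `RandomForestClassifier().fit(X, y)`, which uses the library's default settings.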

I hope this answers your questions.

Best regards,

Tony Lindgren


In reply to fengxy369's message of Friday, February 17th, 2023 at 08:05.
