yuelinan / Codes-of-Inter-RAT


Code of the movie dataset #1


marc-p09 commented 9 months ago

Can you provide more details on how to obtain and pre-process the datasets needed to run the code? I am also interested in the Movie dataset; will you upload the code and hyperparameters for it?

I see that you have implemented INVRAT on the Movie dataset, which is exciting! I have been struggling for a long time to implement INVRAT on general datasets without multi-aspect labels. Would you be willing to share how you partition this dataset into different environments, along with the code?

yuelinan commented 9 months ago

Sure. It's currently Saturday, so I'll make our code available on the next business day.

I can briefly describe what I did. Using the environment inference method of [1], I divided the Movie dataset into two environments and then reproduced INVRAT in PyTorch (sorry, I'm not very familiar with TensorFlow, which the original INVRAT uses). Recently, I came across a new paper [2] through the review comments, which may provide some theoretical basis for this approach, although I do not fully understand that paper yet. In addition, another review comment found my approach to this reproduction controversial; due to time constraints, I was not able to discuss it further with the reviewer. I have also asked the original authors of INVRAT and am currently waiting for a response. In practice, however, using [1]'s method I did divide the Movie dataset and achieved good results.
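To make the idea more concrete, here is a minimal PyTorch-style sketch of the kind of invariance penalty INVRAT trains its generator with. This is only an illustration of the principle, not the exact code of my reproduction; the function and argument names are placeholders.

```python
import torch
import torch.nn.functional as F

def invrat_generator_loss(logits_agnostic, logits_aware, labels, lam=1.0):
    """Illustrative invariance penalty in the spirit of INVRAT (placeholder names).

    logits_agnostic: predictions from a predictor that sees only the rationale.
    logits_aware:    predictions from a predictor that also sees the (inferred)
                     environment id.
    The generator is pushed to select rationales on which knowing the environment
    gives no extra advantage, i.e. loss_agnostic <= loss_aware.
    """
    loss_agnostic = F.cross_entropy(logits_agnostic, labels)
    loss_aware = F.cross_entropy(logits_aware, labels)
    # Penalize the generator only when the environment-aware predictor
    # does better than the environment-agnostic one.
    invariance_gap = torch.relu(loss_agnostic - loss_aware)
    return loss_agnostic + lam * invariance_gap
```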

Thank you

[1] Learning Invariant Graph Representations for Out-of-Distribution Generalization.

[2] Provably Invariant Learning without Domain Information.

yuelinan commented 9 months ago

Hi, I have made the Inter-RAT code for the Movie dataset available. You can run "sh run.sh" to train Inter-RAT. In addition, our reproduction of INVRAT is publicly available at https://github.com/yuelinan/Reproduction-of-invrat.

Merry Christmas, and thank you.

marc-p09 commented 5 months ago

I find that the INVRAT code is not executable.

By the way, I also find that the code of [1] is not publicly available. How did you obtain the code for this paper? Was it because someone on your team was a reviewer for it?

yuelinan commented 5 months ago

Please point out the specific problems you encounter when running our code that reproduces INVRAT.

Moreover, paper [1] has no publicly available code, but its algorithmic flow is clear, and you can reproduce it by carefully reading Section 3.2 of [1], especially Eq. 5. If you are not familiar with the k-means algorithm, you can call the implementation in scikit-learn.
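As an illustration only (the names below are placeholders, the feature extractor is up to you, and the real flow follows Section 3.2 / Eq. 5 of [1]), the clustering step could look roughly like this:

```python
import numpy as np
from sklearn.cluster import KMeans

def infer_environments(features, n_envs=2, seed=0):
    """Assign each example to a latent environment by clustering its
    representation (e.g. an encoder embedding of the input text).

    features: array of shape (num_examples, hidden_dim)
    returns:  array of environment ids in {0, ..., n_envs - 1}
    """
    kmeans = KMeans(n_clusters=n_envs, random_state=seed, n_init=10)
    return kmeans.fit_predict(features)

# Toy usage with random vectors standing in for real embeddings.
features = np.random.randn(1000, 64)
env_ids = infer_environments(features, n_envs=2)
print(np.bincount(env_ids))
```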

marc-p09 commented 5 months ago

The code cannot correctly read the dataset files. I would appreciate it if you could provide your pre-processed data.

I have read paper [1], but I contend it cannot be used to partition environments. This is because its ability to partition environments relies on distinguishing between invariant and variant subgraphs. Should it fail to differentiate these subgraphs, then it becomes incapable of partitioning environments. However, if it can distinguish them, there is no need to redundantly use it to partition environments and then employ another method to extract the invariant rationales. Could you kindly elucidate why it is capable of partitioning environments and how it functions effectively in your paper?

Paper [1] was published in November 2022. However, I recently discovered that your paper was submitted to ICLR in September 2022. How were you able to use [1] before it was published?

yuelinan commented 5 months ago

For the dataset, you can get it from https://github.com/crazyofapple/AT-BMC/tree/main/datasets/movie_reviews_with_some_rats_adv. Thanks to @crazyofapple for sharing it!

Your question is very important. I think the method we use to partition the environments is just an expedient one, and the resulting environments are not optimal. But compared with random partitioning, the reproduction method we devised is effective.

Our paper was submitted to NeurIPS 2022 before it was submitted to ICLR 2023. At that time, we only conducted experiments on the Beer dataset. During the rebuttal, a reviewer thought we needed to add the ERASER dataset, and I had doubts similar to yours at the time. It was suggested that we could use clustering to infer the environments, and we have followed this practice since. By the time I was answering your question, my research direction had shifted from text rationales to graph rationales. We had read [1] by then and found it very suitable as a basis for environment partitioning, so I shared it with you.

marc-p09 commented 5 months ago

Your sophistry is truly absurd. When I asked how you were able to partition environments on a dataset where that seemed impossible, you claimed that you used the method from paper [1] and merely cited some literature to confuse me. There are several dubious points here: you didn't specify how you did it, only that you used the method from paper [1]; you didn't explain why this was feasible, and instead used literature you don't fully understand to claim feasibility, without analyzing specifically what in that literature supports your claim. When I argued that paper [1] was theoretically infeasible, you then suggested that it might indeed not work, but was still slightly better than random partitioning. But why is it even slightly better?

Upon finding evidence that your paper was submitted before paper [1] was published, you then claimed that you weren't using the method from [1], just something very similar, which is an utterly ridiculous argument that contradicts your previous statement. Furthermore, your paper does not even mention how you partitioned environments, including in another one of your papers available here: https://openreview.net/forum?id=uGtfk2OphU. Although you added a brief description in the revised version of https://openreview.net/forum?id=uGtfk2OphU after my queries, it was absent in the first version submitted to ICLR.

yuelinan commented 5 months ago
  1. I have responded to all of your comments promptly and politely, and have actively open-sourced the code you need. But your current comment is very rude. We can have differences of opinion, but you shouldn't call my interpretation ridiculous.

  2. In my opinion, I have open-sourced my code and explained how to use it in the README. I don't understand why you say I didn't explain how I did it.

  3. I didn't say my reproduction doesn't work, so please don't misinterpret me. I acknowledge that it is a stopgap measure. Because we lack manually labeled environments, we need environment inference methods to infer the labels of the latent environments; however, environment inference methods are subject to error, and at this stage I have not found a more appropriate one.

  4. I would like to state that using clustering methods to infer environments [1] and implementing INVRAT on top of them is a conclusion I reached after the NeurIPS 2023 publication. Before then, I used my own empirical clustering method to infer environments. So when you asked a question in December 2023, should I have answered using my 2022 experience? I think it is more responsible to answer using the conclusions I have reached by now.

  5. The core of my paper was never about how to partition environments. Environment partitioning was only used to reproduce INVRAT and has nothing to do with the main contribution of my paper. You asked a reproduction question, and I answered it on GitHub; I don't see any problem here.

  6. You can question my reproduction method, but I stand by the view that my reproduction is acceptable for now. You can ask the INVRAT authors to check my code, and if they think there is a problem with my reproduction method, I will make changes.