This point on domain adaptation doesn't make sense to me

jisraeli commented 7 years ago

The sentence "Without cell-specific features, another solution could be to use domain adaptation methods where the model trains on a source cell type and uses unsupervised feature extraction methods to predict on a target cell type" in the Transcription factors and RNA-binding proteins subsection doesn't make sense to me. Features that are not cell-specific carry no cell-specific information, so they have the same distribution in the source and target cell types. Domain adaptation is used to normalize for differences between the source and target distributions, so I'd argue the opposite: cell-specific features with different source and target distributions must be used to take advantage of this technique. Am I misunderstanding this point?
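To make the "normalize for differences between the source and target distribution" point concrete, here is a minimal sketch of one simple alignment scheme: matching the mean and covariance of the source features to those of the target. It is only an illustration, not the method the manuscript refers to, and the names `align_source_to_target` and `covariance_gap` are hypothetical. If the source and target features already share a distribution, `covariance_gap` is close to zero and the alignment has nothing to correct.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T


def covariance_gap(source, target):
    """Frobenius norm of the difference between the two feature covariances."""
    diff = np.cov(source, rowvar=False) - np.cov(target, rowvar=False)
    return float(np.linalg.norm(diff))


def align_source_to_target(source, target, eps=1e-6):
    """Shift and rescale source features to match the target mean and covariance.

    Assumes both arrays have shape (n_samples, n_features) with n_features >= 2.
    Only unlabeled target features are needed. eps is a small ridge term
    (an assumption of this sketch) that keeps the covariances invertible.
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    whiten = _sym_matrix_power(cov_s, -0.5)   # remove source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # impose target correlations
    centered = source - source.mean(axis=0)
    return centered @ whiten @ recolor + target.mean(axis=0)
```

A classifier trained on the aligned source features (with the source labels) could then be applied to the target features directly. Whether second-order statistics capture the relevant difference between cell types is an open assumption here.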

agitter commented 7 years ago

@jisraeli I'm traveling and will look into the intentions of this sentence later this week.

agitter commented 7 years ago

@jacklanchantin I'm looking back at our discussion in #356 to recall the context of the sentence quoted above. I believe it was written in response to this comment:

"You can briefly mention the additional complication that domain adaptation methods will also be imperative to account for the fact that the same TF can have very different cofactors across conditions/cell types. Hence, sequence features that are optimal in one condition may result in very poor generalization in a new condition. This area of research will be challenging but exciting."

Is that correct? Let's clarify this domain adaptation sentence.

jacklanchantin commented 7 years ago

@agitter sorry, I must've missed this. I think there are 2 main aspects of the domain adaptation area (corresponding to 2 different problems)

1. In the cross-species case, since the sequences are different, we can use domain adaptation methods such as those used to predict clothing reviews from a training set of jewelry reviews. The task is similar, but the context is different. In this case, with regard to Anshul's comment, hopefully the unsupervised methods will be able to pick up the fundamental differences between domains that are useful for prediction.
2. In the cross-cell-line case, the sequences are almost identical, so we need more advanced domain adaptation techniques (I don't think anyone really knows exactly what yet) or cell-type-specific inputs such as histone modifications in addition to the sequences.

So, I think what is written is correct, but it doesn't exactly address Anshul's comment. Should I add a clarification of that?
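As an illustration of the second case, cell-type-specific inputs can be supplied alongside the sequence by appending a per-position signal as an extra input channel. This is a minimal sketch under assumptions of my own: the helper names `one_hot_sequence` and `add_cell_type_track` are hypothetical, and the signal is taken to be one value per base (for example, a histone modification track already binned to the sequence length).

```python
import numpy as np

BASES = "ACGT"


def one_hot_sequence(seq):
    """One-hot encode a DNA string into an array of shape (len(seq), 4)."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        idx = BASES.find(base)
        if idx >= 0:  # ambiguous bases such as N stay all-zero
            encoded[pos, idx] = 1.0
    return encoded


def add_cell_type_track(seq, signal):
    """Append a per-position, cell-type-specific signal as a fifth channel."""
    signal = np.asarray(signal, dtype=np.float32)
    if signal.shape != (len(seq),):
        raise ValueError("signal must have one value per sequence position")
    return np.concatenate([one_hot_sequence(seq), signal[:, None]], axis=1)
```

The sequence channels are identical across cell lines, so only the fifth channel differs between the source and target cell types. That extra channel is the part of the input whose distribution actually shifts.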

akundaje commented 7 years ago

I think the best way to clarify this is to use a well-defined example (like the one you mentioned above) so that the statement is understood in the appropriate context.

Anshul.

jacklanchantin commented 7 years ago

@akundaje do you think we should mention both cross cell line and cross species? Or just cross cell line?

akundaje commented 7 years ago

Any one of them should be fine.

-Anshul.

jacklanchantin commented 7 years ago

@agitter do you want me to make a pull request? The following is what I would change that sentence to:

Without cell-specific features, we need models that can handle the differences between domains. For example, if we have ChIP-seq data for a particular TF in mouse but want to predict the binding locations of that same TF in human, we need to transfer the knowledge gained from training on mouse to human. This has been done in many other areas, such as sentiment analysis [@doi:10.1.1.231.3442]. For example, we can train on reviews of books and predict on reviews of movies. These are similar tasks, but the context (i.e. books or movies) is different. Our models should be able to find the fundamental properties of each domain so that they can be used across contexts.

agitter commented 7 years ago

@jacklanchantin Yes, please make a new pull request. We can continue this discussion there about the exact phrasing. I support the suggestions above to add an example here to clarify exactly what we mean.

greenelab / deep-review

This point on domain adaptation doesn't make sense to me #523