lzamparo / embedding

Learning semantic embeddings for TF binding preferences directly from sequence

Considerations from ENCODE meeting #18

Open lzamparo opened 6 years ago

lzamparo commented 6 years ago

Presented the ChIP-seq peaks vs flanks comparison on today's Friday ENCODE call. Jeff and Jacob suggest that HashingVectorizer is not a good comparator for my method, since I want something that takes proximity into account, such as locality-sensitive hashing or a random projection.
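For reference, a minimal sketch (toy sequences, scikit-learn) of the contrast they raised: HashingVectorizer scatters k-mers into buckets with no regard for similarity, while a Gaussian random projection of the count vectors approximately preserves pairwise distances. The sequences and dimensions here are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer
from sklearn.random_projection import GaussianRandomProjection

def kmer_doc(seq, k=4):
    """Turn a sequence into a space-separated string of overlapping k-mers."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

seqs = ["ACGTACGTACGT", "ACGTACGAACGT", "TTTTGGGGCCCC"]  # toy sequences
docs = [kmer_doc(s) for s in seqs]

# HashingVectorizer: hash buckets ignore k-mer similarity entirely,
# so near-identical k-mers land in unrelated features.
hashed = HashingVectorizer(n_features=64).fit_transform(docs)

# Gaussian random projection of raw count vectors approximately
# preserves pairwise Euclidean distances (Johnson-Lindenstrauss).
counts = CountVectorizer().fit_transform(docs).toarray()
projected = GaussianRandomProjection(n_components=8, random_state=0).fit_transform(counts)

print(hashed.shape, projected.shape)  # (3, 64) (3, 8)
```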

Jeff had a bunch of suggestions:

  1. Experiment with increasing K.
  2. Experiment with increasing K while also changing the stride to something other than 1; he says this scheme is equivalent to a Kth-order Markov model.
  3. Experiment with adding a supervised component to the objective. One candidate for supervision would be to minimize the difference between the absolute distances calculated by a gapped k-mer kernel and the Euclidean distances between embedded k-mers.
  4. Do a ton of experiments and see which actually lead to more useful representations; at this stage I should do much more exploration and less exploitation.
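Suggestions 1 and 2 amount to sweeping a grid over K and stride; a minimal sketch of that tokenization (the sequence and grid values are hypothetical, not the ones I'll actually run):

```python
from itertools import product

def kmer_tokens(seq, k, stride=1):
    """Extract k-mers from seq with the given stride
    (stride=1 gives the fully overlapping scheme used so far)."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

seq = "ACGTACGTACGTACGT"
# Hypothetical exploration grid over K and stride (suggestions 1 and 2).
for k, stride in product((4, 6, 8), (1, 2, 4)):
    tokens = kmer_tokens(seq, k, stride)
    print(f"k={k} stride={stride}: {len(tokens)} tokens")
```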

This suggests another possible measure of validation:

lzamparo commented 6 years ago

Did the random projection comparison experiment: unless I'm not handling the counts -> projection step properly, my embedding features are generally performing better: [attached figure: five_factors_peaks_vs_flanks_metrics]
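Since the comparison hinges on treating the counts -> projection step properly, one sanity check is that pairwise distances survive the projection, as Johnson-Lindenstrauss predicts. A sketch with fake count data (the Poisson counts and dimensions are stand-ins, not my actual feature matrices):

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 512)).astype(float)  # fake k-mer counts

proj = GaussianRandomProjection(n_components=128, random_state=0)
low = proj.fit_transform(counts)

# Ratio of projected to original pairwise distances should sit near 1
# with small spread if the projection is applied correctly.
d_orig = pairwise_distances(counts)
d_proj = pairwise_distances(low)
mask = ~np.eye(len(counts), dtype=bool)
ratio = d_proj[mask] / d_orig[mask]
print(ratio.mean(), ratio.std())
```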