How to encode the text information into the semantic field

yumingj / Talk-to-Edit

Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

https://www.mmlab-ntu.com/project/talkedit/


How to encode the text information into the semantic field #4

Closed lsm123123 closed 3 years ago

lsm123123 commented 3 years ago

I revised the code, and I am wondering whether the text-guided feature can be encoded into the semantic field, so that it becomes a cross-modal semantic field conditioned on the text.

yumingj commented 3 years ago

Hi, thanks for your interest in our work!

In our implementation, we do not encode the text information into the semantic field directly. However, I think it is possible to do so. You could first use LSTM layers (or some other text feature extractor) to extract the text features, then concatenate the text-guided feature with the input latent code and use the concatenated feature as the input.
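The suggestion above could be sketched roughly as follows in PyTorch. This is only a minimal illustration, not code from the Talk-to-Edit repository: the class names `TextEncoder` and `TextConditionedField`, the layer sizes, the vocabulary size, and the step size are all assumptions, and the field is assumed to be a small MLP that maps a latent code to an edit direction.

```python
import torch
import torch.nn as nn


class TextEncoder(nn.Module):
    """LSTM text feature extractor (hypothetical; not part of Talk-to-Edit)."""

    def __init__(self, vocab_size=1000, embed_dim=64, text_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, text_dim, batch_first=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids; sequences are assumed unpadded.
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return h_n[-1]  # (batch, text_dim): final hidden state of the last layer


class TextConditionedField(nn.Module):
    """MLP mapping [latent code ; text feature] to a direction in latent space."""

    def __init__(self, latent_dim=512, text_dim=128, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + text_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, latent, text_feat):
        # Concatenate the text feature with the latent code along the feature axis.
        fused = torch.cat([latent, text_feat], dim=1)
        return self.net(fused)


torch.manual_seed(0)
encoder = TextEncoder()
field = TextConditionedField()

latent = torch.randn(2, 512)              # stand-in for two latent codes
tokens = torch.randint(0, 1000, (2, 6))   # stand-in for two tokenized requests

direction = field(latent, encoder(tokens))
edited = latent + 0.1 * direction         # one small step along the predicted direction
print(edited.shape)                       # torch.Size([2, 512])
```

Concatenation is the simplest way to fuse the two inputs. The field's first layer has to be widened by the text feature size, so pretrained field weights would not load into it without adaptation.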

lsm123123 commented 3 years ago

Thank you for your reply! I will build a baseline on top of this nice work and try some new ideas. I would also like to contact you on WeChat for further discussion. My research interest is also text-driven image generation and editing.

lsm123123 commented 3 years ago

Would it be convenient for you to share your WeChat ID for further communication?

yumingj commented 3 years ago

Hi, you can leave your WeChat ID here~

lsm123123 commented 3 years ago

OK, my WeChat ID is xhjx123123. I'd like to exchange research ideas with you!