liu-bioinfo-lab / EPCOT

fine-tuning #2

Closed wawpaopao closed 2 months ago

wawpaopao commented 12 months ago

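Hello! Great job! In theory, the fine-tuning stage for downstream tasks only uses the encoder part of the pre-trained backbone, which only needs sequence input to produce a representation. So why does EPCOT_usage still require a DNase file as input?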

zzh24zzh commented 12 months ago

Thank you, Aowen. In the pre-training stage, the inputs to the encoder include both the reference sequence and DNase-seq, so the encoder learns representations of both. DNase-seq (or ATAC-seq) is an important input to our model: it provides cell-type-specific information and makes our predictions specific to that cell type.
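To make the point above concrete, here is a minimal sketch of how a sequence and an accessibility track could be combined into one encoder input. It is an illustration only, not code from the EPCOT repository: the 5-channel layout (4 one-hot sequence channels plus 1 DNase-seq channel), the helper names `one_hot_encode` and `build_encoder_input`, and the toy signal values are all assumptions.

```python
from typing import List, Sequence

BASES = "ACGT"  # channel order for the one-hot encoding (assumed)


def one_hot_encode(seq: str) -> List[List[float]]:
    """Return 4 rows (A, C, G, T) of length len(seq); unknown bases such as N stay all-zero."""
    rows = [[0.0] * len(seq) for _ in BASES]
    for pos, base in enumerate(seq.upper()):
        idx = BASES.find(base)
        if idx >= 0:
            rows[idx][pos] = 1.0
    return rows


def build_encoder_input(seq: str, dnase_signal: Sequence[float]) -> List[List[float]]:
    """Stack the one-hot sequence with a per-base DNase-seq track (hypothetical 5 x L layout)."""
    if len(dnase_signal) != len(seq):
        raise ValueError("DNase signal must have one value per base")
    return one_hot_encode(seq) + [[float(v) for v in dnase_signal]]


# Same reference sequence, two hypothetical cell types with different accessibility.
seq = "ACGTN"
x_cell_a = build_encoder_input(seq, [0.0, 0.5, 2.0, 0.1, 0.0])
x_cell_b = build_encoder_input(seq, [1.0, 1.0, 0.0, 0.0, 3.0])

print(len(x_cell_a), len(x_cell_a[0]))  # 5 5
print(x_cell_a[:4] == x_cell_b[:4])     # True: sequence channels are identical
print(x_cell_a[4] == x_cell_b[4])       # False: only the DNase channel differs
```

Because the reference sequence is the same in every cell type, the accessibility channel is the only part of this input that can differ between cell types, which is why a sequence-only input could not produce cell-type-specific predictions.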

wawpaopao commented 12 months ago

Thanks! I misunderstood the encoder!

wawpaopao commented 12 months ago

I noticed that there isn't a folder dedicated to epigenomic feature prediction. I'm interested in running this downstream task to evaluate the performance of DNA pre-trained models on predicting such data. Could you recommend reference articles or sources that provide relevant datasets for this purpose? Any guidance would be greatly appreciated!