kundajelab / tfmodisco

TF MOtif Discovery from Importance SCOres
MIT License

Doesn't work for sequences of variable length #107

Open ruizhideng opened 1 year ago

ruizhideng commented 1 year ago

Hi @AvantiShri,

Thanks for the amazing work!

In the following notebook, you mentioned that the pipeline also works for sequences of different lengths. However, when I tested the notebook with input data of shape [100, length, 4], where the length ranges from 500 to 1000 bp, it raised "ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (100,) + inhomogeneous part." If I crop the sequences and contribution scores to the same length, it works again. Is there any version available that supports sequences of different lengths? https://github.com/kundajelab/tfmodisco/blob/master/examples/simulated_TAL_GATA_deeplearning/TF_MoDISco_TAL_GATA.ipynb
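The error message matches what NumPy itself raises (from version 1.24 on) when a list of arrays with different lengths is converted into a single ndarray. A minimal sketch, using random toy one-hot data rather than the reporter's actual inputs, reproduces that conversion and the crop-to-equal-length workaround. Cropping every sequence to the shortest length from the left is an arbitrary choice made for illustration; centre-cropping would work the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 100 one-hot sequences of 500-1000 bp, each of shape (length, 4).
lengths = rng.integers(500, 1001, size=100)
seqs = [np.eye(4)[rng.integers(0, 4, size=n)] for n in lengths]

# Converting the ragged list into one ndarray raises ValueError on NumPy >= 1.24.
try:
    np.array(seqs)
    print("No ValueError on this NumPy version")
except ValueError as err:
    print(f"ValueError: {err}")

# Cropping every sequence to the shortest length gives a regular (100, min_len, 4) array.
min_len = min(s.shape[0] for s in seqs)
cropped = np.stack([s[:min_len] for s in seqs])
print(cropped.shape)
```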

Thank you again!

Best wishes, Ruizhi

AvantiShri commented 1 year ago

Hi Ruizhi,

Sorry for the slow response, and thanks for using tfmodisco!

My first thought is to check whether the outermost iterable is a Python list. That is, rather than input data of shape [100, length, 4], you want to provide a Python list of length 100 whose entries are arrays, each with dimensions (length, 4).
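A minimal sketch of the input layout described above follows. It uses random toy data and does not call any tfmodisco function. The names `onehot_list`, `hyp_scores_list` and `contrib_list` are hypothetical. Computing contribution scores as hypothetical scores multiplied by the one-hot encoding is an assumed convention here. The key point is that each container stays a plain Python list of (length, 4) arrays and is never wrapped in `np.array`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ragged inputs (assumption): 100 sequences of 500-1000 bp.
lengths = rng.integers(500, 1001, size=100)

# Outermost container is a plain Python list; each entry has shape (length, 4).
onehot_list = [np.eye(4)[rng.integers(0, 4, size=n)] for n in lengths]
# Hypothetical scores for all four bases at every position (same shape as the one-hot).
hyp_scores_list = [rng.normal(size=(n, 4)) for n in lengths]
# Contribution scores (assumed convention): keep only the score of the observed base.
contrib_list = [h * o for h, o in zip(hyp_scores_list, onehot_list)]

# Sanity check: the three lists line up entry by entry.
for o, h, c in zip(onehot_list, hyp_scores_list, contrib_list):
    assert o.shape == h.shape == c.shape and o.shape[1] == 4
```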

As an aside, you may want to use tfmodisco-lite (mentioned in the README), as it is being actively maintained.


ruizhideng commented 1 year ago

Thanks for the reply.

Yes, the outermost iterable is a Python list. I am wondering whether you still have the pipeline for that version.

I also tried the lite version; it only works for arrays of the same length.

But it's fine. If it's too tricky, I will just pad the input data.
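Padding can be sketched with NumPy alone. The helper `pad_to_length` below is hypothetical and is not part of tfmodisco or tfmodisco-lite. It right-pads each (length, 4) array with zeros to a common length and returns the original lengths so the padded tail can be masked or trimmed later. The padded positions have all-zero one-hot rows and zero scores, so they carry no importance. Whether those zero-score stretches shift any genome-wide statistics the tool computes has not been verified here.

```python
import numpy as np


def pad_to_length(arrays, target_len=None):
    """Right-pad a list of (length, 4) arrays with zeros to a common length.

    Hypothetical helper. Returns the padded (N, target_len, 4) array and the
    original lengths. target_len defaults to the longest input.
    """
    lengths = np.array([a.shape[0] for a in arrays])
    if target_len is None:
        target_len = int(lengths.max())
    if (lengths > target_len).any():
        raise ValueError("target_len is shorter than some sequences")
    padded = np.zeros((len(arrays), target_len, arrays[0].shape[1]), dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        padded[i, : a.shape[0]] = a  # real data first, zeros after
    return padded, lengths


# Usage on toy ragged data (assumption): pad one-hot and scores the same way.
rng = np.random.default_rng(2)
lengths = rng.integers(500, 1001, size=100)
onehot_list = [np.eye(4)[rng.integers(0, 4, size=n)] for n in lengths]
hyp_list = [rng.normal(size=(n, 4)) for n in lengths]
onehot, seq_lens = pad_to_length(onehot_list)
hyp, _ = pad_to_length(hyp_list)
print(onehot.shape, hyp.shape)
```

Padding both arrays with the same helper keeps the one-hot and score tensors aligned position by position.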