Extending the work to approaches not utilising pre-trained feature extractors

Hi, congratulations on the work. It looks really intriguing. I came across this line in the paper:

> However, the reader should consider that our fusion approach is in fact not limited to neural networks as primary feature extractors.

I was wondering if you could elaborate on this a little bit.

I was hoping to use an approach similar to the one described in the paper, but I don't want to restrict the search to pre-trained detectors. If I also want to search over the pre-fusion and post-fusion layers, do you think the current framework can handle that? If so, what would be a good starting point?
