Open sjjadsa opened 2 months ago
After performing feature extraction, can we use a vision transformer to process the extracted features? Specifically, is it possible to apply position embeddings to those features?
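To make the question concrete, here is a minimal NumPy sketch of the setup I have in mind. It assumes the feature extractor (for example a CNN backbone) outputs a feature map of shape `(C, H, W)`. The function names, the `64 x 7 x 7` shape, and the choice of fixed sinusoidal embeddings are illustrative assumptions, not anything taken from this repository. A learned position embedding, as used in the original ViT, would be added to the tokens in the same way.

```python
import numpy as np


def sinusoidal_position_embedding(num_tokens: int, dim: int) -> np.ndarray:
    """Return a fixed sine/cosine position embedding of shape (num_tokens, dim)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    positions = np.arange(num_tokens)[:, None]                      # (N, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions * freqs[None, :]                             # (N, dim/2)
    pe = np.zeros((num_tokens, dim))
    pe[:, 0::2] = np.sin(angles)  # even channels
    pe[:, 1::2] = np.cos(angles)  # odd channels
    return pe


def features_to_tokens(feature_map: np.ndarray) -> np.ndarray:
    """Flatten a (C, H, W) feature map into (H*W, C) tokens and add positions."""
    c, h, w = feature_map.shape
    tokens = feature_map.reshape(c, h * w).T  # one token per spatial location
    return tokens + sinusoidal_position_embedding(h * w, c)


# Hypothetical feature map standing in for the extractor's output.
feature_map = np.random.default_rng(0).standard_normal((64, 7, 7))
tokens = features_to_tokens(feature_map)
print(tokens.shape)  # (49, 64)
```

The resulting `(49, 64)` token sequence is what I would then feed to the transformer encoder layers, skipping the usual patch-embedding step since the features are already extracted.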