Image Data

I hope this message finds you well. I am currently working on a project where I would like to adapt the "Decision Transformer" model, originally designed for sequence modeling of offline reinforcement learning trajectories (returns-to-go, states, and actions), to work with image data. Given your expertise in machine learning and deep learning, I was hoping to get your guidance on how to approach this adaptation effectively.

Specifically, I would appreciate your insights on the following:

Image Preprocessing: What are the key image preprocessing steps I should consider before feeding the data into the model? Are there specific normalization or augmentation techniques that work well with the "Decision Transformer"?

Architecture Modification: How should I adjust the "Decision Transformer" architecture to accommodate image embeddings? Are there any attention mechanisms or layers that need particular care when handling image data?

Output Layer Configuration: Depending on the image task (e.g., classification, object detection), what changes should I make to the output layer of the model to align with the number of classes or categories in my image dataset?

Training Strategies: Are there any particular training strategies or fine-tuning techniques I should be aware of when adapting the model for image data?

Best Practices: Are there best practices or resources you would recommend for adapting transformer-based models to work with images?
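
To make the preprocessing and embedding questions concrete, here is a minimal sketch of what I understand the usual ViT-style approach to be: normalize the pixels, then cut the image into fixed-size patches that would each become one input token. This is only my assumption about a reasonable starting point, not something taken from this repository. The function names, the toy 4x4 image, and the mean/std values of 0.5 are all hypothetical placeholders, and the code is kept dependency-free only for illustration.

```python
from typing import List

# An image is represented here as nested lists: height x width x channels.
Image = List[List[List[float]]]


def normalize(image: Image, mean: List[float], std: List[float]) -> Image:
    """Scale 0-255 pixel values to [0, 1], then standardize each channel."""
    return [
        [
            [(value / 255.0 - m) / s for value, m, s in zip(pixel, mean, std)]
            for pixel in row
        ]
        for row in image
    ]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an image into flattened, non-overlapping square patches.

    Patches are returned in row-major order (left to right, top to bottom).
    Each patch is a flat list of patch_size * patch_size * channels values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch: List[float] = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)
            patches.append(patch)
    return patches


# Hypothetical toy input: a 4x4 RGB image with a simple intensity ramp.
image = [[[float(16 * (r * 4 + c))] * 3 for c in range(4)] for r in range(4)]

# Assumed per-channel statistics; real values would come from the dataset.
tokens = patchify(normalize(image, mean=[0.5] * 3, std=[0.5] * 3), patch_size=2)

print(len(tokens), len(tokens[0]))  # number of patches, values per patch
```

In a real model, each flattened patch would presumably pass through a learned linear projection (plus a positional embedding) before reaching the transformer, and in practice I would expect to use NumPy or PyTorch rather than nested lists. Whether this kind of tokenization fits the Decision Transformer input format is exactly what I am unsure about.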

I am eager to learn and make the most of this adaptation, and your guidance would be immensely valuable in this process. If you have any available time for a brief discussion or if you can point me to relevant resources, I would greatly appreciate it.

Thank you for considering my request, and I look forward to hearing from you at your earliest convenience.

Note: The data I have is website data, and it's not an offline RL dataset.

kzl / decision-transformer

Image Data #65